Decision Trees

Note: for cohort state-transition decision tree models, in which input parameter values are specified in order to compute expected values, it is worth first referring to the Markov models section. The code and packages described there can be used, and simplified, for the decision tree case, as stated in a heemod package vignette: https://cran.r-project.org/web/packages/heemod/vignettes/a_introduction.html
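As a rough illustration of what "computing expected values" means for a simple decision tree, the sketch below uses base R only. The two strategies, the probabilities, the costs and the effects are all made-up numbers chosen for illustration; they do not come from heemod or from any real model.

```r
# Hypothetical two-strategy decision tree; every number here is illustrative.
p_success   <- c(treatment = 0.7, no_treatment = 0.4)  # probability of success per strategy
cost_arm    <- c(treatment = 1000, no_treatment = 0)   # up-front cost of each strategy
cost_fail   <- 5000                                    # extra cost incurred after a failure
effect_succ <- 1.0                                     # effect (e.g. utility) after success
effect_fail <- 0.5                                     # effect after failure

# Expected value = sum over branches of (branch probability x branch payoff)
expected_cost   <- cost_arm + (1 - p_success) * cost_fail
expected_effect <- p_success * effect_succ + (1 - p_success) * effect_fail

results <- data.frame(strategy = names(p_success),
                      expected_cost,
                      expected_effect,
                      row.names = NULL)
print(results)
```

Each strategy's expected value is simply the probability-weighted sum of the payoffs on its branches, which is the same rollback calculation that larger trees repeat at every chance node.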

Many of the packages detailed below are focused on creating data tree structures.

Package: data.tree

Maintained by

Christoph Glur (christoph.glur@ipub.com) (https://cran.r-project.org/web/packages/data.tree/vignettes/data.tree.html#introduction and https://cran.r-project.org/web/packages/data.tree/data.tree.pdf)

What does this package do?

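This package is used to create, navigate and manipulate hierarchical (tree-shaped) data in R. It stores the hierarchy in its own data.tree structure, in which each node can carry named attributes.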

How do I input my data to it/what inputs does it take?

This package accepts inputs in a range of formats. Trees can be converted from objects created by other packages (e.g. dendrogram, phylo from the ape package, and igraph trees). Additionally, data.frames can be used, or trees can be built programmatically within the data.tree package. data.tree can also use data saved in csv, yaml or json formats.
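The two most common input routes described above can be sketched as follows. The tree itself (a "decision" root with "treat" and "no_treat" branches) and the prob attribute are hypothetical names and values used only for illustration.

```r
library(data.tree)

# Route 1: build the tree programmatically, node by node.
decision <- Node$new("decision")
treat    <- decision$AddChild("treat")
no_treat <- decision$AddChild("no_treat")
treat$AddChild("success", prob = 0.7)     # extra arguments become node attributes
treat$AddChild("failure", prob = 0.3)
no_treat$AddChild("success", prob = 0.4)
no_treat$AddChild("failure", prob = 0.6)
print(decision, "prob")

# Route 2: convert a data.frame, one row per leaf.
# The pathString column spells out each leaf's path from the root.
paths <- data.frame(
  pathString = c("decision/treat/success", "decision/treat/failure",
                 "decision/no_treat/success", "decision/no_treat/failure"),
  prob = c(0.7, 0.3, 0.4, 0.6),
  stringsAsFactors = FALSE
)
tree_from_df <- as.Node(paths)
print(tree_from_df, "prob")
```

Both routes should give the same seven-node tree; the data.frame route tends to be more convenient when the tree already exists as a table, for example after reading a csv file.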

What outputs do I get?

The package produces a tree that you can navigate (‘climb’) and manipulate (e.g. ‘prune’ or ‘filter’) within R. data.tree can plot trees itself, but it can also hand trees to other packages (for example igraph, treeview or ape) to allow more control over styling. Data from the tree can be kept as a data.tree object or converted to a data.frame, and exported as csv, yaml or json.
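A minimal sketch of these operations is given below, again on a small hypothetical tree with an illustrative prob attribute. It uses Climb() to navigate, Prune() to drop a branch and ToDataFrameTree() to convert the result to a data.frame.

```r
library(data.tree)

# Small illustrative tree (hypothetical names and values).
paths <- data.frame(
  pathString = c("decision/treat/success", "decision/treat/failure",
                 "decision/no_treat/success", "decision/no_treat/failure"),
  prob = c(0.7, 0.3, 0.4, 0.6),
  stringsAsFactors = FALSE
)
tree <- as.Node(paths)

# Navigate ('climb') from the root down to a named node and read an attribute.
leaf <- tree$Climb(name = "treat", name = "success")
leaf$prob

# Prune works in place, so operate on a copy.
# Nodes for which the function returns FALSE are removed with their subtrees.
pruned <- Clone(tree)
Prune(pruned, function(node) node$name != "no_treat")
print(pruned, "prob")

# Convert the full tree to a data.frame (one row per node) ready for export,
# e.g. with write.csv().
tree_df <- ToDataFrameTree(tree, "prob")
print(tree_df)
```

Internal nodes have no prob value in this sketch, so they appear as NA in the exported data.frame.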

Sample code

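# Plot the GNI2014 example data as a treemap (countries nested in continents)
library(treemap)
data(GNI2014)
treemap(GNI2014,
        index = c("continent", "iso3"),
        vSize = "population",
        vColor = "GNI",
        type = "value")

# Convert the same data.frame to a data.tree structure
library(data.tree)
GNI2014$continent <- as.character(GNI2014$continent)
# pathString defines each country's position: world/continent/country
GNI2014$pathString <- paste("world", GNI2014$continent, GNI2014$country, sep = "/")
tree <- as.Node(GNI2014[,])
# Print at most 20 nodes, pruning the remainder
print(tree, pruneMethod = "dist", limit = 20)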

##                        levelName
## 1  world                        
## 2   ¦--North America            
## 3   ¦   ¦--Bermuda              
## 4   ¦   ¦--United States        
## 5   ¦   °--... 22 nodes w/ 0 sub
## 6   ¦--Europe                   
## 7   ¦   ¦--Norway               
## 8   ¦   ¦--Switzerland          
## 9   ¦   °--... 39 nodes w/ 0 sub
## 10  ¦--Asia                     
## 11  ¦   ¦--Qatar                
## 12  ¦   ¦--Macao SAR, China     
## 13  ¦   °--... 45 nodes w/ 0 sub
## 14  ¦--Oceania                  
## 15  ¦   ¦--Australia            
## 16  ¦   ¦--New Zealand          
## 17  ¦   °--... 11 nodes w/ 0 sub
## 18  ¦--South America            
## 19  ¦   ¦--Uruguay              
## 20  ¦   ¦--Chile                
## 21  ¦   °--... 10 nodes w/ 0 sub
## 22  ¦--Seven seas (open ocean)  
## 23  ¦   ¦--Seychelles           
## 24  ¦   ¦--Mauritius            
## 25  ¦   °--... 1 nodes w/ 0 sub 
## 26  °--Africa                   
## 27      °--... 48 nodes w/ 0 sub

(Code example is from Christoph Glur’s data.tree applications vignette: https://cran.r-project.org/web/packages/data.tree/vignettes/applications.html#jenny-lind-decision-tree-plotting)

Other helpful resources

The data.tree applications vignette has a range of example uses for the package, from financial applications to a genetic algorithm (https://cran.r-project.org/web/packages/data.tree/vignettes/applications.html).

Package: party

Maintained by

Torsten Hothorn (Torsten.Hothorn@R-project.org)

(https://cran.r-project.org/web/packages/party/index.html)

What does this package do?

The core of the party package is ctree(), which builds conditional inference trees. These trees embed tree-structured regression models in a framework of conditional inference procedures. Trees can be produced for nominal, ordinal, numeric and censored data, as well as multivariate response variables, and covariates with arbitrary measurement scales.

There is also a function called cforest(), which provides an implementation of Breiman’s random forests based on conditional inference trees. In addition, the function mob() fits parametric models (e.g. linear models, GLMs or survival regression) and uses parameter instability tests for split selection. Finally, there are useful functions for visualizing trees. To learn more about what the package can do, check out the two comprehensive vignettes.

How do I input my data to it/what inputs does it take?

The input for mob() is a model formula together with a data frame. ctree() and cforest() take the same kind of input.

What outputs do I get?

The output of mob() is an object of class mob. ctree() returns an object of class BinaryTree.

Sample code
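A minimal sketch of fitting a conditional inference tree with ctree() is shown below. The choice of the built-in iris data set is an assumption made purely for illustration; any data frame with a response and some covariates could be used in the same way.

```r
library(party)

# Fit a conditional inference tree: Species is the response,
# all remaining columns of iris are candidate split variables.
iris_ctree <- ctree(Species ~ ., data = iris)

# Text summary of the splits and terminal nodes.
print(iris_ctree)

# Predicted class for each training observation, compared with the truth.
pred <- predict(iris_ctree)
confusion <- table(predicted = pred, observed = iris$Species)
print(confusion)
```

Calling plot(iris_ctree) afterwards draws the fitted tree, with the class distribution shown in each terminal node.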

Other packages

(not discussed in full) In 2015, party was updated to the partykit package, which includes a reimplementation of ctree() and an extended reimplementation of mob(), as well as a variety of other features (https://cran.r-project.org/web/packages/partykit/index.html and http://partykit.r-forge.r-project.org/partykit/).

Other helpful resources

Find out more about party at http://party.r-forge.r-project.org/.

There’s also a variety of online support available, for example:
- https://rpubs.com/njvijay/14899
- https://www.tutorialspoint.com/r/r_decision_tree.htm
- http://www.sthda.com/english/articles/35-statistical-machine-learning-essentials/141-cart-model-decision-tree-essentials/#loading-required-r-packages

Package: rpart

Maintained by

Beth Atkinson (atkinson@mayo.edu)

(https://cran.r-project.org/web/packages/rpart/index.html)

What does this package do?

This package performs recursive partitioning for classification, regression and survival trees. It implements most of the functionality described in the 1984 book Classification and Regression Trees by Breiman, Friedman, Olshen and Stone. The package also includes several example data sets and a long but clear vignette.

rpart or party?

rpart produces classification and regression trees, with cost-complexity pruning based on cross-validation. By contrast, party produces conditional inference trees based on unbiased recursive partitioning, achieved via permutation tests. As a result, rpart is sometimes seen as easier to explain and use for beginners, while party is sometimes reported to give greater accuracy and performance.

How do I input my data to it/what inputs does it take?

The rpart() function takes a model formula and, optionally, a data frame in which to interpret the variables named in the formula.

What outputs do I get?

The rpart() function returns an object of class rpart (documented in the package help under rpart.object).

Sample code
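A minimal sketch of fitting and pruning a classification tree with rpart() is shown below. It uses the kyphosis data set that ships with rpart; selecting the complexity parameter with the lowest cross-validated error is one common convention, assumed here for illustration rather than prescribed by the package.

```r
library(rpart)

set.seed(1)  # the cross-validation inside rpart() is random

# Classification tree for the kyphosis data that ships with rpart.
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis, method = "class")

# Complexity table, including the cross-validated error (xerror) per tree size.
printcp(fit)

# Cost-complexity pruning: keep the cp value with the lowest xerror.
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

# Predicted class for each training observation.
pred <- predict(pruned, type = "class")
print(table(predicted = pred, observed = kyphosis$Kyphosis))
```

plot(pruned) followed by text(pruned) draws the pruned tree with base graphics.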

Other helpful resources

As this is a popular package there is lots of online support, for example:

Other packages to explore but not detailed here

Make decision trees or networks

Improving presentation of hierarchical clustering, and classification and regression trees

Convert JSON objects to R

Validate JSON objects