useR! 2017
Wednesday, July 5 • 11:36am - 11:54am
Distributional Trees and Forests


Keywords: Distributional regression, recursive partitioning, decision trees, random forests
Webpages: https://R-Forge.R-project.org/projects/partykit/
In regression analysis one is interested in the relationship between a dependent variable and one or more explanatory variables. Various methods for fitting statistical models to data have been developed, ranging from ordinary linear models, which consider only the mean of the response variable, to probabilistic models in which all parameters of a distribution are fitted to the data.
If the data vary strongly, it can be advantageous to first split them into more homogeneous subgroups based on the given covariates and then fit a local model in each subgroup, rather than fitting one global model to the whole data set. This can be done by applying regression trees and forests.
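The difference between one global model and local models in subgroups can be illustrated with a minimal sketch. It assumes a normal distribution, a single covariate, a fixed split point, and made-up toy data; the helper names `fit_normal` and `normal_loglik` are hypothetical and not part of any package mentioned here.

```python
import math
from statistics import fmean


def fit_normal(y):
    """Maximum-likelihood estimates (mu, sigma) of a normal distribution."""
    mu = fmean(y)
    sigma = math.sqrt(fmean((v - mu) ** 2 for v in y))
    return mu, sigma


def normal_loglik(y, mu, sigma):
    """Log-likelihood of the observations y under N(mu, sigma^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (v - mu) ** 2 / (2 * sigma ** 2)
        for v in y
    )


# Made-up toy data (covariate x, response y): both the mean and the
# spread of y change once x exceeds 5.
data = [(1, 1.9), (2, 2.1), (3, 2.0), (4, 1.8), (5, 2.2),
        (6, 9.0), (7, 11.5), (8, 8.2), (9, 12.1), (10, 9.7)]

# One global model for the whole data set.
y_all = [y for _, y in data]
global_fit = fit_normal(y_all)
global_ll = normal_loglik(y_all, *global_fit)

# Two local models, one per subgroup (the split point is an assumption).
threshold = 5
left = [y for x, y in data if x <= threshold]
right = [y for x, y in data if x > threshold]
left_fit = fit_normal(left)
right_fit = fit_normal(right)
local_ll = normal_loglik(left, *left_fit) + normal_loglik(right, *right_fit)

print("global fit (mu, sigma):", global_fit, "log-likelihood:", global_ll)
print("left fit   (mu, sigma):", left_fit)
print("right fit  (mu, sigma):", right_fit)
print("summed local log-likelihood:", local_ll)
```

In this toy setting the two subgroups differ in both location and scale, so the local fits describe the data with a higher summed log-likelihood than the single global fit.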
Both concepts, parametric modeling and algorithmic trees, have been investigated and developed further, but mostly separately from each other. Our goal is therefore to embed the progress made in probabilistic modeling into algorithmic tree and forest models. In particular, more flexible models such as GAMLSS (Rigby and Stasinopoulos 2005) should be fitted in the nodes of a tree in order to capture location, scale, and shape as well as censoring, tail behavior, etc., while non-additive effects of the explanatory variables can be detected by the splitting algorithm used to build the tree.
The corresponding implementation is provided in the R package disttree, which is available on R-Forge and includes the two main functions disttree and distforest. In addition to the data set and a formula, the user only has to specify a distribution family and receives a tree or forest model with a set of distribution parameters for each terminal node. One possible way to specify a distribution family is to pass a gamlss.dist family object (Stasinopoulos, Rigby, and others 2007). In disttree and distforest the fitting function distfit is applied within a tree-building algorithm chosen by the user. Either the MOB algorithm for model-based recursive partitioning (Zeileis, Hothorn, and Hornik 2008) or the ctree algorithm (Hothorn, Hornik, and Zeileis 2006) can be used as a framework. Both algorithms are implemented in the partykit package (Hothorn et al. 2015).
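The idea of a tree whose terminal nodes each carry a full set of distribution parameters can be sketched as follows. This is a conceptual illustration only, not the disttree API: it builds a single split (a stump) on one covariate, assumes a normal distribution, and picks the split point by an exhaustive search over the summed log-likelihood. That search is a deliberate simplification and differs from the parameter instability tests of MOB and the conditional inference tests of ctree. The names `fit_stump` and `predict_params` and the toy data are hypothetical.

```python
import math
from statistics import fmean


def fit_normal(y):
    """Maximum-likelihood estimates (mu, sigma) of a normal distribution."""
    mu = fmean(y)
    sigma = math.sqrt(fmean((v - mu) ** 2 for v in y))
    return mu, sigma


def normal_loglik(y, mu, sigma):
    """Log-likelihood of the observations y under N(mu, sigma^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (v - mu) ** 2 / (2 * sigma ** 2)
        for v in y
    )


def fit_stump(x, y, min_size=2):
    """Hypothetical helper: choose the split 'x <= threshold' that maximizes
    the summed log-likelihood of separate normal fits in the two child nodes.
    Assumes y is not constant within any candidate node (sigma > 0)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        if len(left) < min_size or len(right) < min_size:
            continue
        left_fit, right_fit = fit_normal(left), fit_normal(right)
        ll = normal_loglik(left, *left_fit) + normal_loglik(right, *right_fit)
        if best is None or ll > best["loglik"]:
            best = {"threshold": threshold, "left": left_fit,
                    "right": right_fit, "loglik": ll}
    return best


def predict_params(stump, x_new):
    """Return the (mu, sigma) of the terminal node that x_new falls into."""
    return stump["left"] if x_new <= stump["threshold"] else stump["right"]


# Made-up toy data: mean and spread of y both change once x exceeds 5.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.9, 2.1, 2.0, 1.8, 2.2, 9.0, 11.5, 8.2, 12.1, 9.7]

stump = fit_stump(x, y)
print("chosen threshold:", stump["threshold"])
print("parameters for x = 3:", predict_params(stump, 3))
print("parameters for x = 8:", predict_params(stump, 8))
```

A prediction from such a model is a whole distribution per terminal node rather than a single mean, which is what allows scale and, with richer families, shape to vary across subgroups.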
References

Hothorn, Torsten, Kurt Hornik, and Achim Zeileis. 2006. “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational and Graphical Statistics 15 (3). Taylor & Francis: 651–74.

Hothorn, Torsten, Kurt Hornik, Carolin Strobl, and Achim Zeileis. 2015. “Package ’Party’.” Package Reference Manual for Party Version 0.9–0.998 16: 37.

Rigby, Robert A, and D Mikis Stasinopoulos. 2005. “Generalized Additive Models for Location Scale and Shape (with Discussion).” Applied Statistics 54 (3): 507–54.

Stasinopoulos, D Mikis, Robert A Rigby, and others. 2007. “Generalized Additive Models for Location Scale and Shape (GAMLSS) in R.” Journal of Statistical Software 23 (7): 1–46.

Zeileis, Achim, Torsten Hothorn, and Kurt Hornik. 2008. “Model-Based Recursive Partitioning.” Journal of Computational and Graphical Statistics 17 (2). Taylor & Francis: 492–514.


Wednesday July 5, 2017 11:36am - 11:54am CEST
2.02 Wild Gallery