useR!2017: Full Schedule

11:00am CEST

Transformation Forests

Keywords: random forest, transformation model, quantile regression forest, conditional distribution, conditional quantiles
Webpages: https://R-forge.R-project.org/projects/ctm https://arxiv.org/1701.02110
Regression models for supervised learning problems with a continuous target are commonly understood as models for the conditional mean of the target given predictors. This notion is simple and therefore appealing for interpretation and visualisation. Information about the whole underlying conditional distribution is, however, not available from these models. A more general understanding of regression models as models for conditional distributions allows much broader inference from such models, for example the computation of prediction intervals. Several random forest-type algorithms aim at estimating conditional distributions, most prominently quantile regression forests. We propose a novel approach based on a parametric family of distributions characterised by their transformation function. A dedicated novel “transformation tree” algorithm able to detect distributional changes is developed. Based on these transformation trees, we introduce “transformation forests” as an adaptive local likelihood estimator of conditional distribution functions. The resulting models are fully parametric yet very general and allow broad inference procedures, such as the model-based bootstrap, to be applied in a straightforward way. The procedures are implemented in the “trtf” R add-on package currently available from R-forge.
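To make the difference between a conditional mean and a conditional distribution concrete, here is a minimal Python sketch of how a prediction interval can be read from a weighted empirical distribution, which is the general idea behind quantile-regression-forest-type estimators. It is not the transformation forest method of the talk (which is parametric) and not the trtf API; the function names and the weights are hypothetical, with the weights standing in for whatever similarity a forest would assign to training observations for one new predictor value.

```python
def weighted_quantile(values, weights, prob):
    """Smallest value whose cumulative normalised weight reaches prob."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    # Observations with zero weight carry no information for this prediction.
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight / total
        if cumulative >= prob - 1e-12:  # small tolerance for rounding error
            return value
    return pairs[-1][0]


def prediction_interval(values, weights, level=0.9):
    """Central prediction interval from the weighted empirical distribution."""
    alpha = (1.0 - level) / 2.0
    lower = weighted_quantile(values, weights, alpha)
    upper = weighted_quantile(values, weights, 1.0 - alpha)
    return lower, upper


# Hypothetical training responses and hypothetical forest-style weights
# for a single new observation (assumptions made for illustration only).
responses = [2.1, 2.9, 3.4, 4.0, 5.2, 6.8]
weights = [0.05, 0.30, 0.30, 0.20, 0.10, 0.05]
interval = prediction_interval(responses, weights, level=0.8)
```

A conditional-mean model would return only a single number for this observation; the weighted distribution additionally yields quantiles and therefore an interval.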

Speakers

Torsten Hothorn

tf useR2017 pdf

Wednesday July 5, 2017 11:00am - 11:18am CEST
2.02 Wild Gallery

Talk, Machine Learning I

11:18am CEST

A Benchmark of Open Source Tools for Machine Learning from R

Keywords: machine learning, predictive modeling, predictive accuracy, scalability, speed
Webpages: https://github.com/szilard/benchm-ml
Binary classification is one of the most widely used machine learning methods in business applications. If the number of features is not very large (sparse), algorithms such as random forests, gradient boosted trees or deep learning neural networks (and ensembles of those) are expected to perform the best in terms of accuracy. There are countless off-the-shelf open source implementations of these algorithms (e.g. R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.), but which one to use in practice? Surprisingly, there is a huge variation between even the most commonly used implementations of the same algorithm in terms of scalability, speed and accuracy. In this talk we will see which open source tools work reasonably well on larger datasets commonly encountered in practice. Not surprisingly, all the best tools are available seamlessly from R.

Speakers

Szilard Pafka

15 benchm ML useR 2017 (1) pdf

Wednesday July 5, 2017 11:18am - 11:36am CEST
2.02 Wild Gallery

Talk, Machine Learning I

11:36am CEST

Distributional Trees and Forests

Keywords: Distributional regression, recursive partitioning, decision trees, random forests
Webpages: https://R-Forge.R-project.org/projects/partykit/
In regression analysis one is interested in the relationship between a dependent variable and one or more explanatory variables. Various methods to fit statistical models to the data set have been developed, starting from ordinary linear models considering only the mean of the response variable and ranging to probabilistic models where all parameters of a distribution are fit to the given data set.
If there is a strong variation within the data it might be advantageous to split the data first into more homogeneous subgroups based on given covariates and then fit a local model in each subgroup rather than fitting one global model to the whole data set. This can be done by applying regression trees and forests.
Both of these concepts, parametric modeling and algorithmic trees, have been investigated and developed further, but mostly separately from each other. Therefore, our goal is to embed the progress made in the field of probabilistic modeling into the idea of algorithmic tree and forest models. In particular, more flexible models such as GAMLSS (Rigby and Stasinopoulos 2005) should be fitted in the nodes of a tree in order to capture location, scale and shape as well as censoring, tail behavior etc., while non-additive effects of the explanatory variables can be detected by the splitting algorithm used to build the tree.
The corresponding implementation is provided in the R package disttree, which is available on R-Forge and includes the two main functions disttree and distforest. Besides the data set and a formula, the user only has to specify a distribution family and receives a tree/forest model with a set of distribution parameters for each final node. One possible way to specify a distribution family is to hand over a gamlss.dist family object (Stasinopoulos, Rigby, and others 2007). In disttree and distforest the fitting function distfit is applied within a tree building algorithm chosen by the user. Either the MOB algorithm, an algorithm for model-based recursive partitioning (Zeileis, Hothorn, and Hornik 2008), or the ctree algorithm (Hothorn, Hornik, and Zeileis 2006) can be used as a framework. Both algorithms are implemented in the partykit package (Hothorn et al. 2015).
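The idea of splitting the data and then fitting a full distribution in each subgroup can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a single covariate, a single split, a normal distribution in each node, and an exhaustive search that maximises the summed log-likelihood. MOB and ctree select splits with statistical tests instead, and none of the function names below belong to the disttree package.

```python
import math


def fit_normal(ys):
    """Maximum-likelihood estimates (mean, sd) of a normal distribution."""
    n = len(ys)
    mu = sum(ys) / n
    var = sum((y - mu) ** 2 for y in ys) / n
    return mu, math.sqrt(max(var, 1e-12))  # guard against zero variance


def normal_loglik(ys, mu, sigma):
    """Log-likelihood of ys under a normal distribution N(mu, sigma^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)
        for y in ys
    )


def best_split(xs, ys, min_size=5):
    """Return (threshold, left_params, right_params, loglik) of the best split."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    xs_sorted = [xs[i] for i in order]
    ys_sorted = [ys[i] for i in order]
    best = None
    for k in range(min_size, len(xs) - min_size + 1):
        if xs_sorted[k - 1] == xs_sorted[k]:
            continue  # cannot split between identical covariate values
        left, right = ys_sorted[:k], ys_sorted[k:]
        left_params, right_params = fit_normal(left), fit_normal(right)
        loglik = normal_loglik(left, *left_params) + normal_loglik(right, *right_params)
        if best is None or loglik > best[3]:
            threshold = (xs_sorted[k - 1] + xs_sorted[k]) / 2
            best = (threshold, left_params, right_params, loglik)
    return best


# Toy data (an assumption for illustration): the mean is 0 everywhere,
# but the spread jumps from 0.1 to 2.0 once x reaches 10.
xs = list(range(20))
ys = [(-1) ** i * (0.1 if i < 10 else 2.0) for i in range(20)]
threshold, left_params, right_params, loglik = best_split(xs, ys)
```

On this toy data a tree that models only the mean has nothing to split on, whereas the likelihood-based search separates the low-variance from the high-variance regime and reports a (mean, sd) pair for each node.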
References

Hothorn, Torsten, Kurt Hornik, and Achim Zeileis. 2006. “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational and Graphical Statistics 15 (3). Taylor & Francis: 651–74.

Hothorn, Torsten, Kurt Hornik, Carolin Strobl, and Achim Zeileis. 2015. “Package ’Party’.” Package Reference Manual for Party Version 0.9–0.998 16: 37.

Rigby, Robert A, and D Mikis Stasinopoulos. 2005. “Generalized Additive Models for Location Scale and Shape (with Discussion).” Applied Statistics 54.3: 507–54.

Stasinopoulos, D Mikis, Robert A Rigby, and others. 2007. “Generalized Additive Models for Location Scale and Shape (GAMLSS) in R.” Journal of Statistical Software 23 (7): 1–46.

Zeileis, Achim, Torsten Hothorn, and Kurt Hornik. 2008. “Model-Based Recursive Partitioning.” Journal of Computational and Graphical Statistics 17 (2). Taylor & Francis: 492–514.

Speakers

Lisa Schlosser

slides169 Schlosser Lisa pdf

Wednesday July 5, 2017 11:36am - 11:54am CEST
2.02 Wild Gallery

Talk, Machine Learning I

11:54am CEST

mlrHyperopt: Effortless and collaborative hyperparameter optimization experiments

Keywords: machine learning, hyperparameter optimization, tuning, classification, networked science
Webpages: https://jakob-r.github.io/mlrHyperopt/
Most machine learning tasks demand hyperparameter tuning to achieve good performance. For example, Support Vector Machines with radial basis functions are very sensitive to the choice of both the kernel width and the soft margin penalty C. However, for a wide range of machine learning algorithms these “search spaces” are less well known. Even worse, experts on the particular methods might have conflicting views. The popular package caret (Jed Wing et al. 2016) approaches this problem by providing two simple optimizers, grid search and random search, and individual search spaces for all implemented methods. To prevent training on misconfigured methods, a grid search is performed by default. Unfortunately, only the parameters that will be tuned are documented; the exact bounds have to be obtained from the source code. As a counterpart, mlr (Bischl et al. 2016) offers more flexible parameter tuning methods, such as an interface to mlrMBO (Bischl et al. 2017) for conducting Bayesian optimization. Unfortunately, mlr lacks default search spaces, which makes parameter tuning difficult. Here mlrHyperopt steps in to make hyperparameter optimization as easy as in caret. As a matter of fact, for a developer of a machine learning package it is unquestionably impossible to be an expert in all implemented methods and to provide perfect search spaces. Hence mlrHyperopt aims at:

improving the search spaces of caret with simple tricks.
letting the users submit and download improved search spaces to a database.
providing advanced tuning methods interfacing mlr and mlrMBO.

A study on selected data sets and numerous popular machine learning methods compares the performance of the grid and random search implemented in caret to the performance of mlrHyperopt for different budgets.
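The two baseline optimizers mentioned above, grid search and random search, can be compared under an equal evaluation budget with a short Python sketch. Everything here is an assumption made for illustration: the error surface is a hypothetical stand-in for a cross-validated error over two log-scale hyperparameters, the bounds are arbitrary, and none of the functions correspond to caret, mlr or mlrHyperopt calls.

```python
import itertools
import random


def validation_error(log_c, log_gamma):
    """Hypothetical stand-in for a cross-validated error surface."""
    return (log_c - 1.3) ** 2 + 0.1 * (log_gamma + 2.7) ** 2


def grid_search(bounds, points_per_axis):
    """Evaluate every point of an equally spaced grid inside the bounds."""
    axes = []
    for low, high in bounds:
        step = (high - low) / (points_per_axis - 1)
        axes.append([low + i * step for i in range(points_per_axis)])
    candidates = itertools.product(*axes)
    return min(candidates, key=lambda c: validation_error(*c))


def random_search(bounds, budget, seed=0):
    """Evaluate `budget` points drawn uniformly at random inside the bounds."""
    rng = random.Random(seed)
    candidates = [
        tuple(rng.uniform(low, high) for low, high in bounds) for _ in range(budget)
    ]
    return min(candidates, key=lambda c: validation_error(*c))


# The search space: one (lower, upper) pair per hyperparameter.
bounds = [(-5.0, 5.0), (-5.0, 5.0)]
grid_best = grid_search(bounds, points_per_axis=4)  # 4 x 4 = 16 evaluations
random_best = random_search(bounds, budget=16)      # same budget of 16
```

Both optimizers depend entirely on the `bounds` they are given, which is why well-chosen default search spaces matter: a poorly chosen range wastes the whole budget regardless of the search strategy.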
References

Bischl, Bernd, Michel Lang, Lars Kotthoff, Julia Schiffner, Jakob Richter, Erich Studerus, Giuseppe Casalicchio, and Zachary M. Jones. 2016. “Mlr: Machine Learning in R.” Journal of Machine Learning Research 17 (170): 1–5. https://CRAN.R-project.org/package=mlr.

Bischl, Bernd, Jakob Richter, Jakob Bossek, Daniel Horn, Janek Thomas, and Michel Lang. 2017. “mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions.” arXiv:1703.03373 [Stat], March. http://arxiv.org/abs/1703.03373.

Jed Wing, Max Kuhn. Contributions from, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, et al. 2016. Caret: Classification and Regression Training. https://CRAN.R-project.org/package=caret.

Speakers

Jakob Richter

jakob richter mlrHyperopt pdf

Wednesday July 5, 2017 11:54am - 12:12pm CEST
2.02 Wild Gallery

Talk, Machine Learning I

12:12pm CEST

The Revised Sequential Parameter Optimization Toolbox

Keywords: optimization, tuning, surrogate model, computer experiments
Webpages: https://CRAN.R-project.org/package=SPOT
Real-world optimization problems often have very high complexity due to multi-modality, constraints, noise or other crucial problem features. For solving these optimization problems, a large collection of methods is available. Most of these methods require a number of parameters to be set, which have a significant impact on the optimization performance. Hence, a lot of experience and knowledge about the problem is necessary to obtain the best possible results. This situation grows worse if the optimization algorithm faces the additional difficulty of strong restrictions on resources, especially time, money or number of experiments.
Sequential parameter optimization (Bartz-Beielstein, Lasarczyk, and Preuss 2005) is a heuristic combining classical and modern statistical techniques for the purpose of efficient optimization. It can be applied in two manners:

to efficiently tune and select the parameters of other search algorithms, or
to optimize expensive-to-evaluate problems directly, via shifting the load of evaluations to a surrogate model.

SPO is especially useful in scenarios where

no experience of how to choose the parameter setting of an algorithm is available,
a comparison with other algorithms is needed,
an optimization algorithm has to be applied effectively and efficiently to a complex real-world optimization problem, and
the objective function is a black-box and expensive to evaluate.

The Sequential Parameter Optimization Toolbox (SPOT) provides enhanced statistical techniques, such as design and analysis of computer experiments and different methods for surrogate modeling and optimization, to effectively use sequential parameter optimization in the above-mentioned scenarios.
Version 2 of the SPOT package is a complete redesign and rewrite of the original R package. Most function interfaces were redesigned to give a more streamlined usage experience. At the same time, modular and transparent code structures allow for increased extensibility. In addition, some new developments were added to the SPOT package. A Kriging model implementation, based on earlier Matlab code by Forrester et al. (Forrester, Sobester, and Keane 2008), has been extended to allow for the usage of categorical inputs. Additionally, it is now possible to use stacking for the construction of ensemble learners (Bartz-Beielstein and Zaefferer 2017). This allows for the creation of models with a far higher predictive performance, by combining the strengths of different modeling approaches.
In this presentation we show how the new interface of SPOT can be used to efficiently optimize the geometry of an industrial dust filter (cyclone). Based on a simplified simulation of this real world industry problem, some of the core features of SPOT are demonstrated.
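The general loop behind surrogate-based sequential optimization (evaluate an initial design, fit a cheap surrogate, let the surrogate propose the next point, evaluate it, refit) can be sketched in plain Python. This is a deliberately simplified illustration and not the SPOT interface: it assumes one input dimension, uses a quadratic least-squares surrogate instead of Kriging, and picks new points greedily. All function names and the test objective are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the normal equations."""
    powers = [sum(x ** k for x in xs) for k in range(5)]
    lhs = [[powers[i + j] for j in range(3)] for i in range(3)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    return solve_linear(lhs, rhs)


def sequential_optimize(objective, lower, upper, n_initial=4, n_sequential=6,
                        n_candidates=201):
    """Minimise `objective` on [lower, upper] with few expensive evaluations."""
    # Initial design: equally spaced points across the search interval.
    step = (upper - lower) / (n_initial - 1)
    xs = [lower + i * step for i in range(n_initial)]
    ys = [objective(x) for x in xs]
    grid = [lower + i * (upper - lower) / (n_candidates - 1)
            for i in range(n_candidates)]
    for _ in range(n_sequential):
        a, b, c = fit_quadratic(xs, ys)  # cheap surrogate of the objective
        # Only consider candidates that have not been evaluated yet.
        candidates = [g for g in grid if all(abs(g - x) > 1e-9 for x in xs)]
        x_new = min(candidates, key=lambda g: a + b * g + c * g * g)
        xs.append(x_new)
        ys.append(objective(x_new))      # the only expensive call per iteration
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], len(xs)


def expensive_objective(x):
    """Hypothetical stand-in for a costly simulation run."""
    return (x - 1.7) ** 2 + 0.3 * math.sin(5 * x)


best_x, best_y, n_evaluations = sequential_optimize(expensive_objective, -3.0, 5.0)
```

The search over candidate points runs entirely on the surrogate, so the expensive objective is called only once per iteration, ten times in total with the defaults above.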
References

Bartz-Beielstein, Thomas, and Martin Zaefferer. 2017. “Model-Based Methods for Continuous and Discrete Global Optimization.” Applied Soft Computing 55: 154–67. doi:10.1016/j.asoc.2017.01.039.

Bartz-Beielstein, Thomas, Christian Lasarczyk, and Mike Preuss. 2005. “Sequential Parameter Optimization.” In Proceedings Congress on Evolutionary Computation 2005 (CEC’05), 1553. Edinburgh, Scotland. http://www.spotseven.de/wp-content/papercite-data/pdf/blp05.pdf.

Forrester, Alexander, Andras Sobester, and Andy Keane. 2008. Engineering Design via Surrogate Modelling. Wiley.

Speakers

Sebastian Krey

Krey et al pdf

Wednesday July 5, 2017 12:12pm - 12:30pm CEST
2.02 Wild Gallery

Talk, Machine Learning I
