useR!2017 has ended
Thursday, July 6 • 5:40pm - 5:45pm
Automatic Machine Learning in R

Keywords: Machine Learning, Ensemble Learning, Automatic Machine Learning, Black-Box Optimization, Distributed Computing
Webpages: https://CRAN.R-project.org/package=h2o, https://github.com/h2oai/h2o-3
In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts. The first steps toward simplifying machine learning in R focused on developing simple, unified interfaces to a variety of machine learning algorithms. This effort also involved providing a robust toolkit of utility functions that perform common tasks in machine learning such as random data partitioning, cross-validation and model evaluation. Successful examples of this simplification effort include the caret, mlr and h2o R packages.
Although these tools have made it easier for non-experts to experiment with machine learning, there is still a fair bit of knowledge and background in data science that is required to produce high-performing, production-ready/research-grade machine learning models. Deep Neural Networks in particular (which have become wildly popular in the past five years) are notoriously difficult for a non-expert to tune properly. In order for machine learning software to truly be accessible to non-experts, such systems must be able to automatically perform proper data pre-processing steps and return a highly optimized machine learning model.
H2O.ai has developed a distributed Automatic Machine Learning system called H2O AutoML, which will be the first open source Automatic Machine Learning system available in R. It is currently in pre-release and is to be officially released in the h2o R package (H2O.ai 2017) around May-June 2017. In this presentation, we will present our methodology for automating the machine learning workflow, which includes feature pre-processing and automatic training and tuning of many models within a user-specified time limit. The user can also specify which model performance metric they'd like to optimize and use a metric-based stopping criterion for the AutoML process rather than a specific time constraint. By default, stacked ensembles will automatically be trained on a subset of the individual models to produce a highly predictive ensemble model, although this can be turned off if the user prefers to return singleton models only.
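To make the idea of training several models under a time budget and ranking them by a chosen metric concrete, here is a toy sketch in base R. It is not H2O AutoML's algorithm: the candidate formulas, the `budget_secs` value, the use of `mtcars`, and the RMSE metric are all assumptions chosen for illustration.

```r
# Toy illustration only: NOT the H2O AutoML implementation.
set.seed(1)
idx   <- sample(nrow(mtcars), 22)
train <- mtcars[idx, ]
valid <- mtcars[-idx, ]

# Hypothetical candidate models, expressed as formulas for lm()
candidates <- list(
  wt_only    = mpg ~ wt,
  wt_hp      = mpg ~ wt + hp,
  wt_hp_qsec = mpg ~ wt + hp + qsec
)

budget_secs <- 5          # assumed time limit for the whole search
start <- Sys.time()
results <- data.frame(model = character(), rmse = numeric(),
                      stringsAsFactors = FALSE)

for (name in names(candidates)) {
  # Stop starting new models once the time budget is used up
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  if (elapsed > budget_secs) break

  fit  <- lm(candidates[[name]], data = train)
  pred <- predict(fit, newdata = valid)
  rmse <- sqrt(mean((valid$mpg - pred)^2))   # held-out RMSE
  results <- rbind(results,
                   data.frame(model = name, rmse = rmse,
                              stringsAsFactors = FALSE))
}

# "Leaderboard": models ranked by the chosen metric, best (lowest RMSE) first
leaderboard <- results[order(results$rmse), ]
print(leaderboard)
```

A real AutoML system searches far larger model and hyperparameter spaces and can add stacked ensembles on top; the sketch only shows the budget check and the ranking step.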
The interface is designed to have as few parameters as possible so that all the user needs to do is point to their dataset, identify the response column and optionally specify a time-constraint. Below is an example of how to specify an AutoML run for the default run-time.
aml <- h2o.automl(training_frame = train, response_column = "class")

The AutoML object includes a history of all the data-processing and modeling steps that were taken, and will return a “leaderboard” of all the models that were trained in the process, ranked by a user’s model performance metric of choice.
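As a fuller sketch of such a run with an explicit time constraint, the following uses argument names from released versions of the h2o package (`y`, `max_runtime_secs`), which differ from the pre-release `response_column` argument shown above. The `iris` dataset, the 80/20 split, and the 60-second budget are assumptions for illustration, and the code needs the h2o package and a Java runtime to be installed.

```r
library(h2o)

# Start (or connect to) a local H2O cluster; requires Java
h2o.init()

# iris is a stand-in dataset chosen for this example
data   <- as.h2o(iris)
splits <- h2o.splitFrame(data, ratios = 0.8, seed = 1)
train  <- splits[[1]]
test   <- splits[[2]]

# Run AutoML for at most 60 seconds (assumed budget);
# when x is omitted, all columns other than y are used as predictors
aml <- h2o.automl(
  y = "Species",
  training_frame = train,
  max_runtime_secs = 60,
  seed = 1
)

# Leaderboard of trained models, ranked by the default metric
print(aml@leaderboard)

# Predict on held-out data with the top-ranked model
pred <- h2o.predict(aml@leader, test)
print(head(pred))
```

The models that appear on the leaderboard depend on the package version and on how much fits into the time budget, so results will vary between runs and machines.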
References
H2O.ai. 2017. H2O R Package. https://github.com/h2oai/h2o-3/tree/master/h2o-r.


Thursday July 6, 2017 5:40pm - 5:45pm CEST
4.02 Wild Gallery