useR!2017 has ended
Wednesday, July 5 • 11:18am - 11:36am
A Benchmark of Open Source Tools for Machine Learning from R


Keywords: machine learning, predictive modeling, predictive accuracy, scalability, speed
Webpages: https://github.com/szilard/benchm-ml
Binary classification is one of the most widely used machine learning methods in business applications. When the feature set is not very large and sparse, algorithms such as random forests, gradient boosted trees, or deep learning neural networks (and ensembles of those) are expected to be the most accurate. There are countless off-the-shelf open source implementations of these algorithms (e.g. R packages, Python scikit-learn, H2O, xgboost, Spark MLlib), but which one should you use in practice? Surprisingly, even the most commonly used implementations of the same algorithm vary hugely in scalability, speed, and accuracy. In this talk we will see which open source tools work reasonably well on the larger datasets commonly encountered in practice. Not surprisingly, all the best tools are available seamlessly from R.


Wednesday July 5, 2017 11:18am - 11:36am CEST
2.02 Wild Gallery