useR!2017 has ended
Back To Schedule
Thursday, July 6 • 2:06pm - 2:24pm
Performance Benchmarking of the R Programming Environment on Knight's Landing

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Feedback form is now closed.
Keywords: Multicore architectures, benchmarking, scalability, Xeon Phi
We present performance results obtained with a new performance benchmark of the R programming environment on the Xeon Phi Knights Landing and standard Xeon-based compute nodes. The benchmark package consists of microbenchmarks of matrix linear algebra kernels and machine learning functionality included in the R distribution that can be built from those kernels. Our microbenchmarking results show that the Knights Landing compute nodes exhibited similar or superior performance compared to the standard Xeon-based nodes for matrix dimensions of moderate to large size for most of the microbenchmarks, executing as much as five times faster than the standard Xeon-based nodes. For the clustering and neural network training microbenchmarks, the standard Xeon-based nodes performed up to four times faster than their Xeon Phi counterparts for many large data sets, indicating that commonly used R packages may need to be reengineered to take advantage of existing optimized, scalable kernels.
Over the past several years a trend of increased demand for high performance computing (HPC) in data analysis has emerged. This trend is driven by increasing data sizes and computational complexity(Fox et al. 2015; Kouzes et al. 2009). Many data analysts, researchers, and scientists are turning to HPC machines to help with algorithms and tools, such as machine learning, that are computationally demanding and require large amounts of memory (Raj et al. 2015). The characteristics of large scale machines (e.g. large amounts of RAM per node, high storage capacity, and advanced processing capabilities) appear very attractive to these researchers, however, challenges remain for algorithms to make optimal use of the hardware (Lee et al. 2014). Depending on the nature of the analysis to be performed, analytics workflows may be carried out as many independent concurrent processes requiring little or no coordination between them, or as highly coordinated parallel processes in which the processes perform portions of the same computational task. Regardless of the implementation, it is important for data analysts to have software environments at their disposal which can exploit the performance advantages of modern HPC machines.
A way to assess the performance of software on a given computing platform and inter-compare performance across different platforms is through benchmarking. Benchmark results can also be used to prioritize software performance optimization efforts on emerging HPC systems. One such emerging architecture is the Intel Xeon Phi processor, codenamed Knights Landing (KNL). The latest Intel Xeon Phi processor is a system on a chip, many-core, vector processor with up to 68 cores and two 512-bit vector processing units per core, a sufficient deviation from the standard Xeon processors and Xeon Phi accelerators of the previous generation to necessitate a performance assessment of the R programming environment on KNL.
We developed an R performance benchmark to determine the single-node run time performance of compute intensive linear algebra kernels that are common to many data analytics algorithms, and the run time performance of machine learning functionality commonly implemented with linear algebra operations. We then performed single-node strong scaling tests of the benchmark on both Xeon and Xeon Phi based systems to determine problem sizes and numbers of threads for which the KNL architecture was comparable to or outperformed their standard Intel Xeon counterparts. It is our intention that these results be used to guide future performance optimization efforts of the R programming environment to increase the applicability of HPC machines for compute-intensive data analysis. The benchmark is also generally applicable to a variety of systems and architectures and can be easily run to determine the computational potential of a system when using R for many data analysis tasks.
References Fox, Geoffrey, Judy Qiu, Shantenu Jha, Saliya Ekanayake, and Supun Kamburugamuve. 2015. “Big Data, Simulations and Hpc Convergence.” In Workshop on Big Data Benchmarks, 3–17. Springer.

Kouzes, Richard T, Gordon A Anderson, Stephen T Elbert, Ian Gorton, and Deborah K Gracio. 2009. “The Changing Paradigm of Data-Intensive Computing.” Computer 42 (1). IEEE: 26–34.

Lee, Seunghak, Jin Kyu Kim, Xun Zheng, Qirong Ho, Garth A Gibson, and Eric P Xing. 2014. “On Model Parallelization and Scheduling Strategies for Distributed Machine Learning.” In Advances in Neural Information Processing Systems, 27:2834–42.

Raj, Pethuru, Anupama Raman, Dhivya Nagaraj, and Siddhartha Duggirala. 2015. “High-Performance Big-Data Analytics.” Computing Systems and Approaches (Springer, 2015) 1. Springer.


Thursday July 6, 2017 2:06pm - 2:24pm CEST
2.02 Wild Gallery