Thursday, July 6 • 5:55pm - 6:00pm
The R package bigstatsr: Memory- and Computation-Efficient Statistical Tools for Big Matrices


Keywords: Statistics, Big Data, Memory-mapping, Parallelism
Webpages: https://github.com/privefl/bigstatsr
The R package bigstatsr provides functions for fast statistical analysis of large-scale data encoded as matrices. The package can handle matrices that are too large to fit in memory. The package bigstatsr is based on the format big.matrix provided by the R package bigmemory (Kane, Emerson, and Weston 2013).
The package bigstatsr enables users with a laptop to perform statistical analyses of several dozen gigabytes of data. The package is fast and efficient for four reasons. First, bigstatsr is memory-efficient because it uses only small chunks of data at a time. Second, special care has been taken to implement efficient algorithms. Third, big.matrix objects use memory-mapping, which provides efficient access to matrices. Finally, as matrices are stored on disk, many processes can easily access them in parallel.
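The first and third points (chunked processing and memory-mapping) can be illustrated with a minimal, language-agnostic sketch. It is written in Python using only the standard library and is not bigstatsr's API or its actual implementation. The helper names `write_matrix` and `column_means`, the column-major float64 file layout, and the chunk size are assumptions made for illustration only.

```python
import mmap
import os
import struct
import tempfile


def write_matrix(path, n_rows, n_cols):
    """Write a column-major float64 matrix to disk; entry (i, j) is i + 10 * j.

    Hypothetical helper: it stands in for a matrix that already lives on disk.
    """
    with open(path, "wb") as f:
        for j in range(n_cols):
            column = [float(i + 10 * j) for i in range(n_rows)]
            f.write(struct.pack(f"<{n_rows}d", *column))


def column_means(path, n_rows, n_cols, chunk_cols=2):
    """Compute column means while holding only `chunk_cols` columns in memory."""
    col_bytes = 8 * n_rows  # each float64 takes 8 bytes
    means = []
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for start in range(0, n_cols, chunk_cols):
            stop = min(start + chunk_cols, n_cols)
            # Slicing the memory map copies only this block of columns.
            chunk = mm[start * col_bytes:stop * col_bytes]
            values = struct.unpack(f"<{(stop - start) * n_rows}d", chunk)
            for k in range(stop - start):
                col = values[k * n_rows:(k + 1) * n_rows]
                means.append(sum(col) / n_rows)
    return means


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.bin")
    write_matrix(path, n_rows=4, n_cols=5)
    result = column_means(path, n_rows=4, n_cols=5)
    print(result)
```

Because the file is only mapped for reading, several processes could in principle open the same file and each handle a different range of columns, which is the idea behind the fourth point.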
The main features currently available in bigstatsr are:
  • singular value decomposition (SVD) and randomized partial SVD (Lehoucq and Sorensen 1996),
  • sparse linear and logistic regressions (Zeng and Breheny 2017),
  • sparse linear Support Vector Machines,
  • column-wise linear and logistic regression tests,
  • matrix operations,
  • parallelization / apply.
References

Kane, Michael J, John W Emerson, and Stephen Weston. 2013. “Scalable Strategies for Computing with Massive Data.” Journal of Statistical Software 55 (14): 1–19. doi:10.18637/jss.v055.i14.

Lehoucq, Rich Bruno, and D. C. Sorensen. 1996. “Deflation Techniques for an Implicitly Restarted Arnoldi Iteration.” SIAM Journal on Matrix Analysis and Applications 17 (4). Society for Industrial and Applied Mathematics: 789–821. doi:10.1137/S0895479895281484.

Zeng, Yaohui, and Patrick Breheny. 2017. “The biglasso Package: A Memory- and Computation-Efficient Solver for Lasso Model Fitting with Big Data in R,” January. http://arxiv.org/abs/1701.05936.




Thursday July 6, 2017 5:55pm - 6:00pm
3.01 Wild Gallery
