useR!2017: Full Schedule

11:00am CEST

**rags2ridges**: A One-Stop-Go for Network Modeling of Precision Matrices

Keywords: Data integration, Graphical modeling, High-dimensional precision matrix estimation, Networks
Webpages: https://CRAN.R-project.org/package=rags2ridges, https://github.com/CFWP/rags2ridges
Contact: cf.peeters@vumc.nl
A contemporary use for inverse covariance matrices (aka precision matrices) is found in the data-based reconstruction of networks through graphical modeling. Graphical models merge probability distributions of random vectors with graphs that express the conditional (in)dependencies between the constituent random variables. The rags2ridges package enables L2-penalized (i.e., ridge) estimation of the precision matrix in settings where the number of variables is large relative to the sample size. Hence, it is a package where high-dimensional (HD) data meets networks.
The talk will give an overview of the rags2ridges package. Specifically, it will show that the package is a one-stop-go as it provides functionality for the extraction, visualization, and analysis of networks from HD data. Moreover, it will show that the package provides a basis for the vertical (across data sets) and horizontal (across platforms) integration of HD data stemming from omics experiments. Last but not least, it will explain why many rap musicians are stating that one should ‘get ridge, or die trying’.
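To make the idea of ridge-type regularization of a precision matrix concrete, here is a minimal sketch in plain Python. It assumes the simplest textbook form, inverting S + λI instead of S, and is not claimed to be the estimator that rags2ridges implements. The function names, the toy data, and the value of `lam` are illustrative assumptions.

```python
# Sketch only: the simplest ridge-type precision estimate (S + lam * I)^{-1}.
# This is NOT necessarily the estimator used by rags2ridges.

def covariance(rows):
    """Sample covariance matrix (divisor n) of a list of observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
             for j in range(p)] for i in range(p)]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix; raises ValueError if it is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def ridge_precision(S, lam):
    """Add lam to the diagonal of S, then invert."""
    return inverse_2x2([[S[0][0] + lam, S[0][1]],
                        [S[1][0], S[1][1] + lam]])

# Two observations of two variables: S has rank one and cannot be inverted.
X = [[1.0, 2.0], [3.0, 6.0]]
S = covariance(X)
lam = 1.0  # hypothetical penalty value, chosen only for illustration
omega = ridge_precision(S, lam)
```

Calling `inverse_2x2(S)` directly raises `ValueError` here because S is singular, while the penalized version is always invertible for `lam > 0`.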
References https://arxiv.org/abs/1509.07982
https://arxiv.org/abs/1608.04123
http://dx.doi.org/10.1016/j.csda.2016.05.012

Speakers

Carel Peeters

Assistant Professor, VU University medical center

Biostatistician specializing in multivariate and high-dimensional molecular biostatistics.

Thursday July 6, 2017 11:00am - 11:18am CEST
3.01 Wild Gallery

Talk, Methods I

11:18am CEST

Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in *R*

Keywords: clustered data, clustered covariance matrix estimators, object-orientation, simulation, R
Webpages: http://R-forge.R-project.org/projects/sandwich/
Clustered covariances or clustered standard errors are very widely used to account for correlated or clustered data, especially in economics, political sciences, or other social sciences. They are employed to adjust the inference following estimation of a standard least-squares regression or generalized linear model estimated by maximum likelihood. Although many publications just refer to “the” clustered standard errors, there is a surprisingly wide variation in clustered covariances, particularly due to different flavors of bias corrections. Furthermore, while the linear regression model is certainly the most important application case, the same strategies can be employed in more general models (e.g. for zero-inflated, censored, or limited responses).
In R, the sandwich package (Zeileis 2004; Zeileis 2006) provides an object-oriented approach to “robust” covariance matrix estimation based on methods for two generic functions (estfun() and bread()). Using this infrastructure, sandwich covariances for cross-section or time series data have been available for models beyond lm() or glm(), e.g., for packages MASS, pscl, countreg, betareg, among many others. However, corresponding functions for clustered or panel data have been somewhat scattered or available only for certain modeling functions. This shortcoming has been corrected in the development version of sandwich on R-Forge. Here, we introduce this new object-oriented implementation of clustered and panel covariances and assess the methods’ performance in a simulation study.
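The bread-and-meat structure behind a clustered covariance can be sketched in a few lines of plain Python. The example assumes the simplest possible model, a regression through the origin with one regressor, and the basic estimator without any bias correction (often labelled CR0). It does not use or reproduce the sandwich package's API, and the function name and data are illustrative.

```python
# Sketch only: cluster-robust (CR0) variance of the slope in y = beta * x + e,
# with no intercept and no small-sample bias correction.

def clustered_variance(x, y, cluster):
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    bread = 1.0 / sxx  # inverse of X'X (a scalar here)
    # Sum the score contributions x_i * e_i within each cluster.
    scores = {}
    for xi, yi, g in zip(x, y, cluster):
        scores[g] = scores.get(g, 0.0) + xi * (yi - beta * xi)
    meat = sum(s * s for s in scores.values())
    return beta, bread * meat * bread

x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]
beta, var_cl = clustered_variance(x, y, ["a", "a", "b", "b"])

# With every observation in its own cluster, the formula reduces to the
# heteroscedasticity-consistent HC0 estimator.
_, var_hc0 = clustered_variance(x, y, [0, 1, 2, 3])
```

The different "flavors" mentioned in the abstract would enter as multiplicative adjustments to the meat term, which this sketch leaves out.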
References Zeileis, Achim. 2004. “Econometric Computing with HC and HAC Covariance Matrix Estimators.” Journal of Statistical Software 11 (10): 1–17. http://www.jstatsoft.org/v11/i10/.

Zeileis, Achim. 2006. “Object-Oriented Computation of Sandwich Estimators.” Journal of Statistical Software 16 (9): 1–16. http://www.jstatsoft.org/v16/i09/.

Speakers

Susanne Berger

Thursday July 6, 2017 11:18am - 11:36am CEST
3.01 Wild Gallery

Talk, Methods I

11:36am CEST

factorMerger: a set of tools to support results from post hoc testing

ANOVA-like statistical tests for differences among groups have been available for almost a hundred years. But for a large number of groups, the results from commonly used post hoc tests are often hard to interpret. To deal with this problem, the factorMerger package constructs and plots the hierarchical relation among the compared groups. This hierarchical structure is derived from the likelihood ratio test and is presented with Merging Paths Plots created with the ggplot2 package. The current implementation handles one-dimensional and multi-dimensional Gaussian models as well as binomial and survival models. This article presents the theory and examples for single-factor use cases.
Package webpage: https://github.com/geneticsMiNIng/FactorMerger
Keywords: analysis of variance (ANOVA), hierarchical clustering, likelihood ratio test (LRT), post hoc testing
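One way to picture a single merging step based on the likelihood ratio test is the following plain-Python sketch. It assumes a one-dimensional Gaussian model with a common variance and a greedy rule that merges the pair of groups with the smallest LRT statistic. Whether factorMerger proceeds exactly this way is not stated in the abstract, so the functions and data below are purely illustrative.

```python
# Sketch only: greedy choice of the next pair of groups to merge, using the
# Gaussian likelihood ratio statistic n * log(RSS_merged / RSS_full).
from itertools import combinations
from math import log

def rss(values):
    """Residual sum of squares around the group mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def lrt_merge_stat(groups, a, b):
    """LRT statistic for pooling groups a and b, all other groups unchanged."""
    n = sum(len(v) for v in groups.values())
    full = sum(rss(v) for v in groups.values())
    merged = full - rss(groups[a]) - rss(groups[b]) + rss(groups[a] + groups[b])
    return n * log(merged / full)

def best_merge(groups):
    """Pair of group labels whose merge costs the least likelihood."""
    return min(combinations(sorted(groups), 2),
               key=lambda pair: lrt_merge_stat(groups, *pair))

groups = {
    "A": [1.0, 1.2, 0.9],
    "B": [1.1, 1.0, 1.3],
    "C": [3.0, 3.2, 2.9],
}
pair = best_merge(groups)
```

Repeating this step until a single group remains would yield a hierarchy of merges, which is the kind of structure a merging path plot displays.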

Speakers

Agnieszka Sitko

Data Scientist, Warsaw University of Technology

Thursday July 6, 2017 11:36am - 11:54am CEST
3.01 Wild Gallery

Talk, Methods I

11:54am CEST

Estimating the Parameters of a Continuous-Time Markov Chain from Discrete-Time Data with ctmcd

Keywords: Embedding Problem, Generator Matrix, Continuous-Time Markov Chain, Discrete-Time Markov Chain
Webpages: https://CRAN.R-project.org/package=ctmcd
The estimation of the parameters of a continuous-time Markov chain from discrete-time data is an important statistical problem which occurs in a wide range of applications, e.g., the analysis of gene sequence data, causal inference in epidemiology, the description of the dynamics of open quantum systems in physics, or rating-based credit risk modeling, to name only a few.
The parameters of a continuous-time Markov chain form the generator matrix (also called the transition rate matrix or intensity matrix), and the problem of estimating generator matrices from discrete-time data is also known as the embedding problem for Markov chains. To deal with this missing-data situation, a variety of estimation approaches have been developed. These comprise adjustments of matrix-logarithm-based candidate solutions of the aggregated discrete-time data; see Israel, Rosenthal, and Wei (2001) or Kreinin and Sidelnikova (2001). Moreover, likelihood inference can be conducted by an instance of the expectation-maximization (EM) algorithm, and Bayesian inference by a Gibbs sampling procedure based on the conjugate gamma prior distribution (Bladt and Sørensen 2005).
The R package ctmcd (Pfeuffer 2016) is the first publicly available implementation of the approaches listed above. Besides point estimates of generator matrices, the package also contains methods to derive confidence and credibility intervals. The capabilities of the package are illustrated using Standard & Poor’s discrete-time credit rating transition data. Moreover, methodological issues of the described approaches are discussed, i.e., the derivation of the conditional expectations of the E-Step in the EM algorithm and the sampling of endpoint-conditioned continuous-time Markov chain trajectories for the Gibbs sampler.
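The matrix-logarithm candidate solution mentioned in the abstract has a closed form for a two-state chain, which makes for a compact illustration. The sketch below is plain Python, does not use the ctmcd API, and assumes a transition matrix with off-diagonal entries a and b satisfying 0 < a + b < 1; the function names and the example matrix are illustrative.

```python
# Sketch only: closed-form matrix logarithm for a 2-state transition matrix
# P = [[1 - a, a], [b, 1 - b]] with 0 < a + b < 1.
from math import exp, log

def generator_from_two_state(P):
    """Generator Q with exp(Q) = P, i.e. Q = log(P), for a 2-state chain."""
    a, b = P[0][1], P[1][0]
    s = a + b
    if not 0 < s < 1:
        raise ValueError("closed form requires 0 < a + b < 1")
    c = -log(1.0 - s) / s
    return [[-c * a, c * a], [c * b, -c * b]]

def transition_from_generator(Q, t=1.0):
    """Transition matrix exp(t * Q) of a 2-state chain, in closed form."""
    a, b = Q[0][1], Q[1][0]
    s = a + b
    f = (1.0 - exp(-s * t)) / s
    return [[1.0 - a * f, a * f], [b * f, 1.0 - b * f]]

P = [[0.9, 0.1], [0.2, 0.8]]            # hypothetical one-period transitions
Q = generator_from_two_state(P)
P_back = transition_from_generator(Q)   # recovers P up to rounding
```

For larger state spaces the matrix logarithm can contain negative off-diagonal entries, so it is not always a valid generator; that is the situation the adjustment methods cited above are meant to address.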
References Bladt, M., and M. Sørensen. 2005. “Statistical Inference for Discretely Observed Markov Jump Processes.” Journal of the Royal Statistical Society B.

Israel, R. B., J. S. Rosenthal, and J. Z. Wei. 2001. “Finding Generators for Markov Chains via Empirical Transition Matrices, with Applications to Credit Ratings.” Mathematical Finance.

Kreinin, A., and M. Sidelnikova. 2001. “Regularization Algorithms for Transition Matrices.” Algo Research Quarterly.

Pfeuffer, M. 2016. “ctmcd: An R Package for Estimating the Parameters of a Continuous-Time Markov Chain from Discrete-Time Data.” In Revision (the R Journal).

Speakers

Marius Pfeuffer

Thursday July 6, 2017 11:54am - 12:12pm CEST
3.01 Wild Gallery

Talk, Methods I

12:12pm CEST

MCMC Output Analysis Using R package mcmcse

Markov chain Monte Carlo (MCMC) is a method of producing a correlated sample in order to estimate expectations with respect to a target distribution. A fundamental question is: when should sampling stop so that we have good estimates of the desired quantities? The key to answering this question lies in assessing the Monte Carlo error through a multivariate Markov chain central limit theorem (CLT). This talk presents the R package mcmcse, which provides estimators for the asymptotic covariance matrix in the Markov chain CLT. In addition, the package calculates a multivariate effective sample size which can be rigorously used to terminate MCMC simulation. I will present the use of the R package mcmcse to conduct robust, valid, and theoretically justified output analysis for Markov chain data.
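As a rough illustration of how a Monte Carlo standard error and an effective sample size can be computed from correlated output, here is a univariate batch means sketch in plain Python. It is a simplification of the multivariate setting described in the abstract and does not use the mcmcse API. The AR(1) chain, the seed, and the square-root batch size are assumptions made for the example.

```python
# Sketch only: univariate batch means estimate of the Monte Carlo standard
# error and the effective sample size (ESS) of a correlated chain.
import random
from math import sqrt

def batch_means_ess(chain, batch_size=None):
    n = len(chain)
    b = batch_size or int(sqrt(n))   # batch length
    a = n // b                       # number of full batches
    used = chain[:a * b]
    mean = sum(used) / len(used)
    batch_means = [sum(used[k * b:(k + 1) * b]) / b for k in range(a)]
    # Batch means estimate of the asymptotic variance in the CLT.
    sigma2 = b * sum((m - mean) ** 2 for m in batch_means) / (a - 1)
    # Ordinary sample variance, which ignores the autocorrelation.
    lam2 = sum((x - mean) ** 2 for x in used) / (len(used) - 1)
    mcse = sqrt(sigma2 / len(used))
    ess = len(used) * lam2 / sigma2
    return mean, mcse, ess

# Toy correlated "MCMC output": an AR(1) process with coefficient 0.9.
random.seed(1)
x, chain = 0.0, []
for _ in range(10000):
    x = 0.9 * x + random.gauss(0.0, 1.0)
    chain.append(x)

mean, mcse, ess = batch_means_ess(chain)
```

Because the chain is strongly positively correlated, the estimated effective sample size comes out far below the 10,000 draws, which is the kind of information a stopping rule can act on.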

Speakers