An Efficient Algorithm for Solving Large Fixed Effects OLS Problems with Clustered Standard Error Estimation

**Authors**: Thomas Balmat and Jerome Reiter, Duke University

**Keywords**: large data least squares, fixed effects estimation, clustered standard error estimation, sparse matrix methods, high performance computing

Large fixed effects regression problems, involving on the order of 10^7 observations and 10^3 effect levels, present special computational challenges but also a special performance opportunity: most entries of the expanded design matrix (in which each fixed effect is translated from a single column into dichotomous indicator columns, one per level) are zero. For many problems, the proportion of zero entries exceeds 0.99995, so the matrix is highly sparse. In this presentation, we demonstrate an efficient method for solving large, sparse fixed effects OLS problems that never creates the expanded design matrix and avoids computations involving zero-level effects. This leads to minimal memory usage and optimal execution time. A feature often desired in social science applications is the estimation of parameter standard errors clustered about a key identifier, such as employee ID. For large problems, with ID counts in the millions, this presents a significant computational challenge. We present a sparse matrix indexing algorithm that produces clustered standard error estimates and that, for large fixed effects problems, is many times more efficient than standard “sandwich” matrix operations.
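To make the idea of solving without the expanded design matrix concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it assumes the simplest case of one continuous covariate and a single fixed effect, and the function name `fit_one_way_fe` and its arguments are hypothetical. Each observation carries only the integer index of its level, so the entries of X'X and X'y are accumulated by indexing. Because the indicator block of X'X is diagonal (level counts), it can be eliminated directly.

```python
def fit_one_way_fe(x, y, level, n_levels):
    """OLS of y on one covariate x plus one intercept per fixed-effect level.

    Illustrative sketch only (hypothetical helper): the indicator columns are
    never built; `level[i]` is the integer level index of observation i.
    """
    n = [0] * n_levels      # diagonal of the indicator block of X'X (level counts)
    sx = [0.0] * n_levels   # cross products of x with each indicator column
    sy = [0.0] * n_levels   # indicator part of X'y
    sxx = 0.0               # x'x
    sxy = 0.0               # x'y
    for xi, yi, li in zip(x, y, level):
        # Each row touches exactly one indicator column, found by index.
        n[li] += 1
        sx[li] += xi
        sy[li] += yi
        sxx += xi * xi
        sxy += xi * yi

    # Eliminate the diagonal indicator block (Schur complement) to get the slope.
    used = [l for l in range(n_levels) if n[l] > 0]  # skip levels with no data
    num = sxy - sum(sx[l] * sy[l] / n[l] for l in used)
    den = sxx - sum(sx[l] * sx[l] / n[l] for l in used)
    beta = num / den

    # Back-substitute for the level effects; unobserved levels stay None.
    alpha = [None] * n_levels
    for l in used:
        alpha[l] = (sy[l] - beta * sx[l]) / n[l]
    return beta, alpha


# Small example: y = 2 * x + effect of the observation's level.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
level = [0, 0, 1, 1, 2, 2]
y = [3.0, 5.0, 5.0, 7.0, 13.0, 15.0]
beta, alpha = fit_one_way_fe(x, y, level, n_levels=3)
```

Storage here grows with the number of levels rather than with observations times levels. With several fixed effects the indicator blocks are no longer jointly diagonal, so a general sparse solver would be needed; that case is outside this sketch.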

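The clustered "sandwich" estimator has the form (X'X)^-1 M (X'X)^-1, where the middle term M sums, over clusters, the outer product of each cluster's score vector X_c' e_c. The sketch below shows one way index-based accumulation can build M without forming X. It is an assumption-laden illustration, not the algorithm presented in the talk: it assumes one covariate plus one fixed effect, takes residuals as given, and the name `cluster_meat` is hypothetical.

```python
from collections import defaultdict


def cluster_meat(x, level, cluster, resid, n_levels):
    """Middle ("meat") matrix of a cluster-robust sandwich estimator.

    Illustrative sketch only (hypothetical helper). Column 0 is the covariate;
    column 1 + l is the indicator of fixed-effect level l, never materialized.
    """
    # cluster id -> {column index: accumulated score}, stored sparsely
    scores = defaultdict(lambda: defaultdict(float))
    for xi, li, ci, ei in zip(x, level, cluster, resid):
        s = scores[ci]
        s[0] += xi * ei     # covariate column
        s[1 + li] += ei     # the single nonzero indicator entry of this row

    p = 1 + n_levels
    meat = [[0.0] * p for _ in range(p)]
    for s in scores.values():
        items = list(s.items())
        # Outer product restricted to the columns this cluster actually touches.
        for j, sj in items:
            for k, sk in items:
                meat[j][k] += sj * sk
    return meat


# Small example: four observations, two levels, two clusters.
meat = cluster_meat(
    x=[1.0, 2.0, 3.0, 4.0],
    level=[0, 1, 0, 1],
    cluster=["a", "a", "b", "b"],
    resid=[1.0, -1.0, 2.0, 0.5],
    n_levels=2,
)
```

The work per cluster depends on the number of distinct columns that cluster touches, not on the full column count, which is where a saving could come from when clusters number in the millions. Turning M into standard errors still requires (X'X)^-1 and any finite-sample correction, both omitted here.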