Friday, July 7 • 11:18am - 11:36am
Easy imputation with the simputation package

Missing value imputation is a common technique for dealing with missing data. Accordingly, R and its many extension packages offer a wide range of techniques to impute missing data. Imputation can be done using specialized imputation functions or, with a bit of programming, one of the many predictive models available in R or its extension packages.

The current set of available imputation and modeling techniques is the result of decades of development by many different contributors. As a result, imputation and modeling functions may have very different interfaces across packages. Combining and comparing imputation methods can therefore be a cumbersome task.

The simputation package offers a uniform and robust interface to a number of popular imputation techniques. The package follows the ‘grammar of data manipulation’ (Wickham and Francois 2016), where the first argument to a function and its output are always rectangular datasets. This allows one to chain imputation methods with the not-a-pipe operator (%>%) of the magrittr package (Bache and Wickham 2014). In simputation, all imputation functions are of the following form:

impute_[model](data, formula, ...)
For example, the functions impute_lm and impute_em impute missing values based on linear modeling and EM-estimation, respectively. The formula object is interpreted so that multiple variables can be imputed based on the same set of predictors. In addition, a grouping operator (|) allows one to impute using the split-apply-combine strategy for any imputation method.
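
The following is a minimal sketch of how the interface described above might be used, not an excerpt from the talk. It assumes the simputation and magrittr packages are installed; the data set (the built-in iris data with a few values blanked out) and the choice of variables are illustrative assumptions. It chains two methods, uses the grouping operator, and imputes two variables from the same predictors.

```r
library(simputation)  # assumed to be installed from CRAN
library(magrittr)     # provides the %>% operator

# Illustrative data: the built-in iris set with some values blanked out
dat <- iris
dat[1:3, "Sepal.Length"] <- NA
dat[3:7, "Sepal.Width"]  <- NA

imputed <- dat %>%
  # linear model; row 3 stays missing because its predictor Sepal.Width is missing too
  impute_lm(Sepal.Length ~ Sepal.Width + Species) %>%
  # fallback: per-species median for anything still missing
  impute_median(Sepal.Length ~ Species) %>%
  # grouping operator: fit a separate linear model within each species
  impute_lm(Sepal.Width ~ Petal.Length | Species)

# Several variables on the left-hand side share the same set of predictors
imputed2 <- impute_lm(dat, Sepal.Length + Sepal.Width ~ Petal.Length + Species)
```

Because every function takes a data frame first and returns a data frame, a value that one method cannot impute (here, row 3 in the first step) can simply be passed on to the next method in the chain.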

Currently supported methods include imputation based on standard linear models, M-estimation and elasticnet (ridge, lasso) regression; CART and randomForest models; multivariate methods including EM-estimation and iterative randomForest estimation; donor imputation including random and sequential hotdeck, predictive mean matching and kNN imputation. A flexible interface for simple user-provided imputation expressions is provided as well.
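
As a hedged illustration of a user-provided imputation expression: the sketch below assumes that the installed version of simputation exposes this interface as impute_proxy, which the abstract itself does not name. The data and the group-mean expression are illustrative assumptions.

```r
library(simputation)  # assumed to be installed from CRAN

# Illustrative data: blank out three Sepal.Length values (all in species "setosa")
dat <- iris
dat[1:3, "Sepal.Length"] <- NA

# The right-hand side is an ordinary R expression, evaluated within each
# species group; here it computes the group mean of the observed values.
imputed <- impute_proxy(
  dat,
  Sepal.Length ~ mean(Sepal.Length, na.rm = TRUE) | Species
)
```

Under these assumptions, the three missing values are replaced by the mean of the remaining setosa observations, and all other cells are left unchanged.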


Friday July 7, 2017 11:18am - 11:36am
2.01 Wild Gallery
