useR!2017 has ended
Friday, July 7 • 11:18am - 11:36am
Easy imputation with the simputation package

Missing value imputation is a common technique for dealing with missing data. Accordingly, R and its many extension packages offer a wide range of techniques to impute missing data. Imputation can be done using specialized imputation functions or, with a bit of programming, one of the many predictive models available in R or its extension packages.

The current set of available imputation and modeling techniques is the result of decades of development by many different contributors. As a result, imputation and modeling functions may have very different interfaces across packages. Combining and comparing imputation methods can therefore be a cumbersome task.

The simputation package offers a uniform and robust interface to a number of popular imputation techniques. The package follows the ‘grammar of data manipulation’ (Wickham and Francois 2016), where the first argument to a function and its output are always rectangular datasets. This allows one to chain imputation methods with the not-a-pipe operator (%>%) of the magrittr package (Bache and Wickham 2014). In simputation all imputation functions are of the following form.

impute_[model](data, formula, ...)
For example, the functions impute_lm and impute_em impute missing values based on linear modeling and EM-estimation, respectively. The formula object is interpreted so that multiple variables can be imputed based on the same set of predictors. Also, a grouping operator (|) allows one to impute using the split-apply-combine strategy for any imputation method.
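As a rough illustration of the formula interface described above, the sketch below imputes two variables from one predictor within groups, and then chains two imputation methods. The toy data (iris with a few values blanked out) and the variable names `dat`, `imputed` and `chained` are assumptions made for this example only; it also assumes the simputation and magrittr packages are installed.

```r
library(simputation)
library(magrittr)

# Toy data (illustrative assumption): iris with some values set to missing
dat <- iris
dat[1:3, "Sepal.Length"] <- NA
dat[3:7, "Sepal.Width"] <- NA

# Two variables on the left-hand side share the same predictor;
# the part after `|` fits a separate linear model per Species
imputed <- impute_lm(dat, Sepal.Length + Sepal.Width ~ Petal.Length | Species)

# Chaining: linear-model imputation for one variable, followed by
# random hot deck imputation (donors drawn within Species) for another
set.seed(1)
chained <- dat %>%
  impute_lm(Sepal.Length ~ Petal.Length | Species) %>%
  impute_rhd(Sepal.Width ~ Species)
```

Because every function takes a data frame first and returns a data frame, each step can be swapped for another impute_[model] call without changing the rest of the chain.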

Currently supported methods include imputation based on standard linear models, M-estimation and elasticnet (ridge, lasso) regression; CART and randomForest models; multivariate methods including EM-estimation and iterative randomForest estimation; donor imputation including random and sequential hotdeck, predictive mean matching and kNN imputation. A flexible interface for simple user-provided imputation expressions is provided as well.
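A user-provided imputation expression might look like the following sketch, which uses the package's impute_proxy function to fill in a group mean. The abstract does not name the function behind this interface, so treating impute_proxy as that interface is an assumption, as are the toy data and the variable names; the exact behaviour may differ between package versions.

```r
library(simputation)

# Toy data (illustrative assumption): iris with a few missing values
dat <- iris
dat[1:3, "Sepal.Length"] <- NA

# The right-hand side is an arbitrary expression evaluated on the data;
# with `| Species` it is evaluated separately within each group
imputed <- impute_proxy(
  dat,
  Sepal.Length ~ mean(Sepal.Length, na.rm = TRUE) | Species
)
```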

Friday July 7, 2017 11:18am - 11:36am CEST
2.01 Wild Gallery