Name: Easy imputation with the simputation package
Start: 2017-07-07T11:18:00+0200
End: 2017-07-07T11:36:00+0200


Easy imputation with the simputation package


Missing value imputation is a common technique for dealing with missing data. Accordingly, R and its many extension packages offer a wide range of techniques to impute missing data. Imputation can be done using specialized imputation functions or, with a bit of programming, one of the many predictive models available in R or its extension packages.

The current set of available imputation and modeling techniques is the result of decades of development by many different contributors. As a result, imputation and modeling functions may have very different interfaces across packages. Combining and comparing imputation methods can therefore be a cumbersome task.

The simputation package offers a uniform and robust interface to a number of popular imputation techniques. The package follows the ‘grammar of data manipulation’ (Wickham and Francois 2016), where the first argument to a function and its output are always rectangular datasets. This allows one to chain imputation methods with the not-a-pipe operator of the magrittr package (Bache and Wickham 2014). In simputation, all imputation functions have the following form:

impute_[model](data, formula, ...)

For example, the functions impute_lm and impute_em impute missing values based on linear modeling and EM-estimation, respectively. The formula is interpreted such that multiple variables can be imputed based on the same set of predictors. In addition, a grouping operator (|) allows one to apply any imputation method using the split-apply-combine strategy.
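The formula interface can be illustrated with a minimal sketch, assuming the simputation package is installed. The iris data set and the artificially removed values are illustrative choices for this sketch and are not taken from the talk.

```r
library(simputation)

# Illustrative data: iris with a few values removed (an assumption for this sketch).
dat <- iris
dat[1:3, "Sepal.Length"] <- NA
dat[c(60, 120), "Sepal.Width"] <- NA

# One variable, imputed from a linear model on two complete predictors.
one_var <- impute_lm(dat, Sepal.Length ~ Petal.Length + Species)

# Two variables imputed from the same set of predictors.
two_vars <- impute_lm(dat, Sepal.Length + Sepal.Width ~ Petal.Length + Species)

# Grouping operator: a separate model is fitted within each Species.
by_group <- impute_lm(dat, Sepal.Length ~ Petal.Length | Species)

# Count the missing values that remain per column.
colSums(is.na(two_vars))
```

Because every function takes a data frame first and returns a data frame, each result above can be passed directly to another impute_ function.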

Currently supported methods include imputation based on standard linear models, M-estimation and elasticnet (ridge, lasso) regression; CART and randomForest models; multivariate methods including EM-estimation and iterative randomForest estimation; donor imputation including random and sequential hotdeck, predictive mean matching and kNN imputation. A flexible interface for simple user-provided imputation expressions is provided as well.
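Chaining several methods, including a user-provided expression, might look like the sketch below. It assumes simputation and magrittr are installed and uses impute_lm, impute_median and impute_proxy; the data and the choice of fallback order are illustrative assumptions, not a recommendation from the talk.

```r
library(simputation)
library(magrittr)

# Illustrative data (an assumption): row 1 lacks both the target and a predictor.
dat <- iris
dat[1:3, "Sepal.Length"] <- NA
dat[1, "Petal.Length"] <- NA

imputed <- dat %>%
  # Model-based step; a row whose predictor is missing is expected to stay NA here.
  impute_lm(Sepal.Length ~ Petal.Length + Species) %>%
  # Fallback: fill what is left with the median of the same Species.
  impute_median(Sepal.Length ~ Species) %>%
  # User-provided expression, evaluated within each Species group.
  impute_proxy(Petal.Length ~ mean(Petal.Length, na.rm = TRUE) | Species)

# Count the missing values that remain per column.
colSums(is.na(imputed))
```

Ordering the steps from most to least informative means a simpler method only fills values that the preceding model could not.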

Speakers

Mark van der Loo

user2017markvanderloo pdf

Friday July 7, 2017 11:18am - 11:36am CEST
2.01 Wild Gallery

Talk, Missing Data

useR!2017
