useR!2017 has ended
Friday, July 7 • 11:00am - 11:18am
naniar: Data structures and functions for consistent exploration of missing data

1: Monash University, Department of Econometrics and Business Statistics, nicholas.tierney@gmail.com
2: Monash University, Department of Econometrics and Business Statistics, dicook@monash.edu
3: Queensland University of Technology, ARC Centre of Excellence for Statistical and Mathematical Frontiers, milesmcbain@gmail.com
Keywords
  • Missing Data
  • Exploratory Data Analysis
  • Imputation
  • Data Visualization
  • Data Mining
  • Statistical Graphics
Missing values are ubiquitous in data and need to be carefully explored and handled in the initial stages of analysis to avoid bias. However, exploring why and how values are missing is typically an inefficient process. For example, ggplot2 omits missing values from a plot with a warning, whereas base R omits them silently (Wickham 2009). Additionally, imputed values are not typically distinguished from observed values in visualisations and data summaries. Tidy data structures, described in Wickham (2014), provide an efficient, easy and consistent approach to data manipulation and wrangling, in which each row is an observation and each column is a variable. There are currently no guidelines for representing missing data structures in a tidy format, nor simple approaches to visualising missing values. This paper describes an R package, naniar, for exploring missing values in data with minimal deviation from the common workflows of ggplot2 and tidy data. naniar provides data structures and functions that handle missing values effectively when plotting and summarising data, and when examining the effects of imputation.
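The idea of keeping missingness visible in a tidy layout can be illustrated with a small, language-neutral sketch. The example below is not naniar's API, and it is written in plain Python rather than R purely for illustration. The data values and the helper names (`missing_summary`, `add_shadow`, `impute_mean`) are hypothetical. It assumes one record per observation, records an explicit missingness flag per variable, and shows that the flag still identifies imputed values after a simple mean imputation.

```python
from statistics import mean

# Hypothetical tidy data: one dict per observation, None marks a missing value.
rows = [
    {"ozone": 41.0, "solar": 190.0},
    {"ozone": None, "solar": 118.0},
    {"ozone": 12.0, "solar": None},
    {"ozone": 18.0, "solar": 313.0},
]


def missing_summary(rows):
    """Count the missing values in each variable."""
    return {var: sum(r[var] is None for r in rows) for var in rows[0]}


def add_shadow(rows):
    """Return copies of the rows with a '<var>_missing' flag per variable."""
    out = []
    for r in rows:
        new = dict(r)
        for var in r:
            new[f"{var}_missing"] = r[var] is None
        out.append(new)
    return out


def impute_mean(rows, var):
    """Fill missing values of `var` with the mean of its observed values."""
    observed = [r[var] for r in rows if r[var] is not None]
    fill = mean(observed)
    return [{**r, var: fill if r[var] is None else r[var]} for r in rows]


shadowed = add_shadow(rows)
imputed = impute_mean(shadowed, "ozone")

# The flag columns survive imputation, so imputed values stay identifiable.
for r in imputed:
    print(r["ozone"], r["ozone_missing"])
```

Because the flags live in ordinary columns next to the data, any later plot or summary can group or colour by them without a special-purpose structure.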
References
Wickham, Hadley. 2009. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10): 1–23.



Friday July 7, 2017 11:00am - 11:18am CEST
2.01 Wild Gallery