Handling missing data with Amelia

December 9, 2012

(This article was first published on is.R(), and kindly contributed to R-bloggers)

So, what if you have data, but some of the observations are missing? Many statistical techniques assume no missingness, so we might want to “fill in” or rectangularize our data by replacing missing observations with plausible substitutes. There are many ways of going about this, but one of the most robust and accessible is multiple imputation with the Amelia package.
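To see why simply dropping incomplete rows can be costly, here is a minimal sketch in plain Python (used purely for illustration; it is not part of Amelia or of the Gist). The records and variable names are made up, and None stands in for a missing answer. A handful of scattered gaps is enough to discard most of the rows.

```python
# Toy survey records; None marks a missing answer.
# All values and field names here are hypothetical, not ANES data.
rows = [
    {"age": 34, "income": 52000, "turnout": 1},
    {"age": 51, "income": None, "turnout": 1},
    {"age": None, "income": 38000, "turnout": 0},
    {"age": 27, "income": 41000, "turnout": None},
    {"age": 63, "income": 75000, "turnout": 1},
]


def complete_cases(records):
    """Listwise deletion: keep only records with no missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


kept = complete_cases(rows)
missing_cells = sum(v is None for r in rows for v in r.values())
total_cells = sum(len(r) for r in rows)

print(f"{missing_cells} of {total_cells} cells are missing")
print(f"{len(kept)} of {len(rows)} rows survive listwise deletion")
```

Imputation aims to keep the information in those partially observed rows instead of throwing it away.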

Today’s Gist applies multiple imputation to some sample ANES survey data, and compares listwise-deleted regression results to results pooled from the same regression run on ten imputed data sets. Amelia makes this imputation, modeling, and recombination straightforward, and I’ve thrown in a nice coefficient plot (using position_dodge!) to illustrate the differences between the two missing-data approaches.
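The recombination step is usually done with Rubin's rules: average the coefficient across the imputed data sets, then add the between-imputation variance to the average within-imputation variance. The following is a minimal sketch of that arithmetic in plain Python, for illustration only. The function name pool_rubin and the ten estimates are hypothetical, not taken from the Gist or from the ANES data; in R, Amelia's own helper for this step is, as far as I know, mi.meld().

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine one coefficient across m imputed data sets (Rubin's rules).

    estimates  -- the coefficient estimated on each imputed data set
    std_errors -- the matching standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)                      # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                # spread across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between       # total variance
    return q_bar, math.sqrt(total)


# Made-up results for one coefficient from ten imputed data sets.
coefs = [0.42, 0.39, 0.45, 0.41, 0.38, 0.44, 0.40, 0.43, 0.37, 0.41]
ses = [0.10, 0.11, 0.10, 0.09, 0.10, 0.11, 0.10, 0.10, 0.12, 0.10]

pooled_coef, pooled_se = pool_rubin(coefs, ses)
print(f"pooled estimate: {pooled_coef:.3f}, pooled SE: {pooled_se:.3f}")
```

The pooled standard error is never smaller than the one implied by the average within-imputation variance, because the between-imputation term carries the extra uncertainty that comes from not knowing the missing values.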
