ggplot your missing data

[This article was first published on njtierney - rbloggers, and kindly contributed to R-bloggers].

Visualising missing data is important when analysing a dataset. I wanted to make a plot of the presence/absence of values in a dataset. One package, Amelia, provides a function to do this, but I don’t like the way it looks, so I made a ggplot version of it.

Let’s make a dataset using the awesome wakefield package, and add random missingness.

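library(wakefield)
library(magrittr)  # provides the %>% pipe

df <- 
  r_data_frame(
  n = 30,
  Scoring = rnorm,
  Smoker = valid
  ) %>%
  r_na()  # add random missingness (r_na() defaults; set `prob` to taste)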
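
If you don't have wakefield installed, random missingness is easy to fake in base R. The sketch below is only an illustration of the idea: `add_random_na()` is a hypothetical helper I'm making up here (it is not part of wakefield or any other package), and it simply sets each cell to NA independently with probability `prob`.

```r
# Hypothetical helper (not from wakefield): blank out each cell
# independently with probability `prob`
add_random_na <- function(data, prob = 0.2) {
  for (col in names(data)) {
    drop <- runif(nrow(data)) < prob   # TRUE for the cells to remove
    data[[col]][drop] <- NA
  }
  data
}

set.seed(1)
toy <- data.frame(x = rnorm(30),
                  y = sample(letters, 30, replace = TRUE))
toy_na <- add_random_na(toy, prob = 0.4)
```

The resulting `toy_na` has the same shape as `toy`, just with holes in it, so it can be fed to any of the plotting code in this post.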

This is what the Amelia package produces by default:

library(Amelia)
missmap(df)

plot of chunk unnamed-chunk-2

And let’s explore the missing data using my own ggplot function:

# A function that plots missingness
# requires `reshape2` (for melt) and `ggplot2`

library(reshape2)
library(ggplot2)

ggplot_missing <- function(x){
  x %>% 
    is.na %>%   # logical matrix: TRUE where a value is missing
    melt %>%    # long format: Var1 = row, Var2 = variable, value = TRUE/FALSE
    ggplot(data = .,
           aes(x = Var2,
               y = Var1)) +
    geom_raster(aes(fill = value)) +
    scale_fill_grey(name = "",
                    labels = c("Present","Missing")) +
    theme_minimal() + 
    theme(axis.text.x  = element_text(angle=45, vjust=0.5)) + 
    labs(x = "Variables in Dataset",
         y = "Rows / observations")
}
Let’s test it out:

ggplot_missing(df)

plot of chunk unnamed-chunk-4

It’s much cleaner, and easier to interpret.
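
If you're curious what the plot is actually drawing, it's just a long table with one row per cell of the dataset and a flag saying whether that cell is missing. Here is a rough base R sketch of that intermediate table, using a tiny made-up data frame (`toy`) and column names I picked for illustration; the function itself gets there with `is.na` and `melt`.

```r
# Tiny illustrative data frame with two missing values
toy <- data.frame(a = c(1, NA, 3),
                  b = c("x", "y", NA))

# Step 1: logical matrix, TRUE where a value is missing
miss <- is.na(toy)

# Step 2: long format, one row per (observation, variable) cell.
# This mimics what melt() does to a matrix, using base R only.
long <- data.frame(
  row      = as.vector(row(miss)),
  variable = colnames(miss)[as.vector(col(miss))],
  missing  = as.vector(miss)
)
```

Each row of `long` becomes one tile in the raster: `variable` goes on the x axis, `row` on the y axis, and `missing` sets the fill.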

This function, and others, is available in the neato package, where I store a bunch of functions I think are neat.

Quick note: there used to be a function, missing.pattern.plot, in the mi package. However, it doesn’t appear to exist anymore. This is a shame, as it was a really nifty plot that clustered the groups of missingness. My friend and colleague, Sam Clifford, heard me complaining about this and wrote some code that does just that. I shall share this soon, and it will likely be added to the neato repository.
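
To give a flavour of what "clustering the groups of missingness" could mean, here is one possible approach in base R: treat each row's missing/present pattern as a 0/1 vector, run hierarchical clustering on those vectors, and reorder the rows accordingly. To be clear, this is my own guess at the idea, not the code from mi and not Sam's code, and the toy data is made up.

```r
# Toy data with two distinct missingness patterns (illustrative only)
toy <- data.frame(
  a = c(1, NA, 3, NA, 5, NA),
  b = c(NA, 2, NA, 4, NA, 6),
  c = c(1, 2, 3, 4, 5, 6)
)

# 0/1 indicator matrix: 1 where a value is missing
miss <- is.na(toy) * 1

# Cluster rows on their missingness pattern (Euclidean distance,
# default complete linkage) and take the dendrogram ordering
row_order <- hclust(dist(miss))$order

# Rows sharing a pattern now sit next to each other
toy_clustered <- toy[row_order, ]
```

Plotting `toy_clustered` instead of `toy` would then show the missing cells as solid blocks rather than a speckled pattern.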

Thoughts? Write them below.
