A New Baby Boom Poster

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers.]

I wanted to work through a few examples of more polished graphics done mostly but perhaps not entirely in R. So, I revisited the Baby Boom visualizations I made a while ago and made a new poster with them. This allowed me to play around with a few packages that I either hadn’t made use of or that weren’t available the first time around. The most notable additions are Rob Hyndman’s suite of tidy tools for time series analysis and Thomas Lin Pedersen’s packages ggforce and patchwork. These are all fantastic resources. The time series decomposition was done with the tsibble family of tools. Meanwhile ggforce and patchwork allow for a tremendous degree of flexibility in laying out multiple plots while still being very straightforward to use. Here’s a preview of the result:

OK Boomer

For now, the annotations were done in post-production (as they say in the movie biz) rather than in R, but I think I’ll be looking to see whether it’s worth taking advantage of some other packages to do those in R as well.

The time series decomposition takes the births series and separates it into trend, seasonal, and remainder components. (It’s an STL decomposition; there are several alternatives.) Often, the seasonal and remainder components will end up on quite different scales from the trend. The default plotting methods for decompositions will often show variably-sized vertical bars to the left of each panel, to remind the viewer that the scales are different. But ggforce has a facet_col() function that allows the space taken up by a facet to vary in the same way that one can allow the scales on an ordinary facet’s axes to vary. Usually, variable scaling isn’t desirable in a small-multiple plot, because the point is to make comparisons across panels. But in this case the combination of free scales and free spacing is very helpful.
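To make the decomposition step concrete, here is a minimal sketch. It is not the code used for the poster: it swaps in base R's stl() for the tsibble tools, and uses the built-in USAccDeaths monthly series as a stand-in for the births data. The object names (births_ts, parts, data_lon) are just illustrative, though the long layout with date, name, and value columns is the shape that a plot faceted by name expects.

```r
# Stand-in monthly series; the real births data is not reproduced here.
births_ts <- USAccDeaths

# STL decomposition with base R's stats::stl().
# s.window = "periodic" holds the seasonal pattern fixed across years.
fit <- stl(births_ts, s.window = "periodic")
parts <- fit$time.series  # matrix with columns seasonal, trend, remainder

# Build a Date for the first day of each month in the series.
yrs <- as.integer(floor(time(births_ts) + 1e-6))
mos <- as.integer(cycle(births_ts))
dates <- as.Date(sprintf("%d-%02d-01", yrs, mos))

# Reshape to long format: one row per date and component.
data_lon <- data.frame(
  date  = rep(dates, times = 3),
  name  = rep(c("trend", "seasonal", "remainder"), each = length(dates)),
  value = c(as.numeric(parts[, "trend"]),
            as.numeric(parts[, "seasonal"]),
            as.numeric(parts[, "remainder"]))
)

head(data_lon)
```

Because the decomposition is additive, the three components sum back to the original series, which is a quick sanity check on the reshaping.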

Here’s the snippet of code that makes the time series line graphs:

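library(ggplot2)
library(ggforce)  # provides facet_col()

# data_lon, break_vec, and break_labs are defined elsewhere in the full code.
p_trends <- ggplot(data_lon, aes(x = date, y = value)) + 
    geom_line(color = "gray20") + 
    scale_x_date(breaks = break_vec, labels = break_labs, expand = c(0,0)) + 
    # One column of panels, with free y scales and free panel heights
    facet_col(~ name, space = "free", scales = "free_y") + 
    # Hide the facet strip labels
    theme(strip.background = element_blank(),
          strip.text.x = element_blank()) + 
    labs(y = NULL, x = "Year")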

Meanwhile combining the trends plot with the tiled heatmap (called p_tile) is a piece of cake with patchwork:

library(patchwork)
(p_tile / p_trends) + plot_layout(heights = c(30, 70))

The / convention means stack the plot objects, and plot_layout() proportionally divides the available space.
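For anyone who wants to try the stacking on its own, here is a small self-contained sketch. The toy data frame and the two plots (p_top, p_bottom) are placeholders I've made up for illustration; they stand in for p_tile and p_trends, which depend on the births data.

```r
library(ggplot2)
library(patchwork)

# Toy data standing in for the real series.
toy <- data.frame(x = 1:10, y = (1:10)^2)

# Two placeholder plots playing the roles of p_tile and p_trends.
p_top <- ggplot(toy, aes(x, y)) + geom_point() + labs(title = "Top panel")
p_bottom <- ggplot(toy, aes(x, y)) + geom_line() + labs(title = "Bottom panel")

# `/` stacks the plots vertically; heights gives their relative sizes,
# so the top panel gets 30 parts of the height and the bottom gets 70.
combined <- (p_top / p_bottom) + plot_layout(heights = c(30, 70))
combined
```

The heights are relative, so c(30, 70) and c(3, 7) should give the same layout.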

Chances are that I’ll make some posters of these and other recent visualizations. Because people often ask, I’ve been looking into options for making them available for sale in various formats … hopefully that’ll be sorted out soon and I can join e.g. Waterlilies, The Kiss, and John Belushi on dorm room walls everywhere.

The code for the decomposition and the core plots is on GitHub.
