Us Monthly Births

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Yesterday I came across Aaron Penne’s collection of very nice data visualizations, one of which was of monthly births in the United States since 1933. He made a tiled heatmap of the data, taking care when calculating the average rate to correct for the varying number of days in different months. Aaron works in Python, so I took the opportunity to play around with the data and redo the plots in R. The code and data are on Github.

First, here’s the monthly heatmap. I use one of the Viridis palettes, as it has very nice contrast properties.

Monthy births

Monthly births in the United Dates, 1933-2015.

While we’re at it, we can also make a version of this plot that’s oriented vertically. This might be nice to view on a portrait-oriented screen like a tablet or phone

Monthy births, vertical

Monthly births in the United Dates, 1933-2015, vertically oriented.

Fancy tiled heatmaps are nice, but it can be hard to beat an old-fashioned time-series:

Monthy births, time series

Monthly births in the United Dates, 1933-2015. Time series plot.

In this version you can really see that the Baby Boom has two beginnings—about nine months after December 1941, when the US joined the war, and then in earnest about nine months after December 1945.

With time-series data like this, William Cleveland’s LOESS-based decomposition often gives nice results, so I ran that on the data. (It’s in R as the stl() function). The series gets decomposed into trend, seasonal, and remainder components. Here’s what that looks like.

Monthy births, time series decomposition

Monthly births in the United Dates, 1933-2015. Time series decomposition.

When interpreting the decomposition, bear in mind that each panel has its own y-axis, appropriate to the range of the element being plotted. This means the Data and Trend panels are more or less comparable, but the Seasonal and Remainder components are on different (and much narrower) scales.

To leave a comment for the author, please follow the link and comment on their blog: R on kieranhealy.org.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)