U.S. Census Counts Data
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
As promised previously, I packaged up the U.S. Census data that I pulled together to make the population density and pyramid animations. The package is called uscenpops and it’s available to install via GitHub or with install.packages()
if you set up drat first. The instructions are on the package homepage.

A small multiple plot of selected population pyramids
Instead of an animation, let’s make the less-flashy but, frankly, in all likelihood more useful small multiple plot seen here. With the package installed we can produce it as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
library(tidyverse) library(uscenpops) uscenpops #> # A tibble: 10,520 x 5 #> year age pop male female #> |
That’s what the dataset looks like. We’ll lengthen it, calculate a relative frequency (that we won’t use in this particular plot) and add a base value that we’ll use for the ribbon boundaries below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
pop_pyr <- uscenpops %>% select(year, age, male, female) %>% pivot_longer(male:female, names_to = "group", values_to = "count") %>% group_by(year, group) %>% mutate(total = sum(count), pct = (count/total)*100, base = 0) pop_pyr #> # A tibble: 21,040 x 7 #> # Groups: year, group [240] #> year age group count total pct base #> |
Next we set up some little vectors of labels and colors, and then make a mini-dataframe of what we’ll use as labels in the plot area, rather than using the default strip labels in facet_wrap()
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
## Axis labels mbreaks <- c("1M", "2M", "3M") ## colors pop_colors <- c("#E69F00", "#0072B2") ## In-plot year labels dat_text <- data.frame( label = c(seq(1900, 2015, 5), 2019), year = c(seq(1900, 2015, 5), 2019), age = rep(95, 25), count = rep(-2.75e6, 25) ) |
As before, the trick to making the pyramid is to set all the values for one category (here, males) to negative numbers.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
pop_pyr$count[pop_pyr$group == "male"] <- -pop_pyr$count[pop_pyr$group == "male"] p <- pop_pyr %>% filter(year %in% c(seq(1900, 2015, 5), 2019)) %>% ggplot(mapping = aes(x = age, ymin = base, ymax = count, fill = group)) p + geom_ribbon(alpha = 0.9, color = "black", size = 0.1) + geom_label(data = dat_text, mapping = aes(x = age, y = count, label = label), inherit.aes = FALSE, vjust = "inward", hjust = "inward", fontface = "bold", color = "gray40", fill = "gray95") + scale_y_continuous(labels = c(rev(mbreaks), "0", mbreaks), breaks = seq(-3e6, 3e6, 1e6), limits = c(-3e6, 3e6)) + scale_x_continuous(breaks = seq(10, 100, 10)) + scale_fill_manual(values = pop_colors, labels = c("Females", "Males")) + guides(fill = guide_legend(reverse = TRUE)) + labs(x = "Age", y = "Population in Millions", title = "Age Distribution of the U.S. Population, 1900-2019", subtitle = "Age is top-coded at 75 until 1939, at 85 until 1979, and at 100 since then", caption = "Kieran Healy / kieranhealy.org / Data: US Census Bureau.", fill = "") + theme(legend.position = "bottom", plot.title = element_text(size = rel(2), face = "bold"), strip.background = element_blank(), strip.text.x = element_blank()) + coord_flip() + facet_wrap(~ year, ncol = 5) |
The calls to geom_ribbon()
and geom_label()
draw the actual plots, and everything else is just a little attention to detail in order to make it come out nicely.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.