
I find population pyramids to be very effective teaching tools. In short, a population pyramid is a type of chart that stacks age cohorts along the y-axis and shows the population size of each cohort on the x-axis, with males and females usually displayed back-to-back to create the shape of a “pyramid.” It offers a snapshot of the age and sex structure of a population, and can anchor discussion of many thematic issues such as population growth, aging, and gender imbalance. Its visually appealing structure also makes it easy to compare the population structures of different countries, which is why I like to use pyramids in my classes. This post from the World Bank is a good example, showing how different “shapes” of the pyramid reflect different demographic contexts.

I was originally motivated to learn interactive visualization to produce interactive population pyramids. My inspiration came from these examples from Jeff Heer and Mike Bostock, which use data from the U.S. Census. Since then, a number of other creative interactive population pyramid applications have gotten a lot of attention, such as this visualization from Martin De Wulf, and this animated graphic from the Pew Research Center, referenced in The New York Times’ Upshot.

These are all great resources, but I wanted something both re-usable and customizable so that I could embed interactive pyramids for any country I needed in my slide decks for my World Regional Geography course. To do this, I turned to the fantastic International Data Base from the US Census Bureau for data, and the amazing rCharts to create the pyramids. The Census Bureau’s International Data Base contains a tremendous amount of information about basic demographic characteristics of the world’s countries, and is easily downloaded; I’ve had students use it before to create their own population pyramids in Excel for class assignments. However, because I wanted quick access to the data for building interactive rCharts visualizations, I wrote a script, rcharts_pyramids.R, with functions that scrape the Data Base website and produce an interactive pyramid in R within seconds.

Below, I’ll discuss how to use the script. I’d like anyone to be able to use this, regardless of their background with R; as such, I’m going to go easy on the technical details. I’ll follow up with another more code-heavy post, and the code is available on GitHub here.

To get started, make sure you have the following R packages installed: XML, reshape2, rCharts, and plyr. rCharts is not on CRAN, so you’ll need to install it from GitHub with the devtools package. This script requires the dev branch of rCharts. If you are just getting started with R, install the packages with the code below.

install.packages(c('XML', 'reshape2', 'devtools', 'plyr'))
library(devtools)
install_github('ramnathv/rCharts@dev')
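If you end up running the installation code more than once, you might prefer to install only what is missing. Here is a small sketch of how that could look; missing_packages is a hypothetical helper of my own, not part of the script, and it only covers the CRAN packages (rCharts still comes from GitHub as shown above).

```r
# Hypothetical helper (not in rcharts_pyramids.R): return the packages
# from `pkgs` that cannot be loaded, i.e. that still need installing.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Usage sketch:
# to_install <- missing_packages(c('XML', 'reshape2', 'devtools', 'plyr'))
# if (length(to_install) > 0) install.packages(to_install)
```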


Once these packages are installed, you are ready to get started. If you are using RStudio (which I strongly recommend), simply enter the following command into the R Console:

source('https://raw.githubusercontent.com/walkerke/teaching-with-datavis/master/pyramids/rcharts_pyramids.R')


If you are not using RStudio, use source_url from the devtools package instead of source. Alternatively, you can get the script from GitHub yourself and load it.
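If you would rather keep a local copy of the script, one minimal approach is to download it with base R’s download.file and then source the saved file. The function name load_pyramid_script is hypothetical; it is just a wrapper I’m assuming for illustration.

```r
# Hypothetical wrapper: download an R script to a local file, then source it
# so its functions (e.g. nPyramid) are defined in the global environment.
load_pyramid_script <- function(url, destfile = tempfile(fileext = ".R")) {
  download.file(url, destfile, quiet = TRUE)
  source(destfile)  # local = FALSE by default, so definitions go to globalenv()
  invisible(destfile)
}

# Usage sketch:
# load_pyramid_script(
#   'https://raw.githubusercontent.com/walkerke/teaching-with-datavis/master/pyramids/rcharts_pyramids.R',
#   destfile = 'rcharts_pyramids.R'
# )
```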

The script contains four functions: getAgeTable, dPyramid, hPyramid, and nPyramid. getAgeTable is a helper function that uses the XML package to scrape data from the International Data Base and convert it into an R data frame. I pull data from the category “Mid-year Population by Five Year Age Groups and Sex” to create the pyramids. The other three functions create population pyramids from the data, each with a different JavaScript library available through rCharts. Each of these three functions has three parameters:

1. country (required): the FIPS 10-4 country code for your country of interest. You can find the codes from this Wikipedia page, or use the countrycode R package.
2. year (required): The year for which you want to make the pyramid. The Data Base includes historical information going back to 1950 and projected counts up to 2050 (see this link for the Census Bureau’s projection methodology). Not all years are available for all countries, however.
3. colors (optional): A vector of length 2 that contains the colors you want to use for your pyramid. If you leave this argument blank, you’ll get the default colors for your pyramid.
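Because the country code and year are easy to get wrong, a quick argument check can save you a failed scrape. Here is a minimal sketch; check_pyramid_args is a hypothetical helper that is not in rcharts_pyramids.R, and its 1950 to 2050 bounds simply mirror the range described above (individual countries may have fewer years).

```r
# Hypothetical helper (not in rcharts_pyramids.R): sanity-check arguments
# before calling nPyramid(), hPyramid(), or dPyramid().
check_pyramid_args <- function(country, year, colors = NULL) {
  if (!is.character(country) || length(country) != 1 || nchar(country) != 2) {
    stop("country must be a single two-letter FIPS 10-4 code, e.g. 'QA'")
  }
  # A vector of years is allowed here, since dPyramid accepts one
  if (!is.numeric(year) || any(year < 1950 | year > 2050)) {
    stop("year must fall between 1950 and 2050")
  }
  if (!is.null(colors) && length(colors) != 2) {
    stop("colors must be a vector of length 2")
  }
  invisible(TRUE)
}

# Usage sketch:
# check_pyramid_args('QA', 2014, colors = c('darkred', 'silver'))
```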

Now, let’s see how this works!

The first example I’ll show you is nPyramid, which uses the NVD3.js library, built on top of D3. I’m going to create a population pyramid for Qatar in 2014. I spend a fair amount of time talking about Qatar in my course, as it exemplifies many of the topics we cover such as economic inequality, energy and economic growth, and international labor migration. In fact, around three-quarters of Qatar’s population is foreign-born; further, the conditions endured by migrant laborers in Qatar are attracting international attention in the run-up to the 2022 World Cup.

To create the visualization, simply type the following command into your R console:

nPyramid('QA', 2014, colors = c('darkred', 'silver'))


This produces the chart below. I have trouble with NVD3 in the RStudio viewer, so if you are using RStudio you may need to open the chart in a web browser.

The pyramid is striking for the dramatic gender imbalance in Qatar introduced by the influx of foreign laborers to the country. Males aged 30-34 number nearly 300,000; this is over four times the number of females in that age category, who number just over 70,000. Admittedly, population pyramids like this can be problematic for making gender comparisons (Mike Bostock argues as much here), but interactivity eases this problem somewhat: hovering over a bar produces a tooltip with the precise population count for that age cohort, which allows direct numerical comparisons.
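Those tooltip numbers also make simple arithmetic easy. As a quick sketch, the sex ratio (males per 100 females) for the 30-34 cohort, using the rounded figures quoted above, comes out to roughly 429, a little over four males for every female. sex_ratio is a hypothetical helper, and because the inputs are rounded the result is approximate.

```r
# Sex ratio: number of males per 100 females in a cohort
sex_ratio <- function(males, females) {
  100 * males / females
}

# Rounded figures for Qatar's 30-34 cohort quoted above (approximate)
round(sex_ratio(300000, 70000))
# [1] 429
```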

The next example comes from Highcharts, an interactive JavaScript charting library that includes many different examples and is very well-documented. I’m going to use Highcharts here to create an interactive population pyramid for Japan in 2050 with the hPyramid function. In class, we discussed the challenges Japan is facing as its population ages; this population pyramid provided important visual context. To create the chart, use the code below:

hPyramid('JA', 2050, colors = c('blue', 'red'))


Japan’s projected population structure in 2050 exemplifies an “inverted” population pyramid, with the oldest age cohorts comprising the largest proportions of the population, and the youngest the smallest. There are many reasons why this is projected to happen in Japan, including high life expectancy, restrictive immigration policies, and declining marriage rates (contributing to low fertility levels). In turn, Japan’s shrinking workforce will have to provide for an increasingly large elderly population, which is illustrated in the chart. As with the NVD3 pyramid, the chart gives a tooltip on mouseover that returns specific population figures. One of the most striking details the tooltips reveal is the number of female centenarians (people aged 100+), which the Census Bureau projects will exceed 1 million by 2050.
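One common way to put a number on that burden is the old-age dependency ratio: people aged 65 and over per 100 people of working age (15-64). Here is a rough sketch, assuming the five-year age groups come as labels like “0-4”, “5-9”, …, “100+” (that label format is my assumption, not necessarily what getAgeTable returns); old_age_dependency is a hypothetical helper.

```r
# Hypothetical helper: old-age dependency ratio from five-year age groups.
# Assumes labels like "0-4", "65-69", "100+" and total (male + female) counts.
old_age_dependency <- function(age_group, population) {
  # Lower bound of each group: "0-4" -> 0, "100+" -> 100
  lower <- as.numeric(sub("^([0-9]+).*$", "\\1", age_group))
  working <- sum(population[lower >= 15 & lower < 65])
  elderly <- sum(population[lower >= 65])
  100 * elderly / working
}
```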

The last example I’ll show you is dPyramid, which uses the DimpleJS library, built on top of D3. dPyramid works in much the same way as the other functions, but has the added bonus of being able to take advantage of Dimple’s storyboard property, which facilitates the creation of temporal animations. As such, dPyramid accepts a vector of years for the year parameter; if you choose multiple years instead of a single year, dPyramid will give you a population pyramid that changes over time. Here’s an example that shows the aging of Germany’s population between 2000 and 2050. If you are new to R, the seq function returns a sequence of numbers from argument 1 (in this case, the year 2000), to argument 2 (2050), in intervals of argument 3 (every 10 years).

dPyramid('GM', seq(2000, 2050, 10), colors = c('black', 'red'))
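To see exactly which years that call passes to dPyramid, you can run seq on its own:

```r
# seq(from, to, by): every 10 years from 2000 through 2050
years <- seq(2000, 2050, 10)
print(years)
# [1] 2000 2010 2020 2030 2040 2050

# With a step of 1, seq() gives the same values as the colon operator
all(seq(1974, 1982, 1) == 1974:1982)
# [1] TRUE
```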


The animation does a compelling job of showing how Germany’s population is projected to both age and shrink in the years ahead. Animation can also provide demographic context for key historical circumstances I teach about in class. For example, I covered the devastation wrought by the Khmer Rouge during the Cambodian genocide, and used the animated population pyramid below to illustrate the demographic impacts of the genocide. The chart covers 1974 to 1982: the years just before, during, and after Khmer Rouge rule.

dPyramid('CB', seq(1974, 1982, 1))


You can see how the bars shrink between 1975 and 1979, the years when Cambodia was ruled by the Khmer Rouge. I could see the shock on some of my students’ faces when I showed this to them – many of them were unfamiliar with the history of Cambodia and the Khmer Rouge. The visualization shows an overall decline in population of around 900,000 between 1974 and 1979, which reflects official population numbers; this likely underestimates the scale of the atrocities in Cambodia (see the article linked here), as the actual losses may have been twice that.
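An animation like this works from several years of data at once. As a sketch of the idea (not necessarily how rcharts_pyramids.R does it), you could stack one table per year into a single long table with a Year column, then compare total population between two years, which is one way to check a figure like the roughly 900,000 decline. fetch is a hypothetical function that returns one year’s age table with a Population column of raw (positive) counts; it could wrap getAgeTable, but I’m not assuming that function’s signature.

```r
# Hypothetical helpers; `fetch` and the Year/Population column names are assumptions.
stack_years <- function(years, fetch) {
  tables <- lapply(years, function(y) {
    tbl <- fetch(y)   # one year's age table
    tbl$Year <- y     # tag every row with its year (animation frame)
    tbl
  })
  do.call(rbind, tables)
}

# Change in total population between two years (negative means decline)
total_change <- function(long, from, to) {
  totals <- tapply(long$Population, long$Year, sum)
  as.numeric(totals[as.character(to)] - totals[as.character(from)])
}
```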

To save any of your pyramids as a standalone HTML document, assign the pyramid to a variable and call the chart’s save method, setting the cdn parameter to TRUE so that the saved file loads the JavaScript libraries from the web rather than from your local rCharts installation. The code below saves the HTML file in your working directory.

q1 <- nPyramid('QA', 2014, colors = c('darkred', 'silver'))
q1$save('qatar.html', cdn = TRUE)
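If you need pyramids for several countries at once (say, for a slide deck), the same save call can go in a loop. save_pyramids is a hypothetical wrapper; it assumes, as in the example above, that each builder returns an rCharts object with a save method. It is written for a single year, so it suits nPyramid and hPyramid better than multi-year dPyramid animations.

```r
# Hypothetical wrapper: build and save one pyramid per country code.
# `builder` defaults to nPyramid (defined by rcharts_pyramids.R); any extra
# arguments, such as colors, are passed through to the builder.
save_pyramids <- function(codes, year, builder = nPyramid, dir = ".", ...) {
  paths <- file.path(dir, paste0(codes, "_", year, ".html"))
  for (i in seq_along(codes)) {
    chart <- builder(codes[i], year, ...)
    chart$save(paths[i], cdn = TRUE)
  }
  invisible(paths)
}

# Usage sketch (after sourcing rcharts_pyramids.R):
# save_pyramids(c('QA', 'JA'), 2014, colors = c('darkred', 'silver'))
```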


I’ve learned a lot about rCharts as I’ve put these together; my next post will cover how these charts are created. In the meantime, have a look at the code on GitHub; you are welcome to use and modify it as you please. I’d love to hear your comments or feedback; you can contact me by email or get in touch with me on Twitter.

Thanks to:

• Ramnath Vaidyanathan, Timely Portfolio, and John Kiernander for their incredible contributions that make this all possible - and for helping me out with some Dimple formatting issues;
• The authors of the NVD3, D3, and Highcharts JavaScript libraries;
• The US Census Bureau for making the International Data Base such a valuable resource.