Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A few months ago I have made an attempt to visualize the world population changes from 1800 to 2100:

This way of visualization is good to show the ever-changing distribution of the population on a global scale. It allows seeing that, China and India dominated the world population, but also a large share of the world population had lived in Europe in 1800. At the same time, Australia, the Americas, or Africa were not densely populated. By 1900, there is a noticeable growth in relative population in North America (especially the USA). From 1950 onwards, a decrease in population in Europe and a tremendous increase in population in Africa is visible. Based on this prediction, it is also possible to see the relative population of China and India will decrease in the last decades of this century.

However, while we can see changes in proportion, we cannot see one important dimension – population growth on a global scale. Recently, Tomasz Stepinski, Pawel Netzel and I published a paper describing the landscapes change on a global scale between 1992 and 20151 showing how humans impacted our planet just in 24 years time. It is clear the population and its growth has an enormous impact on our planet as it results in more demand for land and natural resources. Therefore, I’ve updated my code to add this dimension…

Now, we can see not only the relative distribution of the population for each year but also population growth on a global scale.

Similarly to the previous post, the rest of this blog post will focus on explaining the steps and code to create the above animation. It involves downloading the data from two sources, cleaning, merging, and preprocessing it. Prepared data can be transformed to cartogram to indicate spatial relations between countries at any given year, and scaled to indicate the global world population at any given year compared to the predicted population for 2100.

## Starting

Let’s start with the packages. If you are new to R, you may want to take a read of this first, which points to various resources for setting-up R for geographic data. When you have a recent R version and the appropriate packages installed (e.g. by executing devtools::install_github("geocompr/geocompkg")) the packages can be attached as follows:

library(sf)             # spatial data classes
library(rnaturalearth)  # world map data
library(dplyr)          # data manipulation
library(stringr)        # data manipulation
library(tidyr)          # data manipulation
library(purrr)          # data manipulation
library(cartogram)      # cartograms creation
library(gganimate)      # animation creation

## Getting data

To create the maps of the world population for each year we will need two datasets – one containing spatial data of the world’s countries and one non-spatial with information about the annual population in the world’s countries. The first one can be easily downloaded from the Natural Earth website, for example using the rnaturalearth package:

world_map = ne_countries(returnclass = "sf")

The second one is available from the Gapminder foundation. Gapminder provides a dataset with population data for all countries and world regions from 1800 to 2100. We can download and read the dataset using the code below:

if(!dir.exists("data")) dir.create("data")
world_pop = read_xlsx("data/pop1800_2100.xlsx", sheet = 7)

## Cleaning

As always when working with multiple datasets – some data cleaning is necessary. Our world_map dataset has many columns irrelevant for cartograms creation and we do not need spatial data of Antarctica. We can also transform our data into a more appropriate projection2.

world_map = world_map %>%
filter(str_detect(type, "country|Country")) %>%
filter(sovereignt != "Antarctica") %>%
select(sovereignt) %>%
st_transform(world_map, crs = "+proj=moll")

We need to have a common identifier to combine our spatial and non-spatial datasets, for example, names of the countries. However, there are inconsistencies between some of the names. We can fix it manually:

world_pop = world_pop %>%
mutate(sovereignt = name) %>%
mutate(sovereignt = replace(sovereignt, sovereignt == "Tanzania", "United Republic of Tanzania")) %>%
mutate(sovereignt = replace(sovereignt, sovereignt == "United States", "United States of America")) %>%
mutate(sovereignt = replace(sovereignt, sovereignt == "Congo, Dem. Rep.", "Democratic Republic of the Congo")) %>%
mutate(sovereignt = replace(sovereignt, sovereignt == "Bahamas", "The Bahamas")) %>%
mutate(sovereignt = replace(sovereignt, sovereignt == "Serbia", "Republic of Serbia")) %>%
mutate(sovereignt = replace(sovereignt, sovereignt == "Macedonia, FYR", "Macedonia")) %>%
mutate(sovereignt = replace(sovereignt, sovereignt == "Slovak Republic", "Slovakia")) %>%
mutate(sovereignt = replace(sovereignt, sovereignt == "Congo, Rep.", "Republic of Congo")) %>%
mutate(sovereignt = replace(sovereignt, sovereignt == "Kyrgyz Republic", "Kyrgyzstan")) %>%
mutate(sovereignt = replace(sovereignt, sovereignt == "Lao", "Laos")) %>%
mutate(sovereignt = replace(sovereignt, sovereignt == "Cote d'Ivoire", "Ivory Coast")) %>%
mutate(sovereignt = replace(sovereignt, sovereignt == "Timor-Leste", "East Timor")) %>%
mutate(sovereignt = replace(sovereignt, sovereignt == "Guinea-Bissau", "Guinea Bissau"))

## Preparing

Now we can join our two datasets, remove missing values and unimportant columns. We also should transform the data into a long format3, which is accepted by the plotting packages (such as tmap or ggplot2).

world_data = left_join(world_map, world_pop, by = "sovereignt") %>%
na.omit() %>%
select(-geo, -name, -indicator) %>%
gather(key = "year", value = "population", 1800.0:2100.0,
convert = TRUE)

Additionally, we can calculate total global populations in each year:

world_data = world_data %>%
group_by(year) %>%
mutate(total_pop = sum(as.numeric(population), na.rm = TRUE)) %>%
mutate(title = paste0("Year: ", year, "\nTotal population (billions): ",
round(total_pop / 1e9, 2))) %>%
ungroup()

## Subsetting

Now, our data contains information about the world population for each year between 1800 and 2100. It is possible to use it to create cartograms, however, to reduce calculation time and simplification of the results we will only use data for every 25 years:

world_data = world_data %>%
filter(year %in% seq(1800, 2100, by = 25))

## Transforming

With newly created data, we are able to create our cartograms. We need to split our data into independent annual datasets, create cartograms based on the population variable, and combine all of the cartograms back into one object.

world_data_carto = world_data %>%
split(.\$year) %>%
map(cartogram_cont, "population", maxSizeError = 2, threshold = 0.1) %>%
do.call(rbind, .)

## Scaling

Predicted global population is the largest for the year 2100 and consists of more than 11 billion human beings. We can resize a map for each year using this value as our reference point, where the rest of the maps would be proportionally smaller than the one for 2100. To scale the areas of the other maps we need to divide a total population in each year by the maximum value and compute the square root of the result. The last step of calculations is to resize the geometries (coordinates) by multiplying it with the scales values4:

world_data_scaled = world_data_carto %>%
mutate(scales = sqrt(total_pop / max(total_pop))) %>%
group_by(year) %>%
mutate(geometry = geometry * scales)

For more on the topic, we recommend checking out the Affine transformations section of Geocomputation with R.

## Visualizing

Now when we have all of the pieces needed, the animated map can be created using either tmap5 or gganimate. It consists of three steps:

1. Creating maps for each year using a combination of the ggplot(), geom_sf(), and transition_states() functions.
2. Rendering an animation using animate(). It allows for changing the size of a resulting image, and the speed of animation – you can control it with any two combinations of nframes, fps, and duration.
3. Saving an animation with anim_save().
worlds_anim = ggplot() +
geom_sf(data = world_data_scaled, aes(fill = population / 1000000), lwd = 0.1) +
coord_sf(datum = NA) +
scale_fill_viridis_c(name = "Population (mln)") +
theme_minimal() +
theme(
plot.title = element_text(size = 22),
plot.background = element_rect(fill = "#f5f5f4", color = NA),
panel.background = element_rect(fill = "#f5f5f4", color = NA),
legend.background = element_rect(fill = "#f5f5f4", color = NA),
legend.position = c(0.1, 0.2),
legend.title = element_text(size = 18)
) +
labs(title = "{closest_state}") +
transition_states(title, transition_length = 2, state_length = 6) +
ease_aes("cubic-in-out")

worlds_animate = animate(worlds_anim, width = 1200, height = 550,
duration = 30, fps = 4)
anim_save("worlds_animate.gif", animation = worlds_animate)

2. You can read more about projections in the Reprojecting geographic data chapter of Geocomputation with R.

3. This process is called gathering.

4. Mutating of geometry columns is possible with the recent improvements in the sf package.

5. Visit the animated maps chapter of Geocomputation with R or my previous blog post to learn how to make animations with tmap.