More on the great post-1500 migrations

Posted on November 25, 2022 by Two Points Make a Line in R bloggers | 0 Comments

[This article was first published on Two Points Make a Line, and kindly contributed to R-bloggers.]


In my last post I brought up the World Migration Matrix, an ambitious dataset constructed in 2009 by Louis Putterman and David N. Weil that attempts to trace the ancestral origins of the present-day populations of nearly every country on Earth. It’s a complete matrix, so you can pick any pair of countries and obtain the share of one country’s ancestors that originated from the other, and vice versa. It’s a deeply fascinating dataset, and I thought I’d play around with it some more. (It’s also a chance to familiarize myself further with Highcharts.)

Previously, I plotted the immigrant share of each country, i.e. the share of its current population whose ancestors were not living in that country in the year 1500 (using modern borders). A related question to ask is, what is the ancestral diversity of each country? We encountered cases like Taiwan where nearly all inhabitants have “foreign” ancestors, but since a huge portion come from China, its resulting ancestral diversity is quite low. Contrast that with, say, the United States, where both the immigrant share is high and the ancestral country sources are very diverse.

To quantify this more formally, I use 1 minus the Herfindahl-Hirschman (HH) index, i.e. 1 minus the sum of each country’s squared ancestral shares, as a measure of ancestral diversity. It takes a bit of data processing, so I’m hiding the code below.
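As a rough illustration of how the two measures can diverge, here is a minimal sketch with made-up shares. The vectors and the helper functions `immigrant_share()` and `diversity()` are hypothetical and exist only for this example; they are not part of the dataset or of the processing code.

```r
# Hypothetical ancestry shares (illustrative only, not values from the dataset).
# The first element of each vector is the share of ancestors already living
# in the country in 1500; the remaining elements are shares from foreign origins.
one_dominant_source <- c(0.02, 0.98)
many_sources <- c(0.04, 0.24, 0.24, 0.24, 0.24)

# Immigrant share: everything except the home-grown share
immigrant_share <- function(share) 1 - share[1]

# Ancestral diversity: 1 minus the HH index (sum of squared shares)
diversity <- function(share) 1 - sum(share^2)

immigrant_share(one_dominant_source)  # 0.98
diversity(one_dominant_source)        # 0.0392

immigrant_share(many_sources)         # 0.96
diversity(many_sources)               # 0.768
```

Both toy countries have almost entirely "foreign" ancestry, yet the one drawing on a single source scores close to zero on diversity, which is the Taiwan-style pattern described above.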

Code

library(tidyverse)
library(readxl)
library(countrycode)

mm_raw <- here::here("datasets", "matrix version 1.1.xls") %>% 
  read_excel() %>%
  select(-update) %>%
  pivot_longer(
    cols = !c(wbcode, wbname),
    names_to = "origin",
    values_to = "share"
  )

# Convert to ISO names and codes

mm <- mm_raw %>%
  mutate(
    origin = toupper(origin),
    country_iso3 = countrycode(wbcode, "wb", "iso3c"),
    origin_iso3 = countrycode(origin, "wb", "iso3c")
  )

mm$country_iso3[mm$wbcode == "ZAR"] <- mm$origin_iso3[mm$origin == "ZAR"] <- "COD"
mm$country_iso3[mm$wbcode == "TMP"] <- mm$origin_iso3[mm$origin == "TMP"] <- "TLS"
mm$country_iso3[mm$wbcode == "ROM"] <- mm$origin_iso3[mm$origin == "ROM"] <- "ROU"
mm$country_iso3[mm$wbcode == "OAN"] <- mm$origin_iso3[mm$origin == "OAN"] <- "TWN"

mm <- mm %>%
  mutate(
    country_name = countrycode(country_iso3, "iso3c", "country.name"),
    origin_name = countrycode(origin_iso3, "iso3c", "country.name"),
    country_region = countrycode(country_iso3, "iso3c", "continent"),
    origin_region = countrycode(origin_iso3, "iso3c", "continent")
  ) %>%
  select(country_iso3, country_name, country_region, origin_iso3, origin_name, origin_region, share) %>%
  drop_na()

# Compute immigrant share

mm_immig <- mm %>%
  filter(country_iso3 == origin_iso3) %>%
  mutate(immig = 1 - share) %>%
  select(country_iso3, country_name, country_region, immig)

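# Compute ancestral diversity as 1 minus the HH index (sum of squared shares).
# The column keeps the name hhi, but it holds the diversity measure itself.

mm_hhi <- mm %>%
  group_by(country_iso3, country_name, country_region) %>%
  summarize(hhi = 1 - sum(share^2), .groups = "drop")

# Join on the country keys explicitly rather than relying on a natural join
mm_immig_hhi <- mm_immig %>%
  left_join(mm_hhi, by = c("country_iso3", "country_name", "country_region"))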

The following scatter plot shows ancestral diversity against the immigrant share of ancestors. The two go hand in hand up to a point, after which we encounter enormous variety. I was surprised to find that the country with the most diverse set of ancestors is Jamaica, followed very closely by the United States. The other panels below showcase the ancestral origins of Jamaica and the U.S.
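The code for the scatter plot is not shown in this post. The following is only a sketch of how such a chart could be built with highcharter's `hchart()`; it is not necessarily how the original figure was made. The `toy` data frame is a hypothetical stand-in with made-up values that uses the same column names as the joined table, and the axis titles and tooltip text are my own assumptions.

```r
library(highcharter)

# Hypothetical stand-in for the joined immigrant-share/diversity table;
# the rows are made-up values, not figures from the dataset.
toy <- data.frame(
  country_name = c("A", "B", "C", "D"),
  country_region = c("Region 1", "Region 1", "Region 2", "Region 2"),
  immig = c(0.05, 0.40, 0.95, 0.98),
  hhi = c(0.08, 0.45, 0.75, 0.04)
)

# One point per country, one series (and color) per region
scatter <- hchart(
  toy, "scatter",
  hcaes(x = immig, y = hhi, group = country_region)
) %>%
  hc_xAxis(title = list(text = "Immigrant share")) %>%
  hc_yAxis(title = list(text = "Ancestral diversity (1 - HH index)")) %>%
  hc_tooltip(
    pointFormat = "<b>{point.country_name}</b><br>Immigrant share: {point.x:.2f}<br>Diversity: {point.y:.2f}"
  )

scatter
```

Swapping `toy` for the real joined table should give one point per country, with the Taiwan-like cases sitting in the lower right corner (high immigrant share, low diversity).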

Jamaica’s ancestors

Code

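library(highcharter)  # provides hcmap(), JS(), and the hc_*() helpers

jamaica <- mm %>%
  filter(country_name == "Jamaica")

# Tooltip formatter: show shares below 0.01 as "<0.01", otherwise two decimals
formattp <- JS("function() {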
  if (this.point.value < 0.01) {
    return '<b>' + this.point.name + '</b>: <0.01';
  }
  else {
    return '<b>' + this.point.name + '</b>: ' + Highcharts.numberFormat(this.point.value, 2);
  }
}")

hcmap(
  map = "custom/world-highres3",
  data = jamaica,
  name = "origin_name",
  value = "share",
  borderWidth = .5,
  joinBy = c("iso-a3", "origin_iso3")
) %>%
  hc_mapNavigation(enabled = TRUE) %>%
  hc_legend(
    align = "left",
    title = list(text = "Ancestral contribution to Jamaica's population")
  ) %>%
  hc_tooltip(
    headerFormat = "",
    formatter = formattp
  )
