Model life tables by @ellis2013nz

[This article was first published on free range statistics - R, and kindly contributed to R-bloggers.]

I am working to improve my knowledge of demography. This is something I’ve only had a relatively superficial engagement with but it’s an important part of the responsibilities of the team where I work.

A key tool in demography is a “life table”, which is basically a table where rows are ages or age groups and columns are different calculations relating to the chance of the cohort in that age group dying in a given year. The cumulative probability of survival lets you directly calculate life expectancy from any given age.
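To make that last point concrete, here is a minimal sketch of how life expectancy falls out of the survival column. The `qx` values are made up for illustration, the table is cut off at age 5 to keep it short, and assuming deaths happen halfway through each year is a simplification rather than how the published tables are constructed.

```r
# Hypothetical single-year death probabilities for ages 0 to 4 (illustrative
# only), with everyone assumed to die before exact age 5 to keep the table short
qx <- c(0.05, 0.01, 0.005, 0.005, 1)
age <- seq_along(qx) - 1

# lx: survivors to exact age x, out of a starting radix of 100,000
lx <- 100000 * cumprod(c(1, head(1 - qx, -1)))

# Lx: person-years lived in each one-year interval, assuming deaths happen
# on average halfway through the interval
lx_next <- c(lx[-1], 0)
Lx <- (lx + lx_next) / 2

# Tx: person-years remaining at and above each age
# ex: life expectancy at exact age x
Tx <- rev(cumsum(rev(Lx)))
ex <- Tx / lx

life_table <- data.frame(age, qx, lx, Lx, Tx, ex)
print(life_table)
```

The `ex` column is the "life expectancy from any given age": remaining person-years divided by the number of people still alive at that age.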

In a country with a very effective statistical system, these life tables can be estimated pretty directly. You take the number of people dying at each age from the death registrations, and the denominator of people per age cohort from the latest population projection.
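As a rough sketch of that direct calculation, the snippet below divides deaths by population to get an age-specific death rate and then converts the rate to a probability of dying within the year. All the numbers are hypothetical, and the `ax` values (the average fraction of the year lived by those who die) are my illustrative assumptions, not official figures.

```r
# Hypothetical registered deaths and mid-year population by single year of age
age        <- 0:3
deaths     <- c(120, 15, 8, 5)
population <- c(4000, 3900, 3850, 3800)

# Age-specific death rate: deaths per person-year of exposure
mx <- deaths / population

# Convert the rate to a probability of dying within the year.
# ax is the assumed average fraction of the year lived by those who die:
# 0.5 everywhere except infancy, where deaths are concentrated early in the
# year (the 0.3 used here is an illustrative assumption)
ax <- c(0.3, 0.5, 0.5, 0.5)
qx <- mx / (1 + (1 - ax) * mx)

print(data.frame(age, deaths, population, mx = round(mx, 5), qx = round(qx, 5)))
```

The probability is always a little lower than the rate, because the people who die contribute less than a full year of exposure to the denominator.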

But if your death registry data is incomplete or your population projection is unreliable, it is a much tougher job. And death registration is incomplete in many countries, enough so that indicator 17.19.2 under the Sustainable Development Goal “Partnerships for the Goals” includes a target for 80% of deaths to be registered, a target that many countries are going to struggle to meet by 2030.

To meet this use case (very common in developing countries), the United Nations provides a set of model life tables:

“The United Nations and the demographic research community at large commonly use two sets of standard model life table families to derive a variety of mortality indicators and underlying mortality patterns for estimation and projection (Coale-Demeny, 1966 and 1989; United Nations, 1982). These two sets of model life tables, designed primarily for use in developing countries or for estimating historic populations, are limited to mortality patterns for a life span from age 20 to 75. A revised set of model life tables, extending the initials sets from life expectancy at birth (e(0)) from age 75.0 up to 92.5, uses both a limited life table as an asymptotic pattern and the classic Lee-Carter approach to derive intermediate age patterns (Buettner, 2002).”

The basic procedure to use these, as I understand it, is that you:

  • get one or more statistics that you *can* measure – like infant or under five mortality (which can be estimated with care from survey or census data) and probability of surviving to age 60 if you are 15,
  • use your judgement and demographer community wisdom to decide the “family” of life tables that is most appropriate for your country, and
  • then from that family you pick the particular model life table that most closely matches the statistic or statistics that you do have.
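The last step of that procedure could be sketched roughly as follows. This is only an illustration of the matching idea: the `family_summary` values are invented rather than taken from any real model life table, the column names `q5` and `q15_45` are my own, and the relative-distance rule is one arbitrary choice among many.

```r
# Hypothetical summary of one family of model life tables: one row per level
# of life expectancy at birth (e0), with the implied under-five mortality (q5)
# and adult mortality between ages 15 and 60 (q15_45). Numbers are made up.
family_summary <- data.frame(
  e0     = c(50, 55, 60, 65, 70),
  q5     = c(0.180, 0.140, 0.105, 0.075, 0.050),
  q15_45 = c(0.400, 0.340, 0.285, 0.230, 0.180)
)

# Statistics we were able to measure for our country (also hypothetical)
observed <- c(q5 = 0.098, q15_45 = 0.300)

# Gap between each candidate table and the observed values, on a relative
# scale so that both indicators get similar weight
rel_gap <- cbind(
  (family_summary$q5 - observed["q5"]) / observed["q5"],
  (family_summary$q15_45 - observed["q15_45"]) / observed["q15_45"]
)
family_summary$distance <- sqrt(rowSums(rel_gap^2))

# The best-matching level within this family
best <- family_summary[which.min(family_summary$distance), ]
print(best)
```

With only one measured statistic you would simply pick the level whose value is closest. With several, the choice of how to weight them is itself a judgement call.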

Alternatively it is possible to model the curves (of death rate ~ age) directly with a logistic function of some sort – using the curves of the model tables as a starting point and modifying the parameters according to the statistics available. That takes me beyond today’s scope I think.

In essence, either way, you are relying on a “typical” shape for mortality at the ages where you can’t measure it directly.

These are the families available, grouped under the two types of “Coale-Demeny” or “United Nations”:

  type              family             n
  <chr>             <chr>          <int>
1 CD East           East           21222
2 CD North          North          21222
3 CD South          South          21222
4 CD West           West           21222
5 UN Chilean        Chilean        21222
6 UN Far_East_Asian Far_East_Asian 21222
7 UN General        General        21222
8 UN Latin          Latin          21222
9 UN South_Asian    South_Asian    21222
  

Those values of ‘n’ are the number of rows of data associated with each family. Each family has 81 complete life tables for each of male and female – a life table for each value of “life expectancy at birth” from 20 to 100. The life tables contain mortalities for 131 ages, from 0 to 130. And 81 * 131 * 2 = 21222:

> range(mlt_raw$age)
[1]   0 130
> range(mlt_raw$e0)
[1]  20 100
> 81*131*2
[1] 21222

So this lets us do some interesting comparisons. For example, you can look at the relationship between infant mortality and life expectancy for each family of the model life tables:

Here’s the code for everything so far – downloading the model life tables from the UN, reading them into R, counting the families and drawing that plot of life expectancies:

#-------------functionality and import data----------------
library(tidyverse)
library(readxl)
library(janitor)
library(glue)
library(ggrepel)

download.file("https://www.un.org/en/development/desa/population/publications/pdf/mortality/EMLT/MLT_UN2011_130_1y_complete.xlsx",
              destfile = "model-life-tables.xlsx", mode = "wb")

mlt_raw <- read_excel("model-life-tables.xlsx", sheet = "Sheet1") |>
  clean_names()

#---------------exploration-----------------
mlt_raw  |>
  count(type, family)

# Colors for genders. Not sure I'm happy with these, but I want something that's not blue and pink and
# gives roughly same perceptual prominence to both colours.
pal <- c("brown", "darkblue")
names(pal) <- c("Female", "Male")

# life expectancy plot:
mlt_raw |>
  # filter to age == 0 so we can see life expectancy at birth
  filter(age == 0) |>
  arrange(desc(mx1)) |>
  ggplot(aes(x = mx1, y = e0, colour = sex)) +
  geom_line() +
  facet_wrap(~family) +
  scale_x_sqrt() +
  scale_y_sqrt() +
  scale_colour_manual(values = pal) +
  labs(x = "mx1 for age zero i.e. raw infant mortality",
       y = "life expectancy at birth")

Alternatively we can look at a more summarised version of the data by comparing a couple of particular demographic statistics, at a given life expectancy, for the different model life tables. Here’s my attempt at repeating (but adding in a sex dimension) Figure 1 from this UNFPA instructional site.

I like this plot. It lets you see at a glance how the different families of model life table vary in at least one or two important ways. For example, we can see that the “South” (Coale-Demeny) and “South Asian” (UN) tables have relatively high child mortality (and then of course compensating low adult mortality) for a given life expectancy.

That was done with this code. Note my struggles in the comments with exactly what is meant by age 5, age 60 and so on, and hence which column to use from the life table. Significant expertise with life tables involves understanding the exact ways adjustments are made for things like the difference between age x and the average age of people who are aged x; the fact that infants who die in their first year are more likely to die early in it than late; and so on:

# recreate Figure 1 from 
# http://demographicestimation.iussp.org/content/introduction-model-life-tables

# Note that age means "age from x to x + n" where n is the age interval, i.e. 1,
# so age == 4 means 4 to 5.
# lx1 is the number of people alive at the beginning of that period. It starts at
# 100000 for age 0. lx1_2 is sort of adjusted for people perhaps in the middle
# of the period, or some adjustment for different birthdays?
# So taking this together, lx1 when age == 5 will be the number of people alive
# at the beginning of age 5 to age 6.
u5m <- mlt_raw |>
  filter(e0 == 60) |>
  filter(age == 5) |>
  group_by(family, sex, type_mlt)  |>
  mutate(under_five = (100000 - lx1) / 100000) |>
  select(family, sex, type_mlt, under_five)

am <- mlt_raw |>
  filter(e0 == 60) |>
  group_by(family, sex, type_mlt) |>
  # adult mortality: probability of dying between exact ages 15 and 60
  summarise(adult = 1 - lx1[age == 60] / lx1[age == 15], .groups = "drop")
  
u5m |>
  left_join(am, by = c("family", "sex", "type_mlt")) |>
  ggplot(aes(x = under_five, y = adult, colour = type_mlt)) +
  facet_wrap(~sex) +
  geom_point() +
  geom_text_repel(aes(label = family), size = 2.7) +
  labs(x = "Under five mortality",
       y = "Adult mortality",
       colour = "Model life table type",
       title = "Comparison of different families of Model Life Table",
       subtitle = "Life expectancy = 60")

OK so that’s a nice representation of the overall life expectancy, but what about the gritty detail of the mortality rates at each individual age? One way to look at this is via an animation:

I quite like this for giving you an overview of the mortality rates of the different families and different life expectancies, but it’s not great for comparing say two different families with the same life expectancy.

The animation was produced with this code:

#------------animation---------
types <- unique(mlt_raw$type)  
e0s <- unique(mlt_raw$e0)


dir.create("tmp_mlt")
for(the_type in types){
  for(e in e0s){
    d <- mlt_raw |>
      filter(type == the_type & e0 == e)
    p <- d |>
      ggplot(aes(x = age, y = mx1, colour = sex)) +
      geom_line() +
      scale_colour_manual(values = pal) +
      labs(x = "Age",
          y = "Death rate",
          colour = "",
          title = glue("Model life table type = {the_type}"),
          subtitle = glue("Life expectancy = {e}")) +
      theme_minimal(base_family = "Calibri") +
      theme(legend.position = c(0.2, 0.8))
    
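    # Zero-pad the life expectancy so the frames sort in numeric order when
    # the *.png glob below is expanded (otherwise 100 sorts before 20)
    png(glue("tmp_mlt/model-{the_type}_le-{sprintf('%03d', e)}.png"), 
        width = 1300, height = 800, res = 300, type = "cairo")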
      print(p)
    dev.off()
  }
}

# Convert all the single frames into a GIF.
# Requires ImageMagick to be installed. Run it from here, or run the magick
# command directly in a system / shell window
projdir <- setwd("tmp_mlt")
system('magick -loop 0 -delay 30 *.png "model-life-tables.gif"')
setwd(projdir)

To better make the comparisons that I felt the animation wasn't good at (particularly family to family of model life table), I made a Shiny app.

If I had a bit more oomph I would add some tooltips and stuff, but I think I've done enough to feel I'm getting the hang of this thing.

As mentioned earlier, this isn't an area of deep expertise for me. I'm quite likely to have got some details of the terminology wrong, for example, so use the above with caution, and please add comments for anything you spot that I can correct.
