(This article was first published on **free range statistics - R**, and kindly contributed to R-bloggers)

## Life expectancy and death rates

A chance comment on Twitter about historical increases in life expectancy (in the USA) with fairly constant death rates got me wondering about the exact calculations of these things so I did some Googling. Warning on what follows – I am strictly an amateur in demographics, and was deliberately working things out from basics as a learning exercise, so there are possibly better ways to do some of the things below.

According to Wikipedia, which is always good at this sort of thing, life expectancy can be calculated two ways:

- by observing a cohort until they are all dead, and taking the average age at death. Obviously this only works with historical data. This is referred to as “cohort life expectancy at birth”.
- by taking a set of death rates by age bracket, and calculating the average age of death of a hypothetical cohort of people who move through life with those death rates. This is called “period life expectancy at birth” and is the method used for reporting by national statistics offices.

By convention, period life expectancy applies *today’s* death rates to an infant born today. Hence, there’s no estimation of those death rates changing; we assume that by the time this infant is 60 years old, the death rate of 60 year olds in 2078 will be the same as now. This has some rather obvious flaws, but it’s also easy to see why that approach is convenient, particularly for standardising data across countries. It means that the reported life expectancies aren’t really the best estimates of how long a baby born today will live conditional on standard expectations about civilization continuing at all, because we actually expect death rates to continue to decline (although some days I wonder). But it does mean that “life expectancy” is very clearly defined and with minimum discretion needed in the calculation. So long as we remember that period life expectancy is really a summary statistic of today’s death rates by age, not really how long people are expected to live but a sort of hypothetical life length, we are ok.

Life expectancy can go up while crude death rates are also going up (or vice versa) because of changing age composition in a population. All the age-specific death rates (and hence any age-adjusted death rate) might be going down, but if more and more people are in the higher-rate age brackets the overall crude death rate might be increasing. We can see this happening for Japan in the image below:

Here’s the R code that grabbed the data for that image and built it. ggplot2 aficionados might be interested in how I’ve used `geom_blank` to increase the scales a bit beyond the defaults, which were too narrow and had my text banging against the axes (a common problem). It’s a bit of a hack but it works nicely.
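The composition effect itself is easy to check with a toy calculation. The sketch below is separate from the plotting code and uses two invented age brackets with made-up rates and population shares (none of these numbers come from the Japanese data): every age-specific rate falls, yet the crude rate rises because more of the population sits in the high-rate bracket.

```r
# Hypothetical two-bracket population; all numbers are invented for illustration
rates_then <- c(young = 2, old = 50)   # deaths per 1,000 per year, earlier period
rates_now  <- c(young = 1, old = 40)   # every age-specific rate has fallen

share_then <- c(young = 0.90, old = 0.10)  # population share by bracket, earlier
share_now  <- c(young = 0.70, old = 0.30)  # the population has aged

# Crude death rate = age-specific rates weighted by population shares
crude_then <- sum(rates_then * share_then)
crude_now  <- sum(rates_now * share_now)

c(then = crude_then, now = crude_now)
```

With these invented numbers the crude rate goes from 6.8 to 12.7 deaths per 1,000, even though both brackets became safer.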

## Calculating life expectancy

To be sure I understood how it worked, I had a go at estimating some life expectancies myself. I started with some French death rates per age group from 2015, because these were the most convenient to hand. Death rates are reported as deaths per 1,000 people per year. Here’s how they look when you convert them into the probability of surviving the year at any particular age:

Here’s how that was drawn. The `french_death_rates_2015` object comes from the `frs` package where I store miscellaneous things to help out with this blog.
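The conversion itself is simple arithmetic. As a minimal sketch, assuming a data frame with one row per age bracket (the `toy_rates` object and its column names are hypothetical and do not reflect the actual structure of `french_death_rates_2015`, and the rates are invented), the annual rate per 1,000 is treated as a probability of dying within the year:

```r
# Hypothetical death rates per 1,000 people per year (not the real French data)
toy_rates <- data.frame(
  age_bracket     = c("0-1", "20-24", "60-64", "90-94"),
  deaths_per_1000 = c(4, 0.5, 10, 200)
)

# Treat the yearly rate as a probability of dying, then take the complement
toy_rates$prob_death   <- toy_rates$deaths_per_1000 / 1000
toy_rates$prob_survive <- 1 - toy_rates$prob_death

toy_rates
```

Strictly speaking a death rate and a probability of dying are not quite the same thing, but for yearly rates this small the simplification makes little difference.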

To actually convert these probabilities into a life expectancy, we need to estimate the proportion of our hypothetical population that will die at each age. There are lots of different ways you might do this, but the one that came to mind was:

- Create interpolated death rates for each integer age (because typically death rates are given for a bracket of ages, not every single age)
- Estimate the proportion still alive at any age, starting with 1 at birth and 0 at some arbitrary end point (I chose 150, which seems reasonable). This is the cumulative product of the yearly survival rates, which are of course 1 minus the death rate (where the death rate has been converted to a probability rather than a rate per 1,000).
- Estimate the difference between those proportions for each year, which gives you the proportion of the total population that died in that year.
- Take the average of all of our ages, weighted by the proportion of the population that died each year.

Now that I write it up this seems a bit more involved than I would have thought. It’s possible there’s a simpler way. Anyway, here’s the function that implements my approach:
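What follows is a minimal reconstruction of those four steps in base R, not necessarily identical to the function used for the numbers in this post. The function name, its arguments and the toy rates are all assumptions made for illustration; deaths are counted at completed age (so a death in the first year counts as age 0), which is one of several possible conventions.

```r
# Sketch only: mid_age is the reference age of each bracket, deaths_per_1000
# the matching yearly death rate. Both arguments are hypothetical names.
life_expectancy_sketch <- function(mid_age, deaths_per_1000, max_age = 150) {
  ages <- 0:(max_age - 1)

  # Step 1: interpolate a death rate for every integer age; rule = 2 holds the
  # first and last supplied rates constant outside the range of mid_age
  rate <- approx(x = mid_age, y = deaths_per_1000, xout = ages, rule = 2)$y
  prob_death <- pmin(rate / 1000, 1)

  # Step 2: proportion alive at each exact age: 1 at birth, then the cumulative
  # product of yearly survival, forced to 0 at max_age
  alive <- c(1, cumprod(1 - prob_death))
  alive[length(alive)] <- 0

  # Step 3: proportion of the original cohort dying during each year of age
  died <- -diff(alive)

  # Step 4: average age at death, weighted by the proportion dying at each age
  sum(ages * died)
}

# Invented rates loosely shaped like a low-mortality country (not real data)
toy_mid_age <- c(0.5, 3, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
toy_rate    <- c(4, 0.2, 0.1, 0.5, 0.8, 1.5, 4, 9, 20, 55, 180, 400)
toy_e0 <- life_expectancy_sketch(toy_mid_age, toy_rate)
toy_e0
```

Two edge cases make handy sanity checks: a death rate of 1,000 per 1,000 at every age gives a life expectancy of 0, and a death rate of 0 at every age gives 149, because the whole cohort is then forced to die in the final year before the arbitrary end point.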

With my original data it gives plausible results: 85.3 for females and 79.1 for males. These differ minutely from the published figures for France in 2015 of 85.4 and 79.4; I imagine the difference is in how the age brackets are treated. I made some fairly cavalier simplifications in using the middle year of the age bracket as a reference point and interpolating between those points, which on reflection will cause some small problems when there are rapid changes in mortality (the most likely being from the age 0-1 bracket to the age 2-5 bracket).

## Impact on life expectancy of changing a single age group's death rate

It’s interesting to play with how changing death rates at a particular part of the life cycle change the life expectancy calculation. Infant mortality is the biggest driver. The intuition behind this is that everyone who is born alive gets a chance to survive the first year, so an improvement here impacts on the biggest part of our population. If you improve the odds of surviving from 110 to 111 it has minimal impact on life expectancy because most people are already dead at that point.

So here’s what happens if we make infant mortality (i.e. deaths per thousand in the first year of life) arbitrarily small or large, using French males in 2015 as a reference point. The blue dot represents the current actual death rate in the first year of life and overall life expectancy; the rest of the line shows what happens for hypothetical different death rates (which for France with its low infant mortality, historically and internationally speaking, means higher ones):

Obviously, if we say 1,000 out of 1,000 people die in the first year, our life expectancy becomes zero. More realistically, if death rates in the first year went up to 250 out of 1,000 (which would be around the worst current day level, but well within historical ranges), life expectancy comes down from 79 to around 50, *despite* death rates at all other ages staying the same as in 2015 France.

On the other hand, what if we make a spike in deaths at age 18, perhaps due to a strange disease or social custom that makes this a uniquely hazardous age (with the hazard going back down to normal levels at age 19)? Even if the entire population dies at this age, the life expectancy is still 17 or so; and improvements in mortality rates for 18-year-olds accordingly have less relative impact on life expectancy than was the case when we “improved” infant mortality:

For the older population, the issue is more marked again:

Finally, consider the case where a medical advance guarantees a uniform yearly survival rate for anyone who reaches 85 until they turn 150:

Even if we set the yearly death rate from age 85 onwards to 0 (i.e. all 85-year-olds are guaranteed to reach 150), life expectancy only gets up to about 91.

The code for those simulations is below. It’s a bit repetitive, but with the fiddles I wanted to make to labels and so on, and with limited future re-use expected, it didn’t seem worth writing a function to do this job.
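As a compact stand-in for that kind of simulation (a sketch, not the original plotting code), the block below overrides the death probability at a single age and recomputes life expectancy each time. The baseline mortality curve is invented rather than taken from the French data, and the names `e0_from_probs`, `baseline` and `change_one_age` are hypothetical.

```r
# Life expectancy from yearly death probabilities for ages 0, 1, 2, ...
e0_from_probs <- function(prob_death) {
  ages <- seq_along(prob_death) - 1
  alive <- c(1, cumprod(1 - prob_death))
  alive[length(alive)] <- 0            # everyone is dead by the end point
  sum(ages * -diff(alive))             # deaths counted at completed age
}

# Hypothetical baseline: invented, smoothly rising mortality up to age 149
ages <- 0:149
baseline <- pmin(0.0002 * exp(0.085 * ages), 1)
baseline[1] <- 0.004                   # invented infant mortality, 4 per 1,000

# Recompute life expectancy after overriding the death probability at one age
change_one_age <- function(age, new_prob) {
  p <- baseline
  p[age + 1] <- new_prob               # + 1 because R vectors are 1-indexed
  e0_from_probs(p)
}

rates <- c(0, 4, 50, 250, 1000)        # deaths per 1,000 at the chosen age
sim <- data.frame(
  deaths_per_1000 = rates,
  e0_if_age_0  = sapply(rates, function(r) change_one_age(0, r / 1000)),
  e0_if_age_18 = sapply(rates, function(r) change_one_age(18, r / 1000))
)
sim
```

The two extreme rows reproduce the logic described above: if everyone dies in the first year the result is 0, while if everyone dies at 18 it lands a little under 18, because only those who died earlier pull the average down.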

Hmm, ok, interesting. I have some more thoughts about the arithmetic of demography, but they can come in a subsequent post.
