# Reinterpreting Lee-Carter Mortality Model

November 18, 2014
By

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

Last week, while I was giving my crash course on R for insurance, we’ve been discussing possible extensions of Lee & Carter (1992) model. If we look at the seminal paper, the model is defined as follows

Hence, it means that This would be a (non)linear model on the logarithm of the mortality rate. A non-equivalent, but alternative expression might be

which could be obtained as a Gaussian model, with a log-link function Actually, this model can be compared to the more popular one, introduced in 2002 by Natacha Brouhns, Michel Denuit and Jeroen Vermunt, where a Poisson regression is used to count deaths (with the exposure used as an offset variable) On our datasets

```EXPO <- read.table(
"http://freakonometrics.free.fr/Exposures-France.txt",
"http://freakonometrics.free.fr/Deces-France.txt",
header=TRUE,skip=0) ### !!!! 0
base=data.frame(
D=DEATH\$Total,
E=EXPO\$Total,
X=as.factor(EXPO\$Age),
T=as.factor(EXPO\$Year))
library(gnm)
listeage=c(101:109,"110+")
sousbase=base[! base\$X %in% listeage,]
# on met des nombres car il faut calculer T-X
sousbase\$X=as.numeric(as.character(sousbase\$X))
sousbase\$T=as.numeric(as.character(sousbase\$T))
sousbase\$C=sousbase\$T-sousbase\$X
sousbase\$E=pmax(sousbase\$E,sousbase\$D)```

The codes to fit those models are the following

```LC.gauss <- gnm(D/E~
as.factor(X)+
Mult(as.factor(X),as.factor(T)),
data=sousbase)

LC.gauss.2 <- gnm(log(D/E)~
as.factor(X)+
Mult(as.factor(X),as.factor(T)),
data=sousbase)```

while for the Poisson regression is

```LC.poisson <- gnm(D~offset(log(E))+
as.factor(X)+
Mult(as.factor(X),as.factor(T)),
data=sousbase)```

To visualize the first component, the ‘s, use

```alphaG=coefficients(LC.gauss)[1]+c(0,
coefficients(LC.gauss)[2:101])
s=sd(residuals(LC.gauss.2))

alphaG2=coefficients(LC.gauss.2)[1]+c(0,
coefficients(LC.gauss.2)[2:101])

alphaGw=coefficients(LC.gauss.w)[1]+c(0,
coefficients(LC.gauss.w)[2:101])```

We can then plot them

```plot(0:100,alphaP,col="black",type="l",
xlab="Age")
lines(0:100,alphaG,col="blue")
legend(0,-1,c("Poisson","Gaussian"),
lty=1,col=c("black","blue"))```

On small probabilities, the difference can be considered as substential. But for elderly, it seems that the difference is rather small. Now, the problem with a Poisson model is that it might generate a lot of deaths. Maybe more than the exposure actually. A natural idea is to consider a binomial model (which is a standard model in actuarial textbooks) The codes to run that (non)linear regression would be

```LC.binomiale <- gnm(D/E~
as.factor(X)+
Mult(as.factor(X),as.factor(T)),
weights=E,
data=sousbase)```

One more time, we can visualize the series of ‘s.

```alphaB=coefficients(LC.binomiale)[1]+c(0,
coefficients(LC.binomiale)[2:101])```

Here, the difference is only on old people. For small probabilities, the binomial model can be approximated by a Poisson model. Which is what we observe. On elderly people, there is a large difference, and the Poisson model underestimates the probability of dying. Which makes sense, actually, since the number of deaths has to be smaller than the exposure. A Poisson model with a large parameter will have a (too) large variance. So the model will underestimate the probability. This is what we observe on the right. It is clearly a more realistic fit.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...