# Reinterpreting Lee-Carter Mortality Model

November 18, 2014

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

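Last week, while I was giving my crash course on R for insurance, we discussed possible extensions of the Lee & Carter (1992) model. In the seminal paper, the model is defined as follows

$$\log \mu_{x,t} = \alpha_x + \beta_x \cdot \kappa_t + \varepsilon_{x,t}$$

Hence, it means that

$$\mathbb{E}[\log \mu_{x,t}] = \alpha_x + \beta_x \cdot \kappa_t$$

This would be a (non)linear model on the logarithm of the mortality rate. A non-equivalent, but alternative expression might be

$$\log \mathbb{E}[\mu_{x,t}] = \alpha_x + \beta_x \cdot \kappa_t$$

which could be obtained as a Gaussian model, with a log-link function

$$\mu_{x,t} \sim \mathcal{N}\left(e^{\alpha_x + \beta_x \cdot \kappa_t}, \sigma^2\right)$$

Actually, this model can be compared to the more popular one, introduced in 2002 by Natacha Brouhns, Michel Denuit and Jeroen Vermunt, where a Poisson regression is used to count deaths (with the exposure used as an offset variable)

$$D_{x,t} \sim \mathcal{P}\left(E_{x,t} \cdot e^{\alpha_x + \beta_x \cdot \kappa_t}\right)$$

On our datasets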

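```r
# Exposures and death counts for France.
# NOTE: the header/skip arguments were lost from the original listing;
# the values below are an assumption and may need adjusting to the files.
EXPO <- read.table(
"http://freakonometrics.free.fr/Exposures-France.txt",
header=TRUE,skip=2)
DEATH <- read.table(
"http://freakonometrics.free.fr/Deces-France.txt",
header=TRUE,skip=1)
base=data.frame(
D=DEATH$Total,
E=EXPO$Total,
X=as.factor(EXPO$Age),
T=as.factor(EXPO$Year))
library(gnm)
# drop ages above 100
listeage=c(101:109,"110+")
sousbase=base[! base$X %in% listeage,]
# convert to numbers, since we need to compute T-X
sousbase$X=as.numeric(as.character(sousbase$X))
sousbase$T=as.numeric(as.character(sousbase$T))
sousbase$C=sousbase$T-sousbase$X
# make sure the exposure is at least the number of deaths, so that D/E <= 1
sousbase$E=pmax(sousbase$E,sousbase$D)
```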

The code to fit those models is the following

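```r
# Gaussian model with a log link: log E[D/E] = alpha_x + beta_x * kappa_t
LC.gauss <- gnm(D/E~
as.factor(X)+
Mult(as.factor(X),as.factor(T)),
family=gaussian(link="log"),
data=sousbase)

# Gaussian model on the log scale: E[log(D/E)] = alpha_x + beta_x * kappa_t
# (requires D > 0 in every cell)
LC.gauss.2 <- gnm(log(D/E)~
as.factor(X)+
Mult(as.factor(X),as.factor(T)),
family=gaussian(link="identity"),
data=sousbase)
```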
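The two Gaussian fits target different quantities: the log-link model describes $\log \mathbb{E}[\mu_{x,t}]$, while the model on `log(D/E)` describes $\mathbb{E}[\log \mu_{x,t}]$. As a rough illustration of why they are not equivalent, the sketch below uses simulated log-normal rates (the values of `a` and `s` are made up, not estimated from the French data). By Jensen's inequality the first quantity is never smaller than the second, and with Gaussian noise on the log scale the gap should be close to `s^2/2`.

```r
# Synthetic illustration: log(E[mu]) versus E[log(mu)]
set.seed(1)
a <- -4          # hypothetical log mortality level
s <- 0.5         # hypothetical standard deviation on the log scale
mu <- exp(a + rnorm(1e5, mean = 0, sd = s))

log_of_mean <- log(mean(mu))   # what a log-link Gaussian model targets
mean_of_log <- mean(log(mu))   # what a Gaussian model on log(mu) targets
gap <- log_of_mean - mean_of_log

print(c(log_of_mean = log_of_mean, mean_of_log = mean_of_log,
        gap = gap, theoretical_gap = s^2 / 2))
```

On real data the noise need not be Gaussian on the log scale, so the `s^2/2` correction should be read as an approximation only.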

For the Poisson regression, the code is

```r
# Poisson model for death counts, with log-exposure as an offset
LC.poisson <- gnm(D~offset(log(E))+
as.factor(X)+
Mult(as.factor(X),as.factor(T)),
family=poisson(link="log"),
data=sousbase)
```
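To see what the offset does, here is a minimal sketch on simulated data, using base R's `glm` rather than `gnm` and a single constant death rate (the exposures and the rate are invented for the illustration). With `offset(log(E))`, the linear predictor models the log of the death rate, and in this intercept-only case the estimate coincides with the log of the crude rate, total deaths over total exposure.

```r
# Synthetic illustration of a Poisson regression with an exposure offset
set.seed(1)
n_cells <- 50
E_sim <- runif(n_cells, min = 5000, max = 15000)   # hypothetical exposures
true_log_rate <- -3                                # hypothetical log death rate
D_sim <- rpois(n_cells, lambda = E_sim * exp(true_log_rate))

# intercept-only Poisson regression; the offset enters with coefficient 1
fit <- glm(D_sim ~ 1 + offset(log(E_sim)), family = poisson(link = "log"))

est_log_rate <- unname(coef(fit)[1])
crude_log_rate <- log(sum(D_sim) / sum(E_sim))
print(c(estimate = est_log_rate, crude = crude_log_rate, truth = true_log_rate))
```

The Lee-Carter version replaces the single intercept with the age effects and the multiplicative age-year term, but the role of the offset is the same.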

To visualize the first component, the $\alpha_x$'s, use

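```r
# intercept + age effects (age 0 is the reference level)
alphaP=coefficients(LC.poisson)[1]+c(0,
coefficients(LC.poisson)[2:101])

alphaG=coefficients(LC.gauss)[1]+c(0,
coefficients(LC.gauss)[2:101])
# residual standard deviation of the log-scale Gaussian fit
s=sd(residuals(LC.gauss.2))

alphaG2=coefficients(LC.gauss.2)[1]+c(0,
coefficients(LC.gauss.2)[2:101])

# LC.gauss.w is not defined in this post, so these lines are left commented out
# alphaGw=coefficients(LC.gauss.w)[1]+c(0,
# coefficients(LC.gauss.w)[2:101])
```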

We can then plot them

```r
plot(0:100,alphaP,col="black",type="l",
xlab="Age")
lines(0:100,alphaG,col="blue")
legend(0,-1,c("Poisson","Gaussian"),
lty=1,col=c("black","blue"))
```

For small probabilities, the difference can be considered substantial. But for the elderly, the difference seems rather small. Now, the problem with a Poisson model is that it might generate a lot of deaths, possibly more than the exposure. A natural idea is to consider a binomial model (which is a standard model in actuarial textbooks)

$$D_{x,t} \sim \mathcal{B}\left(E_{x,t}, \frac{e^{\alpha_x + \beta_x \cdot \kappa_t}}{1 + e^{\alpha_x + \beta_x \cdot \kappa_t}}\right)$$

The code to run that (non)linear regression would be

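```r
# Binomial model: the response is the observed proportion D/E,
# weighted by the exposure E (logit is the default binomial link)
LC.binomiale <- gnm(D/E~
as.factor(X)+
Mult(as.factor(X),as.factor(T)),
weights=E,
family=binomial(link="logit"),
data=sousbase)
```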
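The motivation for moving from Poisson to binomial can be illustrated with a quick simulation for a single cell, with a small exposure and a high death probability (both values are hypothetical and chosen only to make the effect visible). Poisson draws can exceed the exposure, binomial draws cannot, and the Poisson variance is larger than the binomial one.

```r
# Synthetic illustration: Poisson versus binomial death counts for one cell
set.seed(1)
E_cell <- 5      # hypothetical exposure (number of people at risk)
q_cell <- 0.6    # hypothetical probability of dying
n_sim <- 1e4

D_pois <- rpois(n_sim, lambda = E_cell * q_cell)
D_binom <- rbinom(n_sim, size = E_cell, prob = q_cell)

# share of simulations with more deaths than people exposed
share_pois_over <- mean(D_pois > E_cell)
share_binom_over <- mean(D_binom > E_cell)

# theoretical variances: E*q for Poisson, E*q*(1-q) for binomial
var_pois <- E_cell * q_cell
var_binom <- E_cell * q_cell * (1 - q_cell)

print(c(share_pois_over = share_pois_over, share_binom_over = share_binom_over))
print(c(var_pois = var_pois, var_binom = var_binom))
```

With exposures as large as those in national data the first effect is rare, but the variance ratio `1 - q` still matters at ages where the death probability is far from zero.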

One more time, we can visualize the series of $\alpha_x$'s.

```r
alphaB=coefficients(LC.binomiale)[1]+c(0,
coefficients(LC.binomiale)[2:101])
```

Here, the difference shows up only at old ages. For small probabilities, the binomial model can be approximated by a Poisson model, which is what we observe. For elderly people, there is a large difference, and the Poisson model underestimates the probability of dying. This makes sense, since the number of deaths has to be smaller than the exposure: a Poisson model with a large parameter will have a (too) large variance, so the model will underestimate the probability. This is what we observe on the right of the graph. The binomial model clearly gives a more realistic fit.
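The approximation argument can be checked numerically without any data. The sketch below compares the binomial and Poisson probability functions for a fixed exposure and two death probabilities, one small and one large (the exposure of 100 and the two probabilities are arbitrary choices for the illustration), using the total variation distance between the two distributions.

```r
# Total variation distance between Binomial(E, q) and Poisson(E * q)
tv_distance <- function(E, q) {
  k <- 0:E
  p_binom <- dbinom(k, size = E, prob = q)
  p_pois <- dpois(k, lambda = E * q)
  # Poisson mass above E has no binomial counterpart
  pois_tail <- ppois(E, lambda = E * q, lower.tail = FALSE)
  0.5 * (sum(abs(p_binom - p_pois)) + pois_tail)
}

E_cell <- 100                            # arbitrary exposure
tv_small <- tv_distance(E_cell, 0.001)   # small death probability
tv_large <- tv_distance(E_cell, 0.5)     # large death probability

print(c(tv_small = tv_small, tv_large = tv_large))
```

A distance near zero means the two models are nearly interchangeable for that cell. A larger value signals that the choice between Poisson and binomial starts to matter, which is the situation at the oldest ages.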
