Reinterpreting Lee-Carter Mortality Model

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last week, while I was giving my crash course on R for insurance, we’ve been discussing possible extensions of Lee & Carter (1992) model. If we look at the seminal paper, the model is defined as follows

Hence, it means that http://latex.codecogs.com/gif.latex?mathbb{E}[log%20mu_{x,t}]%20=alpha_x+beta_xcdot%20kappa_t This would be a (non)linear model on the logarithm of the mortality rate. A non-equivalent, but alternative expression might be

http://latex.codecogs.com/gif.latex?logmathbb{E}[%20mu_{x,t}]%20=alpha_x+beta_xcdot%20kappa_t

which could be obtained as a Gaussian model, with a log-link function http://latex.codecogs.com/gif.latex?%20mu_{x,t}sim{mathcal{N}(e^{alpha_x+beta_xcdot%20kappa_t},sigma^2) Actually, this model can be compared to the more popular one, introduced in 2002 by Natacha Brouhns, Michel Denuit and Jeroen Vermunt, where a Poisson regression is used to count deaths (with the exposure used as an offset variable) http://latex.codecogs.com/gif.latex?%20D_{x,t}sim{mathcal{P}(E_{x,t}cdot%20e^{alpha_x+beta_xcdot%20kappa_t}) On our datasets

EXPO <- read.table(
  "http://freakonometrics.free.fr/Exposures-France.txt",
  header=TRUE,skip=2)
DEATH <- read.table(
  "http://freakonometrics.free.fr/Deces-France.txt",
  header=TRUE,skip=0) ### !!!! 0
base=data.frame(
  D=DEATH$Total,
  E=EXPO$Total,
  X=as.factor(EXPO$Age),
  T=as.factor(EXPO$Year))
library(gnm)
listeage=c(101:109,"110+")
sousbase=base[! base$X %in% listeage,]
 # on met des nombres car il faut calculer T-X
sousbase$X=as.numeric(as.character(sousbase$X))
sousbase$T=as.numeric(as.character(sousbase$T))
sousbase$C=sousbase$T-sousbase$X
sousbase$E=pmax(sousbase$E,sousbase$D)

The codes to fit those models are the following

LC.gauss <- gnm(D/E~
     as.factor(X)+
     Mult(as.factor(X),as.factor(T)),
     family=gaussian(link="log"),
     data=sousbase)

LC.gauss.2 <- gnm(log(D/E)~
      as.factor(X)+
      Mult(as.factor(X),as.factor(T)),
      family=gaussian(link="identity"),
      data=sousbase)

while for the Poisson regression is

LC.poisson <- gnm(D~offset(log(E))+
   as.factor(X)+
   Mult(as.factor(X),as.factor(T)),
   family=poisson(link="log"),
   data=sousbase)

To visualize the first component, the http://latex.codecogs.com/gif.latex?alpha_x‘s, use

alphaG=coefficients(LC.gauss)[1]+c(0,
coefficients(LC.gauss)[2:101])
s=sd(residuals(LC.gauss.2))

alphaG2=coefficients(LC.gauss.2)[1]+c(0,
coefficients(LC.gauss.2)[2:101])

alphaGw=coefficients(LC.gauss.w)[1]+c(0,
coefficients(LC.gauss.w)[2:101])

We can then plot them

plot(0:100,alphaP,col="black",type="l",
xlab="Age")
lines(0:100,alphaG,col="blue")
legend(0,-1,c("Poisson","Gaussian"),
lty=1,col=c("black","blue"))

On small probabilities, the difference can be considered as substential. But for elderly, it seems that the difference is rather small. Now, the problem with a Poisson model is that it might generate a lot of deaths. Maybe more than the exposure actually. A natural idea is to consider a binomial model (which is a standard model in actuarial textbooks) http://latex.codecogs.com/gif.latex?%20D_{x,t}sim{mathcal{B}left(E_{x,t},frac{e^{alpha_x+beta_xcdot%20kappa_t}}{1+e^{alpha_x+beta_xcdot%20kappa_t}}right) The codes to run that (non)linear regression would be

LC.binomiale <- gnm(D/E~
    as.factor(X)+
    Mult(as.factor(X),as.factor(T)),
    weights=E,
    family=binomial(link="logit"),
    data=sousbase)

One more time, we can visualize the series of http://latex.codecogs.com/gif.latex?alpha_x‘s.

alphaB=coefficients(LC.binomiale)[1]+c(0,
coefficients(LC.binomiale)[2:101])

Here, the difference is only on old people. For small probabilities, the binomial model can be approximated by a Poisson model. Which is what we observe. On elderly people, there is a large difference, and the Poisson model underestimates the probability of dying. Which makes sense, actually, since the number of deaths has to be smaller than the exposure. A Poisson model with a large parameter will have a (too) large variance. So the model will underestimate the probability. This is what we observe on the right. It is clearly a more realistic fit.

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)