Reinterpreting Lee-Carter Mortality Model

November 18, 2014

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

Last week, while I was giving my crash course on R for insurance, we discussed possible extensions of the Lee & Carter (1992) model. If we look at the seminal paper, the model is defined as follows

$$\log \mu_{x,t} = \alpha_x + \beta_x \cdot \kappa_t + \varepsilon_{x,t}$$

Hence, it means that

$$\mathbb{E}[\log \mu_{x,t}] = \alpha_x + \beta_x \cdot \kappa_t$$

This would be a (non)linear model on the logarithm of the mortality rate. A non-equivalent, but alternative expression might be

$$\log \mathbb{E}[\mu_{x,t}] = \alpha_x + \beta_x \cdot \kappa_t$$

which could be obtained as a Gaussian model, with a log-link function

$$\mu_{x,t} \sim \mathcal{N}(e^{\alpha_x + \beta_x \cdot \kappa_t}, \sigma^2)$$

Actually, this model can be compared to the more popular one, introduced in 2002 by Natacha Brouhns, Michel Denuit and Jeroen Vermunt, where a Poisson regression is used to count deaths (with the exposure used as an offset variable)

$$D_{x,t} \sim \mathcal{P}(E_{x,t} \cdot e^{\alpha_x + \beta_x \cdot \kappa_t})$$

On our datasets

EXPO <- read.table(
  "http://freakonometrics.free.fr/Exposures-France.txt",
  header=TRUE,skip=2)
DEATH <- read.table(
  "http://freakonometrics.free.fr/Deces-France.txt",
  header=TRUE,skip=0) # note: skip=0 here, unlike skip=2 for the exposures file
base=data.frame(
  D=DEATH$Total,
  E=EXPO$Total,
  X=as.factor(EXPO$Age),
  T=as.factor(EXPO$Year))
library(gnm)
listeage=c(101:109,"110+")
sousbase=base[! base$X %in% listeage,]
 # convert to numeric values, since we need to compute T-X
sousbase$X=as.numeric(as.character(sousbase$X))
sousbase$T=as.numeric(as.character(sousbase$T))
sousbase$C=sousbase$T-sousbase$X
sousbase$E=pmax(sousbase$E,sousbase$D)
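
The Poisson specification above uses the exposure as an offset. As a minimal sketch of what that means, here is a toy example on simulated data (not the French data; the exposures, the constant rate and the object names are all assumptions made for illustration). With only an intercept, the fitted rate should coincide with the crude rate, total deaths over total exposure.

```r
# Toy illustration with simulated data (all values here are hypothetical)
set.seed(1)
n <- 200
E <- runif(n, min = 100, max = 1000)   # simulated exposures
rate <- 0.02                           # true (constant) death rate
D <- rpois(n, lambda = E * rate)       # simulated death counts

# Intercept-only Poisson regression with log(E) as an offset
fit <- glm(D ~ offset(log(E)), family = poisson(link = "log"))

rate_hat <- unname(exp(coef(fit)[1]))  # fitted rate
crude <- sum(D) / sum(E)               # crude rate: total deaths / total exposure
c(fitted = rate_hat, crude = crude)
```

The offset enters the linear predictor with a coefficient fixed to one, so the regression describes the death rate rather than the raw count.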

The code to fit the two Gaussian models is the following

LC.gauss <- gnm(D/E~
     as.factor(X)+
     Mult(as.factor(X),as.factor(T)),
     family=gaussian(link="log"),
     data=sousbase)

LC.gauss.2 <- gnm(log(D/E)~
      as.factor(X)+
      Mult(as.factor(X),as.factor(T)),
      family=gaussian(link="identity"),
      data=sousbase)

while the code for the Poisson regression is

LC.poisson <- gnm(D~offset(log(E))+
   as.factor(X)+
   Mult(as.factor(X),as.factor(T)),
   family=poisson(link="log"),
   data=sousbase)
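
The two Gaussian fits target different quantities: LC.gauss.2 models the expected log-rate, while LC.gauss models the log of the expected rate. A small simulation can illustrate the gap, under the assumption (mine, for illustration only) that the log-rate is Gaussian with mean m and standard deviation s, in which case the log of the expected rate equals m + s^2/2. The values of m and s below are hypothetical.

```r
# Simulated log-normal rates (hypothetical parameters, not fitted values)
set.seed(1)
m <- -3      # hypothetical mean of the log-rate
s <- 0.5     # hypothetical standard deviation of the log-rate
mu <- exp(rnorm(1e5, mean = m, sd = s))

mean_of_log <- mean(log(mu))   # estimates E[log mu], i.e. m
log_of_mean <- log(mean(mu))   # estimates log E[mu], i.e. m + s^2 / 2
c(mean_of_log = mean_of_log, log_of_mean = log_of_mean, gap = s^2 / 2)
```

By Jensen's inequality the second quantity is always at least as large as the first, so the two sets of estimated coefficients should not be expected to coincide.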

To visualize the first component, the $\alpha_x$'s, use

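# age effects alpha_x for ages 0..100 (intercept + age-factor contrasts)
alphaP=coefficients(LC.poisson)[1]+c(0,
coefficients(LC.poisson)[2:101])

alphaG=coefficients(LC.gauss)[1]+c(0,
coefficients(LC.gauss)[2:101])
# residual standard deviation of the Gaussian model on log-rates
s=sd(residuals(LC.gauss.2))

alphaG2=coefficients(LC.gauss.2)[1]+c(0,
coefficients(LC.gauss.2)[2:101])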

We can then plot them

plot(0:100,alphaP,col="black",type="l",
xlab="Age")
lines(0:100,alphaG,col="blue")
legend(0,-1,c("Poisson","Gaussian"),
lty=1,col=c("black","blue"))

On small probabilities, the difference can be considered substantial. But for the elderly, the difference seems rather small. Now, the problem with a Poisson model is that it might generate a lot of deaths, possibly more than the exposure. A natural idea is to consider a binomial model (which is a standard model in actuarial textbooks)

$$D_{x,t} \sim \mathcal{B}\left(E_{x,t}, \frac{e^{\alpha_x + \beta_x \cdot \kappa_t}}{1 + e^{\alpha_x + \beta_x \cdot \kappa_t}}\right)$$

The code to run that (non)linear regression would be

LC.binomiale <- gnm(D/E~
    as.factor(X)+
    Mult(as.factor(X),as.factor(T)),
    weights=E,
    family=binomial(link="logit"),
    data=sousbase)
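
To make the motivation for the binomial model concrete, here is a small simulation sketch. The exposure E, the probability q and the number of draws are hypothetical values chosen for illustration; the Poisson draws use the same expected count as the binomial ones.

```r
# Hypothetical cell at a very old age: few exposed lives, high death probability
set.seed(42)
E <- 50          # hypothetical exposure
q <- 0.8         # hypothetical death probability
n_sim <- 10000

d_pois  <- rpois(n_sim, lambda = E * q)        # Poisson deaths, same expected count
d_binom <- rbinom(n_sim, size = E, prob = q)   # binomial deaths

mean(d_pois > E)   # share of Poisson draws with more deaths than exposed lives
max(d_binom)       # binomial draws never exceed E
```

The binomial response D/E also has to lie in [0,1], which is presumably why the data preparation above forces the exposure to be at least as large as the number of deaths.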

One more time, we can visualize the series of $\alpha_x$'s.

alphaB=coefficients(LC.binomiale)[1]+c(0,
coefficients(LC.binomiale)[2:101])

Here, the difference shows up only at old ages. For small probabilities, the binomial model can be approximated by a Poisson model, which is what we observe. For elderly people, there is a large difference, and the Poisson model underestimates the probability of dying. This makes sense, since the number of deaths cannot exceed the exposure: a Poisson model with a large parameter will have a (too) large variance, so the model will underestimate the probability. This is what we observe on the right. The binomial model clearly gives a more realistic fit.
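
The quality of the Poisson approximation to the binomial can be checked numerically. The sketch below uses a hypothetical exposure and a hypothetical grid of death probabilities; it compares the two variances and computes the total variation distance between Binomial(E, q) and Poisson(E q).

```r
E <- 1000                          # hypothetical exposure
q <- c(0.001, 0.01, 0.1, 0.5)      # hypothetical death probabilities
k <- 0:E

# Total variation distance between Binomial(E, q) and Poisson(E * q)
tv <- sapply(q, function(p) {
  inside  <- sum(abs(dbinom(k, size = E, prob = p) - dpois(k, lambda = E * p)))
  outside <- ppois(E, lambda = E * p, lower.tail = FALSE)  # Poisson mass above E
  0.5 * (inside + outside)
})

comparison <- data.frame(
  q = q,
  var_binomial = E * q * (1 - q),   # binomial variance
  var_poisson  = E * q,             # Poisson variance with the same mean
  tv_distance  = tv
)
comparison
```

For small q the two distributions are nearly indistinguishable, while for large q the Poisson variance is visibly larger than the binomial one, in line with the comparison of the fitted curves.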
