
Last week, while I was giving my crash course on R for insurance, we discussed possible extensions of the Lee & Carter (1992) model. In the seminal paper, the model is defined as

log m_{x,t} = α_x + β_x κ_t + ε_{x,t}

Hence, it means that

m_{x,t} = exp[α_x + β_x κ_t + ε_{x,t}]

This would be a (non)linear model on the logarithm of the mortality rate. A non-equivalent, but alternative, expression might be

E[D_{x,t}/E_{x,t}] = exp[α_x + β_x κ_t]

which could be obtained as a Gaussian model with a log-link function,

D_{x,t}/E_{x,t} ~ N(exp[α_x + β_x κ_t], σ²)

Actually, this model can be compared to the more popular one, introduced in 2002 by Natacha Brouhns, Michel Denuit and Jeroen Vermunt, where a Poisson regression is used to model death counts (with the exposure used as an offset variable),

D_{x,t} ~ P(E_{x,t} exp[α_x + β_x κ_t])

On our datasets,

EXPO <- read.table(
"http://freakonometrics.free.fr/Exposures-France.txt",
header=TRUE,skip=2)
DEATH <- read.table(
"http://freakonometrics.free.fr/Deces-France.txt",
header=TRUE,skip=2)
base=data.frame(
D=DEATH$Total, E=EXPO$Total,
X=as.factor(EXPO$Age), T=as.factor(EXPO$Year))
library(gnm)
listeage=c(101:109,"110+")
sousbase=base[! base$X %in% listeage,]
# convert ages and years to numbers, since we need to compute T-X
sousbase$X=as.numeric(as.character(sousbase$X))
sousbase$T=as.numeric(as.character(sousbase$T))
sousbase$C=sousbase$T-sousbase$X
sousbase$E=pmax(sousbase$E,sousbase$D)
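As a quick sanity check (a sketch, assuming the data frame sousbase was built as above), we can verify that, after the pmax() adjustment, no cell has more deaths than exposure:

```r
# every (age, year) cell should now satisfy D <= E
stopifnot(all(sousbase$E >= sousbase$D))
# crude death rates, one per cell
summary(sousbase$D / sousbase$E)
```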

The code to fit those models is the following

LC.gauss <- gnm(D/E~
as.factor(X)+
Mult(as.factor(X),as.factor(T)),
family=gaussian(link="log"),
data=sousbase)

LC.gauss.2 <- gnm(log(D/E)~
as.factor(X)+
Mult(as.factor(X),as.factor(T)),
data=sousbase)

while the code for the Poisson regression is

LC.poisson <- gnm(D~offset(log(E))+
as.factor(X)+
Mult(as.factor(X),as.factor(T)),
family=poisson,
data=sousbase)
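The other two components of the model, the β_x's and κ_t's, sit inside the Mult() term. As a sketch (the coefficient names below follow recent versions of gnm; inspect names(coef(LC.poisson)) on your machine first), they can be extracted with

```r
cf <- coef(LC.poisson)
# age pattern beta_x: the as.factor(X) coefficients inside Mult()
ind.beta <- grep("Mult\\(\\., as\\.factor\\(T\\)\\)", names(cf))
# period index kappa_t: the as.factor(T) coefficients inside Mult()
ind.kappa <- grep("Mult\\(as\\.factor\\(X\\), \\.\\)", names(cf))
beta <- cf[ind.beta]    # identified only up to scale
kappa <- cf[ind.kappa]  # identified only up to scale and location
plot(seq_along(kappa), kappa, type="l", xlab="Year index")
```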

To visualize the first component, the α_x's, use

alphaG=coefficients(LC.gauss)[1]+c(0,
coefficients(LC.gauss)[2:101])
s=sd(residuals(LC.gauss.2)) # residual sd of the log-Gaussian fit

alphaG2=coefficients(LC.gauss.2)[1]+c(0,
coefficients(LC.gauss.2)[2:101])

alphaP=coefficients(LC.poisson)[1]+c(0,
coefficients(LC.poisson)[2:101])
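Note that the bilinear term is only identified up to scaling and translation, and gnm returns one arbitrary parameterisation. A small self-contained sketch of the usual Lee & Carter normalisation (β_x's summing to one, κ_t's summing to zero), with hypothetical values just for illustration:

```r
# Renormalise a Lee-Carter fit so that sum(beta)=1 and mean(kappa)=0,
# leaving the fitted log-rates alpha_x + beta_x * kappa_t unchanged
normalise <- function(alpha, beta, kappa){
  s <- sum(beta)
  beta <- beta / s            # rescale the age pattern...
  kappa <- kappa * s          # ...and absorb the scale into kappa
  m <- mean(kappa)
  alpha <- alpha + beta * m   # shift the level into alpha...
  kappa <- kappa - m          # ...so that kappa is centered
  list(alpha = alpha, beta = beta, kappa = kappa)
}

# hypothetical values, just to illustrate
alpha <- c(-4, -3); beta <- c(2, 3); kappa <- c(1, -2)
out <- normalise(alpha, beta, kappa)
sum(out$beta)   # 1
mean(out$kappa) # 0
```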

We can then plot them

plot(0:100,alphaP,col="black",type="l",
xlab="Age")
lines(0:100,alphaG,col="blue")
legend(0,-1,c("Poisson","Gaussian"),
lty=1,col=c("black","blue"))

For small probabilities, the difference can be considered substantial, but for the elderly, it seems rather small. Now, the problem with a Poisson model is that it might generate a lot of deaths, possibly even more than the exposure. A natural idea is to consider a binomial model (which is a standard model in actuarial textbooks),

D_{x,t} ~ B(E_{x,t}, q_{x,t})

The code to run that (non)linear regression would be

LC.binomiale <- gnm(D/E~
as.factor(X)+
Mult(as.factor(X),as.factor(T)),
family=binomial,
weights=E,
data=sousbase)

One more time, we can visualize the series of α_x's.

alphaB=coefficients(LC.binomiale)[1]+c(0,
coefficients(LC.binomiale)[2:101])

Here, the difference is only for old people. For small probabilities, the binomial model can be approximated by a Poisson model, which is what we observe. For elderly people, there is a large difference, and the Poisson model underestimates the probability of dying. This makes sense, actually, since the number of deaths has to be smaller than the exposure: a Poisson model with a large parameter will have a (too) large variance, so the model will underestimate the probability. This is what we observe on the right; it is clearly a more realistic fit.
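As a final sketch (assuming alphaP, alphaG and alphaB were computed as above, and that the binomial fit used the default logit link while the Poisson and log-Gaussian fits used a log link), the three baseline curves can be superimposed on the probability scale:

```r
plot(0:100, exp(alphaP), col="black", type="l", log="y",
     xlab="Age", ylab="Baseline death probability")
lines(0:100, exp(alphaG), col="blue")
lines(0:100, plogis(alphaB), col="red")  # inverse logit for the binomial fit
legend("topleft", c("Poisson","Gaussian","Binomial"),
       lty=1, col=c("black","blue","red"))
```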
coefficients(LC.binomiale)[2:101]) Here, the difference is only on old people. For small probabilities, the binomial model can be approximated by a Poisson model. Which is what we observe. On elderly people, there is a large difference, and the Poisson model underestimates the probability of dying. Which makes sense, actually, since the number of deaths has to be smaller than the exposure. A Poisson model with a large parameter will have a (too) large variance. So the model will underestimate the probability. This is what we observe on the right. It is clearly a more realistic fit.