Last week, while I was giving my crash course on R for insurance, we’ve been discussing possible extensions of Lee & Carter (1992) model. If we look at the seminal paper, the model is defined as follows
Hence, it means that This would be a (non)linear model on the logarithm of the mortality rate. A non-equivalent, but alternative expression might be
which could be obtained as a Gaussian model, with a log-link function Actually, this model can be compared to the more popular one, introduced in 2002 by Natacha Brouhns, Michel Denuit and Jeroen Vermunt, where a Poisson regression is used to count deaths (with the exposure used as an offset variable) On our datasets
EXPO <- read.table( "http://freakonometrics.free.fr/Exposures-France.txt", header=TRUE,skip=2) DEATH <- read.table( "http://freakonometrics.free.fr/Deces-France.txt", header=TRUE,skip=0) ### !!!! 0 base=data.frame( D=DEATH$Total, E=EXPO$Total, X=as.factor(EXPO$Age), T=as.factor(EXPO$Year)) library(gnm) listeage=c(101:109,"110+") sousbase=base[! base$X %in% listeage,] # on met des nombres car il faut calculer T-X sousbase$X=as.numeric(as.character(sousbase$X)) sousbase$T=as.numeric(as.character(sousbase$T)) sousbase$C=sousbase$T-sousbase$X sousbase$E=pmax(sousbase$E,sousbase$D)
The codes to fit those models are the following
LC.gauss <- gnm(D/E~ as.factor(X)+ Mult(as.factor(X),as.factor(T)), family=gaussian(link="log"), data=sousbase) LC.gauss.2 <- gnm(log(D/E)~ as.factor(X)+ Mult(as.factor(X),as.factor(T)), family=gaussian(link="identity"), data=sousbase)
while for the Poisson regression is
LC.poisson <- gnm(D~offset(log(E))+ as.factor(X)+ Mult(as.factor(X),as.factor(T)), family=poisson(link="log"), data=sousbase)
To visualize the first component, the ‘s, use
alphaG=coefficients(LC.gauss)+c(0, coefficients(LC.gauss)[2:101]) s=sd(residuals(LC.gauss.2)) alphaG2=coefficients(LC.gauss.2)+c(0, coefficients(LC.gauss.2)[2:101]) alphaGw=coefficients(LC.gauss.w)+c(0, coefficients(LC.gauss.w)[2:101])
We can then plot them
plot(0:100,alphaP,col="black",type="l", xlab="Age") lines(0:100,alphaG,col="blue") legend(0,-1,c("Poisson","Gaussian"), lty=1,col=c("black","blue"))
On small probabilities, the difference can be considered as substential. But for elderly, it seems that the difference is rather small. Now, the problem with a Poisson model is that it might generate a lot of deaths. Maybe more than the exposure actually. A natural idea is to consider a binomial model (which is a standard model in actuarial textbooks) The codes to run that (non)linear regression would be
LC.binomiale <- gnm(D/E~ as.factor(X)+ Mult(as.factor(X),as.factor(T)), weights=E, family=binomial(link="logit"), data=sousbase)
One more time, we can visualize the series of ‘s.
Here, the difference is only on old people. For small probabilities, the binomial model can be approximated by a Poisson model. Which is what we observe. On elderly people, there is a large difference, and the Poisson model underestimates the probability of dying. Which makes sense, actually, since the number of deaths has to be smaller than the exposure. A Poisson model with a large parameter will have a (too) large variance. So the model will underestimate the probability. This is what we observe on the right. It is clearly a more realistic fit.