(This article was first published on **Freakonometrics » R-english**, and kindly contributed to R-bloggers)

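Last week, while I was giving my crash course on R for insurance, we discussed possible extensions of the Lee & Carter (1992) model. In the seminal paper, the model for the mortality rate $\mu_{x,t}$ at age $x$ in year $t$ is defined as follows:

$$\log \mu_{x,t} = \alpha_x + \beta_x \cdot \kappa_t + \varepsilon_{x,t}$$

Hence, it means that

$$\mathbb{E}[\log \mu_{x,t}] = \alpha_x + \beta_x \cdot \kappa_t$$

This would be a (non)linear model on the logarithm of the mortality rate. A non-equivalent, but alternative, expression might be

$$\log \mathbb{E}[\mu_{x,t}] = \alpha_x + \beta_x \cdot \kappa_t$$

which could be obtained as a Gaussian model with a log-link function,

$$\mu_{x,t} \sim \mathcal{N}\left(e^{\alpha_x + \beta_x \cdot \kappa_t}, \sigma^2\right)$$

Actually, this model can be compared to the more popular one, introduced in 2002 by Natacha Brouhns, Michel Denuit and Jeroen Vermunt, where a Poisson regression is used to count the deaths $D_{x,t}$ (with the exposure $E_{x,t}$ used as an offset variable),

$$D_{x,t} \sim \mathcal{P}\left(E_{x,t} \cdot e^{\alpha_x + \beta_x \cdot \kappa_t}\right)$$

On our datasets,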

    EXPO <- read.table(
      "http://freakonometrics.free.fr/Exposures-France.txt",
      header=TRUE,skip=2)
    DEATH <- read.table(
      "http://freakonometrics.free.fr/Deces-France.txt",
      header=TRUE,skip=0) ### !!!! 0
    base=data.frame(
      D=DEATH$Total,
      E=EXPO$Total,
      X=as.factor(EXPO$Age),
      T=as.factor(EXPO$Year))
    library(gnm)
    # drop ages 101 to 109 and the open-ended "110+" class
    listeage=c(101:109,"110+")
    sousbase=base[! base$X %in% listeage,]
    # convert to numbers, since we need to compute T-X
    sousbase$X=as.numeric(as.character(sousbase$X))
    sousbase$T=as.numeric(as.character(sousbase$T))
    sousbase$C=sousbase$T-sousbase$X
    # make sure the exposure is at least the number of deaths
    sousbase$E=pmax(sousbase$E,sousbase$D)

The code to fit the two Gaussian models is the following

    # Gaussian model on the rate, with a log link
    LC.gauss <- gnm(D/E~
      as.factor(X)+
      Mult(as.factor(X),as.factor(T)),
      family=gaussian(link="log"),
      data=sousbase)
    # Gaussian model on the log-rate, with an identity link
    LC.gauss.2 <- gnm(log(D/E)~
      as.factor(X)+
      Mult(as.factor(X),as.factor(T)),
      family=gaussian(link="identity"),
      data=sousbase)
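These two Gaussian fits do not target the same quantity: the identity-link model works with the mean of the log-rates, while the log-link model works with the log of the mean rate. Since the logarithm is concave, the first can never exceed the second. Here is a minimal numerical sketch, where the vector `mu` is made up for illustration and is not taken from the French data:

```r
# Hypothetical mortality rates for one age, observed over four years
mu <- c(0.0010, 0.0015, 0.0030, 0.0060)

mean_of_log <- mean(log(mu))  # quantity modelled with the identity link on log(mu)
log_of_mean <- log(mean(mu))  # quantity modelled with the log link on mu

# Jensen's inequality: mean(log(mu)) <= log(mean(mu))
c(mean_of_log = mean_of_log, log_of_mean = log_of_mean)
```

The more the rates vary, the larger the gap between the two numbers.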

For the Poisson regression, the code is

    # Poisson model on the death counts, with log-exposure as an offset
    LC.poisson <- gnm(D~offset(log(E))+
      as.factor(X)+
      Mult(as.factor(X),as.factor(T)),
      family=poisson(link="log"),
      data=sousbase)
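To see what the offset does, here is a small sketch on made-up numbers, using the standard `glm` function rather than `gnm` and no age or year effect at all. With an intercept only, the fitted rate should simply be the total number of deaths divided by the total exposure. The vectors `D` and `E` below are hypothetical:

```r
# Toy data (hypothetical): deaths and exposures for five cells sharing one rate
D <- c(2, 5, 1, 7, 4)
E <- c(1000, 2400, 600, 3100, 1900)

# Intercept-only Poisson regression with log-exposure as an offset
fit <- glm(D ~ 1 + offset(log(E)), family = poisson(link = "log"))

rate_glm   <- unname(exp(coef(fit)[1]))  # fitted common death rate
rate_crude <- sum(D) / sum(E)            # total deaths / total exposure

c(glm = rate_glm, crude = rate_crude)
```

The offset enters the linear predictor with a coefficient fixed to one, so the model is on the rate and not on the raw count.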

To visualize the first component, the $\alpha_x$'s, use

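    # intercept + age effects (age 0 is the reference level, hence the 0)
    alphaP=coefficients(LC.poisson)[1]+c(0,
      coefficients(LC.poisson)[2:101])
    alphaG=coefficients(LC.gauss)[1]+c(0,
      coefficients(LC.gauss)[2:101])
    s=sd(residuals(LC.gauss.2))
    alphaG2=coefficients(LC.gauss.2)[1]+c(0,
      coefficients(LC.gauss.2)[2:101])
    # LC.gauss.w is not fitted in this post, so this line is left commented out
    # alphaGw=coefficients(LC.gauss.w)[1]+c(0,
    #   coefficients(LC.gauss.w)[2:101])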
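The `[1]+c(0, ...)` construction relies on R's default treatment contrasts for factors: the intercept is the effect of the reference level, and the other coefficients are differences from it. A small sketch on a hypothetical data frame `toy`, with `glm` instead of `gnm` and without the multiplicative term, shows the idea:

```r
# Hypothetical data: three age groups, two observations each
toy <- data.frame(
  X = factor(rep(c(0, 1, 2), each = 2)),
  D = c(3, 5, 10, 14, 30, 34),
  E = rep(1000, 6)
)

fit <- glm(D ~ X + offset(log(E)), family = poisson(link = "log"), data = toy)

# Intercept = log-rate at the reference level (X = 0);
# the other coefficients are differences from that level
alpha <- unname(coef(fit)[1] + c(0, coef(fit)[2:3]))

# With one parameter per group, alpha matches the log of the group rates
group_rate <- tapply(toy$D, toy$X, sum) / tapply(toy$E, toy$X, sum)
cbind(alpha = alpha, log_group_rate = log(group_rate))
```

In the Lee-Carter fits, the multiplicative term is also present, so the age effects are not simply the log of the crude rates, but the way they are rebuilt from the coefficient vector is the same.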

We can then plot the estimated $\alpha_x$'s against age

    plot(0:100,alphaP,col="black",type="l",
      xlab="Age")
    lines(0:100,alphaG,col="blue")
    legend(0,-1,c("Poisson","Gaussian"),
      lty=1,col=c("black","blue"))

For small probabilities, the difference can be considered substantial, but for the elderly it seems rather small. Now, the problem with a Poisson model is that it might generate a lot of deaths, possibly more than the exposure. A natural idea is to consider a binomial model (which is a standard model in actuarial textbooks), where the number of deaths at age $x$ in year $t$ satisfies $D_{x,t} \sim \mathcal{B}(E_{x,t}, q_{x,t})$, with $E_{x,t}$ the exposure and $\text{logit}(q_{x,t}) = \alpha_x + \beta_x \cdot \kappa_t$. The code to run that (non)linear regression would be

    # Binomial model on the proportion of deaths, weighted by the exposure
    LC.binomiale <- gnm(D/E~
      as.factor(X)+
      Mult(as.factor(X),as.factor(T)),
      weights=E,
      family=binomial(link="logit"),
      data=sousbase)
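The remark that a Poisson model can generate more deaths than the exposure is easy to check numerically. Here is a quick sketch for a single hypothetical cell at a very old age, where the values of `E` and `q` are assumptions chosen for illustration:

```r
# Hypothetical cell at a very old age: few people exposed, high death probability
E <- 10    # exposure (number of people at risk)
q <- 0.6   # assumed one-year death probability

# Probability that the number of deaths exceeds the exposure
p_exceed_poisson  <- ppois(E, lambda = E * q, lower.tail = FALSE)
p_exceed_binomial <- pbinom(E, size = E, prob = q, lower.tail = FALSE)

c(poisson = p_exceed_poisson, binomial = p_exceed_binomial)
```

The Poisson distribution puts some positive mass above the exposure, while the binomial distribution cannot, by construction.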

Once more, we can visualize the series of $\alpha_x$'s.

    alphaB=coefficients(LC.binomiale)[1]+c(0,
      coefficients(LC.binomiale)[2:101])

Here, the difference shows up only at old ages. For small probabilities, the binomial model can be approximated by a Poisson model, which is what we observe. For elderly people, there is a large difference, and the Poisson model underestimates the probability of dying. This makes sense, since the number of deaths *has to* be smaller than the exposure. A Poisson model with a large parameter will have a (too) large variance, so the model will underestimate the probability. This is what we observe on the right of the graph. The binomial model is clearly a more realistic fit.
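The quality of the Poisson approximation can be illustrated directly on the two distributions. The sketch below compares a binomial distribution with the Poisson distribution having the same mean, for a small and for a large death probability. The exposure `E = 100`, the two values of `q` and the helper `compare` are assumptions made for this illustration only:

```r
# Compare Binomial(E, q) with Poisson(E * q) for a small and a large q
E <- 100

compare <- function(q) {
  k <- 0:E
  # total variation distance; the Poisson mass above E counts fully as a gap
  tv <- 0.5 * (sum(abs(dbinom(k, E, q) - dpois(k, E * q))) +
               ppois(E, E * q, lower.tail = FALSE))
  c(q = q,
    var_binomial = E * q * (1 - q),
    var_poisson  = E * q,
    tv_distance  = tv)
}

small_q <- compare(0.001)  # young ages: rare deaths
large_q <- compare(0.5)    # very old ages: deaths are frequent
rbind(small_q, large_q)
```

For the small probability the two variances are almost equal and the distributions are nearly indistinguishable, while for the large one the Poisson variance is twice the binomial variance.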
