(This article was first published on **Freakonometrics » R-english**, and kindly contributed to R-bloggers.)

Last week, some colleagues and I discussed the fact that, because we prepare students for the SOA exams, we have not (so far) had time to cover results on extreme values in our actuarial program. I did give an introduction in my *nonlife actuarial models* class, but it was only a three-hour introduction, meant to illustrate reinsurance pricing. I told my students that if they wanted to know more about extreme values, they should enroll in the master's program in actuarial science and finance, since I will teach a course on extremes (and copulas) next winter.

But actually, extreme values are everywhere! For instance, there is a Prudential TV commercial in which people place large, round stickers on a number line to represent the age of the oldest person they know. This forms a kind of histogram. The message is that Prudential can help you prepare to have enough money for all those years. And anyone can add his or her own sticker on the Prudential website.

Patrick Honner mentioned this interesting representation on his blog (http://mrhonner.com/…). But the idea is not new, as mentioned in a post published three years ago. In 1932, Emil Gumbel gave a talk in France on the “*âge limite*” (the limiting age). As he wrote, “*one can therefore assume that the distribution of the limiting age, that is, the probability that this age takes a given value, is Gaussian*”. In 1932, not yet aware of the work of Fisher and Tippett, he thought that the limiting distribution of a maximum would be Gaussian. A few years later, he read Fisher's work and observed that “*for a sufficient number of observations, the distribution of an extreme value can be represented by the double exponential formula, provided that the initial distribution behaves asymptotically like an exponential. The formula becomes exact if the initial distribution is exponential*”, as he wrote in 1935. In 1937, he wrote a paper on centenarians, which can also be related to the work of Bortkiewicz on *rare events*. One should also mention one of the most important papers in extreme value theory, published in 1974 by Balkema and de Haan, on *Residual Life Time at Great Age*.

This matters because the question asked in this experiment is “*How Old is the Oldest Person You Know?*”, so the stickers describe the distribution of a maximum. By the Fisher-Tippett theorem, if we assume that age is bounded (i.e., that there exists some finite upper limit), then the limiting distribution of the maxima (or, to be more rigorous, of an affine transformation of the maxima) should be a (reversed) Weibull distribution. This is what it looks like:

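```r
# x is the distance to the upper bound; plotting against -x puts the bound at 0
x <- seq(0, 10, by = 0.01)
plot(-x, dweibull(x, shape = 2.25, scale = 4), type = "l", lwd = 2)
```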
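To see the Fisher-Tippett result at work, here is a small simulation sketch. It does not use real ages: it assumes a toy variable $X$ with a hypothetical finite upper bound `omega = 110` and $P(\omega - X \le t) = t^{\alpha}$ near that bound, with $\alpha = 2.25$ borrowed from the shape used in the plot above. Under that assumption, $n^{1/\alpha}(\omega - M_n)$, where $M_n$ is the maximum of $n$ draws, should be approximately Weibull with shape $\alpha$ and scale 1.

```r
# Toy check of the reversed Weibull limit for maxima of a bounded variable.
# Assumption (illustration only): omega - X = U^(1/alpha), U ~ Uniform(0, 1),
# so that P(omega - X <= t) = t^alpha for t in [0, 1].
set.seed(1)
omega <- 110   # hypothetical finite upper bound
alpha <- 2.25  # behaviour near the bound, also the limiting Weibull shape
n     <- 1000  # size of each sample whose maximum is recorded
nsim  <- 5000  # number of simulated maxima

maxima <- replicate(nsim, max(omega - runif(n)^(1 / alpha)))

# Rescaled distance between each maximum and the upper bound
D <- n^(1 / alpha) * (omega - maxima)

# Limiting Weibull(shape = alpha, scale = 1) moments versus simulation
theory    <- c(mean = gamma(1 + 1 / alpha), median = log(2)^(1 / alpha))
empirical <- c(mean = mean(D), median = median(D))
print(rbind(theory, empirical))

# Visual check: histogram of D with the limiting Weibull density in red
hist(D, breaks = 50, freq = FALSE, main = "Rescaled distance to the bound")
curve(dweibull(x, shape = alpha, scale = 1), add = TRUE, col = "red", lwd = 2)
```

The exact tail is $(1 - t^{\alpha})^n$, which becomes $\exp(-s^{\alpha})$ after setting $t = s\,n^{-1/\alpha}$, i.e. the Weibull survival function; the simulation only makes that limit visible.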

As an actuary, the only thing I know about demography is the distribution of the age at death. For instance, consider the following French life table:

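```r
# French life table: Lx is the number of survivors at each age
alive <- read.table(
  "http://perso.univ-rennes1.fr/arthur.charpentier/TV8890.csv",
  sep = ";", header = TRUE)$Lx
nb <- -diff(alive)  # number of deaths between age x and x + 1
ages <- 0:110
plot(ages, nb, type = "h")
```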

This is the distribution of the age at death in a given population, which is not the same as the distribution discussed above! What we are looking for is the following: given that someone is alive, what is the distribution of his or her age? If we assume that the yearly number of births is constant over time (as well as the death probabilities), then we can easily compute the number of people of age $x$: we take everyone born (exactly) $x$ years ago, and remove all those who died at age $0$, at age $1$, etc. The corresponding computation is:

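```r
probadeath <- nb / sum(nb)                         # probability of dying at each age
nbx <- function(x) 1 - sum(probadeath[1:(x + 1)])  # proportion still alive after age x
surv <- Vectorize(nbx)(ages)
distrage <- surv / sum(surv)                       # age distribution among the living
```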

The resulting distribution can be plotted in the same way, with `plot(ages, distrage, type = "h")`.
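Written out, the reasoning behind `distrage` is short. As a sketch under the post's assumptions (a constant number $B$ of births per year and death probabilities that do not change over time), write $p_k$ for the probability of dying at age $k$ (`probadeath`) and $S(x)$ for the proportion of a cohort still alive after age $x$ (what `nbx(x)` computes):

$$
S(x) = 1 - \sum_{k=0}^{x} p_k, \qquad N(x) = B\,S(x), \qquad
\mathbb{P}(\text{age} = x \mid \text{alive}) = \frac{N(x)}{\sum_{y} N(y)} = \frac{S(x)}{\sum_{y} S(y)}.
$$

The number of births $B$ cancels, so `distrage` depends only on the mortality table, not on the level of natality, as long as that level is constant.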

But the assumption of a constant number of births is not very relevant. What we actually need is the distribution of age within a population, that is, a population pyramid. The French one can be downloaded from http://www.insee.fr/fr/ppp/bases-de-donnees/….

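```r
# French population by age in 2007 (INSEE file), ages 0 to 107
population <- read.table("popinsee2007.csv", sep = ";", header = TRUE)$POPTOT07
ages <- 0:107
plot(ages, population / sum(population), type = "h")
lines(0:110, distrage, col = "red")  # stationary-population distribution computed above
```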

(the red line is the distribution obtained previously, under the constant-births assumption). Now, let us use this population to generate acquaintances.

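```r
# For each of nsim people, draw `size` acquaintances at random (with replacement)
# from the population pyramid and record the age of the oldest one
agemax <- function(nsim = 1000, size = 20) {
  agemax <- rep(NA, nsim)
  for (i in 1:nsim) {
    X <- sample(ages, prob = population / sum(population), size = size, replace = TRUE)
    agemax[i] <- max(X)
  }
  return(agemax)
}
```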

Here, we assume that everyone knows 20 other people, chosen at random from the entire population, and we record the age of the oldest one. We repeat this for 10,000 people. Here is the distribution we obtain:

```r
XS <- agemax(10000, 20)
plot(table(XS) / length(XS), type = "h", xlim = c(0, 108))
```

where the red line is a Weibull distribution (more precisely a transformed one since, in extreme value theory, it is the distance between the maximum and the upper bound of the distribution that has a Weibull density):

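```r
library(MASS)
# Fit a Weibull distribution to the distance between the maximum and the bound (108)
fit <- fitdistr(108 - XS, dweibull, list(shape = 1, scale = 1))
lines(ages, dweibull(108 - ages, fit$estimate[1], fit$estimate[2]), col = "red")
```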
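Since `agemax()` draws acquaintances independently, the simulated distribution can also be computed exactly: if $F$ is the cumulative distribution of ages, the oldest of 20 acquaintances satisfies $P(M \le x) = F(x)^{20}$. The sketch below is not from the original post; `max_age_pmf()` is a hypothetical helper introduced here, and it uses a made-up toy age distribution (`toy_prob`) instead of the INSEE file. With the real data, one could try `max_age_pmf(population, size = 20)`.

```r
# Exact distribution of the oldest age among `size` acquaintances drawn
# independently (with replacement) from an age distribution `prob`.
# prob[k] is the weight of the k-th age; ages are assumed sorted increasingly.
max_age_pmf <- function(prob, size = 20) {
  prob    <- prob / sum(prob)  # normalise the weights, as sample() does
  cdf     <- cumsum(prob)      # F(x) = P(age <= x)
  cdf_max <- cdf^size          # P(max <= x) = F(x)^size, by independence
  diff(c(0, cdf_max))          # P(max = x) = F(x)^size - F(x - 1)^size
}

# Toy age distribution (hypothetical, NOT the INSEE data): ages 0 to 107,
# with weights that drop off smoothly at old ages
toy_ages <- 0:107
toy_prob <- exp(-(toy_ages / 80)^6)

pmf <- max_age_pmf(toy_prob, size = 20)

# Monte Carlo counterpart, in the spirit of agemax()
set.seed(2)
sim <- replicate(10000, max(sample(toy_ages, size = 20, replace = TRUE, prob = toy_prob)))
sim_pmf <- tabulate(sim + 1, nbins = length(toy_ages)) / length(sim)

print(c(exact_mean = sum(toy_ages * pmf), simulated_mean = mean(sim)))
plot(toy_ages, sim_pmf, type = "h",
     xlab = "age of the oldest acquaintance", ylab = "probability")
lines(toy_ages, pmf, col = "red")
```

The exact formula relies on the same independence assumption as the simulation, so it removes Monte Carlo noise but not modelling error: acquaintances who are parents or grandparents would break it just as they break `agemax()`.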

The simulated distribution of the oldest acquaintance's age is quite close to the one obtained in the commercial, don't you think? Still, it should be possible to be more accurate, since people are likely to think of their parents or grandparents. So it should be possible to build a more realistic algorithm, closer to the distribution obtained on the Prudential website. But first, let us wait for more stickers, and more observations, and then I'll be back to play with it!
