(This article was first published on **Freakonometrics » R-english**, and kindly contributed to R-bloggers)

A few months ago, I published a post on negative values in triangles, and how to deal with them when using a Poisson regression (the post was published in French). The idea was to use a **translation technique**:

- Fit a model not on the $Y_i$'s but on the $Y_i+k$'s, for some $k$,
- Use that model to make predictions, and then translate those predictions back by $-k$.

This is what was done to get the following graph, where a Poisson regression was fitted. Black points are the $Y_i$'s while blue points are the $Y_i+k$'s, for some $k$. We fit a model to get the blue prediction, and then translate it to get the red prediction (on the $Y_i$'s).

In this example, there were no negative values, but it can still be used to get a better understanding of the impact of this technique. The prediction, here, is the red line. And clearly, the value of $k$ has an impact on the prediction (since we do not consider a linear model here: with a linear model, translating has no impact at all, except on the intercept).
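The parenthetical remark can be checked numerically. The sketch below uses simulated data (an assumption made here; the variables `x`, `y`, `d` and the shift `k = 5` are not from the original post) and applies the same fit-then-translate recipe to a linear model and to a Poisson regression.

```r
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- rpois(50, lambda = exp(0.5 + 0.2 * x))  # simulated counts, for illustration only
d <- data.frame(x = x, y = y)
k <- 5                                       # arbitrary shift

# Linear model: fit on y + k, then translate the predictions back by k
lin_direct <- predict(lm(y ~ x, data = d))
lin_shift  <- predict(lm(I(y + k) ~ x, data = d)) - k

# Poisson regression with log link: same recipe
poi_direct <- predict(glm(y ~ x, data = d, family = poisson),
                      type = "response")
poi_shift  <- predict(glm(I(y + k) ~ x, data = d, family = poisson),
                      type = "response") - k

max(abs(lin_direct - lin_shift))  # zero up to rounding error
max(abs(poi_direct - poi_shift))  # clearly not zero
```

In the linear case only the intercept absorbs the shift, so the two sets of predictions coincide. With the log link, the shifted mean is no longer of exponential form, and the fitted curve changes shape.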

The alternative mentioned in the previous post was to use this technique for several $k$'s, and then extrapolate:

- For a given $k$, fit a model not on the $Y_i$'s but on the $Y_i+k$'s, use that model to make predictions, and then translate those predictions back by $-k$.
- Do it for several $k$'s.
- Use it to extrapolate when $k$ is $0$ (which is the case we are interested in).
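In symbols, the three steps above might be written as follows. The notation is introduced here for illustration and is not taken from the original post: $\widehat{\mu}_i^{(k)}$ is the fitted mean of $Y_i+k$, $\widehat{Y}_i(k)$ the translated prediction, and $g$ whatever curve is chosen for the extrapolation.

$$
\widehat{Y}_i(k) = \widehat{\mu}_i^{(k)} - k, \qquad k \in \{k_1, \dots, k_m\},
$$

$$
g(k_j) \approx \widehat{Y}_i(k_j) \ \text{ for } j = 1, \dots, m \quad\Longrightarrow\quad \widehat{Y}_i(0) \approx g(0).
$$

The choice of $g$ (a straight line, an exponential, a power function) is left open at this stage.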

In the context of loss reserving, the idea is extremely simple. Consider a triangle with incremental payments

    > source("http://perso.univ-rennes1.fr/arthur.charpentier/bases.R")
    > Y=T=PAID
    > n=ncol(T)
    > Y[,2:n]=T[,2:n]-T[,1:(n-1)]
    > Y
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,] 3209 1163   39   17    7   21
    [2,] 3367 1292   37   24   10   NA
    [3,] 3871 1474   53   22   NA   NA
    [4,] 4239 1678  103   NA   NA   NA
    [5,] 4929 1865   NA   NA   NA   NA
    [6,] 5217   NA   NA   NA   NA   NA

Now, we do not have negative values here, but we can still see if translation techniques can be used. The benchmark is the Poisson regression, since we can run it:

    > y=as.vector(as.matrix(Y))
    > base=data.frame(y,ai=rep(2000:2005,n),bj=rep(0:(n-1),each=n))
    > reg=glm(y~as.factor(ai)+as.factor(bj),data=base,family=poisson)

Here, the amount of reserves is the sum of predicted values in the lower part of the triangle,

    > py=predict(reg,newdata=base,type="response")
    > sum(py[is.na(base$y)])
    [1] 2426.985

which is exactly Chain Ladder’s estimate.
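The equality with Chain Ladder can be checked directly. The sketch below retypes the incremental triangle from the output printed above (so it does not depend on the remote `bases.R` script), rebuilds the cumulative triangle, and applies the usual volume-weighted development factors. The names `inc`, `cum`, `f`, `latest`, `ultimate` and `cl_reserve` are illustrative and do not come from the original post.

```r
# Incremental payments, retyped from the printed triangle (NA = not yet observed)
inc <- matrix(c(3209, 1163,  39,  17,   7,  21,
                3367, 1292,  37,  24,  10,  NA,
                3871, 1474,  53,  22,  NA,  NA,
                4239, 1678, 103,  NA,  NA,  NA,
                4929, 1865,  NA,  NA,  NA,  NA,
                5217,   NA,  NA,  NA,  NA,  NA),
              nrow = 6, byrow = TRUE)
n <- ncol(inc)

# Cumulative payments: running sums along each row (NAs stay in the lower part)
cum <- t(apply(inc, 1, cumsum))

# Volume-weighted development factors, using rows observed in both columns
f <- sapply(1:(n - 1), function(j) {
  rows <- 1:(n - j)
  sum(cum[rows, j + 1]) / sum(cum[rows, j])
})

# Latest observed cumulative amount per origin year (anti-diagonal)
latest <- sapply(1:n, function(i) cum[i, n + 1 - i])

# Ultimate = latest amount times the product of the remaining factors
ultimate <- sapply(1:n, function(i) {
  remaining <- if (i == 1) numeric(0) else f[(n + 1 - i):(n - 1)]
  latest[i] * prod(remaining)
})

cl_reserve <- sum(ultimate - latest)
cl_reserve  # about 2426.985, the Poisson regression figure quoted above
```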

Now, let us use a translation technique to compute the amount of reserves. The code will be

    > decal=function(k){
    + reg=glm(y+k~as.factor(ai)+as.factor(bj),data=base,family=poisson)
    + py=predict(reg,newdata=base,type="response")
    + return(sum(py[is.na(base$y)]-k))
    + }

For instance, with a translation of +5, we would get

    > decal(5)
    [1] 2454.713

while a translation of +10 would return

    > decal(10)
    [1] 2482.29

Clearly, translations do have an impact on the estimation. Here, just to check, if we do not translate, we do have Chain Ladder’s estimate,

    > decal(0)
    [1] 2426.985

The idea mentioned in the previous post was to try several translations, and then extrapolate, to get the value at 0. Here, translations from +10 to +20 give the following estimates

    > K=10:20
    > (V=Vectorize(decal)(K))
     [1] 2482.290 2487.788 2493.279 2498.765 2504.245 2509.719 2515.187 2520.649
     [9] 2526.106 2531.557 2537.001

We can plot those values, and run a regression

    > plot(K,V,xlim=c(0,20),ylim=c(2425,2540))
    > abline(h=decal(0),col="red",lty=2)

The dotted horizontal line is Chain Ladder's estimate. Now, let us extrapolate

    > b=data.frame(K=K,D=V)
    > rk=lm(D~K,data=b)
    > predict(rk,newdata=data.frame(K=0))
           1 
    2427.623 

One has to admit that it is not that bad. But yesterday evening, Karim asked me *why* I used a linear regression for my extrapolation. And to be honest, I do not know. I mean, the only answer might be that the points are almost on a straight line. So the first time I saw it, I was excited, and I ran a linear regression.
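How close to a straight line the points are can be quantified from the numbers already printed. The sketch below retypes the eleven estimates from the output above (the names `d1`, `d2` and `intercept` are illustrative, not from the original post) and looks at first and second differences.

```r
# Reserve estimates for shifts 10..20, retyped from the output printed above
K <- 10:20
V <- c(2482.290, 2487.788, 2493.279, 2498.765, 2504.245, 2509.719,
       2515.187, 2520.649, 2526.106, 2531.557, 2537.001)

# First differences: nearly constant, which is why the points look aligned
d1 <- diff(V)
range(d1)

# Second differences: all slightly negative, so the curve is mildly concave
d2 <- diff(V, differences = 2)
all(d2 < 0)

# Intercept of the straight-line fit, i.e. the extrapolated value at K = 0
intercept <- unname(coef(lm(V ~ K))[1])
intercept  # about 2427.623, as in the predict() call above
```

A mildly concave curve extended to the left with a straight line tends to be overestimated, which is consistent with 2427.623 sitting slightly above the Chain Ladder value of 2426.985.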

Now, let us see if we can do better. Here, we used translations from +10 to +20 (which might be rather small). What if we use much larger values (because we might have large negative incremental values)? With the following code, we try, each time, 11 consecutive values, the smallest one going from 1 to 50,

    > hausse=1:50; res=rep(NA,50)
    > for(k in hausse){
    + VK=k:(10+k)
    + b=data.frame(K=VK,D=Vectorize(decal)(VK))
    + rk=lm(D~K,data=b)
    + res[k]=predict(rk,newdata=data.frame(K=0))
    + }
    > plot(hausse,res,type="l",col="red",ylim=c(2422,2440))
    > abline(rk,col="blue")

Here, we compute reserves when extrapolations are done after 11 translations, from $k$ to $k+10$, for different values of $k$. The case where $k$ is 10 was the one mentioned above,

    > res[hausse==10]
    [1] 2427.623
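Since the loop only depends on the reserve function, the window width and the regression used, it could be wrapped in a small helper. The function `extrapolate_to_zero()` and its arguments below are hypothetical (they do not appear in the original post), and `toy()` is a stand-in for `decal()` so that the sketch runs without the triangle.

```r
# Hypothetical helper: for each starting shift k, evaluate f on k, ..., k + width,
# regress the values on the shift, and return the fitted value at shift 0.
extrapolate_to_zero <- function(f, starts, width = 10,
                                fit = function(b) lm(D ~ K, data = b)) {
  sapply(starts, function(k) {
    VK <- k:(k + width)
    b <- data.frame(K = VK, D = sapply(VK, f))
    unname(predict(fit(b), newdata = data.frame(K = 0), type = "response"))
  })
}

# Toy stand-in for decal(): exactly linear in the shift, so the linear
# extrapolation recovers the value at 0 whatever the window.
toy <- function(k) 2427 + 5.5 * k
res_toy <- extrapolate_to_zero(toy, starts = 1:50)
range(res_toy)
```

With the triangle loaded, `extrapolate_to_zero(decal, 1:50)` should reproduce the loop above, and passing another `fit` function (for instance one calling `glm` with a different family or link) changes the extrapolation curve without touching the rest.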

Actually, it might also be possible to consider not 11 translations, but 26, from $k$ to $k+25$. Here, we get

    > hausse=1:50; res=rep(NA,50)
    > for(k in hausse){
    + VK=k:(25+k)
    + b=data.frame(K=VK,D=Vectorize(decal)(VK))
    + rk=lm(D~K,data=b)
    + res[k]=predict(rk,newdata=data.frame(K=0))
    + }
    > lines(hausse,res,type="l",col="blue",lty=2)

We now have the dotted line

Here, it is getting worse. So let us keep 11 translations. Perhaps we can try something different, for instance a Poisson regression with a log link (i.e. we consider an exponential extrapolation),

    > hausse=1:50; res=rep(NA,50)
    > for(k in hausse){
    + VK=k:(10+k)
    + b=data.frame(K=VK,D=Vectorize(decal)(VK))
    + rk=glm(D~K,data=b,family=poisson)
    + res[k]=predict(rk,newdata=data.frame(K=0),type="response")
    + }
    > lines(hausse,res,type="l",col="purple")

The purple line will be a Poisson model, with a log link. Perhaps we can try another link function, like a quadratic one

    > hausse=1:50; res=rep(NA,50)
    > for(k in hausse){
    + VK=k:(10+k)
    + b=data.frame(K=VK,D=Vectorize(decal)(VK))
    + rk=glm(D~K,data=b,family=poisson(link=
    + power(lambda = 2)))
    + res[k]=predict(rk,newdata=data.frame(K=0),type="response")
    + }
    > lines(hausse,res,type="l",col="orange")

That would be the orange line,

Here, we need a link function between the identity (the linear model, the blue line) and the quadratic one (the orange one), for instance a power link with exponent 3/2,

    > hausse=1:50; res=rep(NA,50)
    > for(k in hausse){
    + VK=k:(10+k)
    + b=data.frame(K=VK,D=Vectorize(decal)(VK))
    + rk=glm(D~K,data=b,family=poisson(link=
    + power(lambda = 1.5)))
    + res[k]=predict(rk,newdata=data.frame(K=0),type="response")
    + }
    > lines(hausse,res,type="l",col="green")

Here, it looks like we can use that model for any kind of translation, from +10 to +50, even +100! But I do not have any intuition about the use of this power function…
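One way to read these link functions, offered as a sketch rather than as the author's explanation: in R, `power(lambda)` is the link $\eta = \mu^{\lambda}$, so each extrapolation assumes that some power of the reserve is linear in the shift. Here $\mu(K)$ denotes the expected reserve at shift $K$, and $a$, $b$ are the regression coefficients; this notation is introduced for illustration only.

$$
\mu(K)^{\lambda} = a + bK \quad\Longrightarrow\quad \widehat{\mu}(0) = \widehat{a}^{\,1/\lambda}, \qquad \lambda \in \{1,\ 3/2,\ 2\},
$$

$$
\log \mu(K) = a + bK \quad\Longrightarrow\quad \widehat{\mu}(0) = e^{\widehat{a}} \qquad (\text{log link}).
$$

With $\lambda = 1$ the curve is a straight line, as in the `lm` fit. With $\lambda > 1$ and $b > 0$, the curve $K \mapsto (a + bK)^{1/\lambda}$ is concave, and more so as $\lambda$ grows, whereas the log link gives a convex curve. Choosing $\lambda$ therefore amounts to choosing how much curvature the extrapolation is allowed to have.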
