# Reserving with negative increments in triangles

April 11, 2013
By

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

A few months ago, I did published a post on negative values in triangles, and how to deal with them, when using a Poisson regression (the post was published in French). The idea was to use a translation technique:

1. Fit a model not on $Y_i$‘s but on $Y_i^{(k)}=Y_i+k$, for some $k\geq 0$,
2. Use that model to make predictions, and then translate those predictions, $\widehat{Y}_i^{(k)}-k$

This is what was done to get the following graph, where a Poisson regression was fitted. Black points are $Y_i$‘s while blue points are $\widehat{Y}_i^{(k)}$‘s, for some $k\geq 0$. We fit a model to get the blue prediction, and then translate it to get the red prediction (on the $Y_i$‘s).

In this example, there were no negative values, but it is possible to use it get a better understanding on the impact of this technique. The prediction, here, is the red line. And clearly, the value of $k$ has an impact on the prediction (since we do not consider, here, a linear model: with a linear model, translating has not impact at all, except on the intercept).

The alternative mentioned in the previous post was to use this technique on several $k$‘s, and them interpolate

1. For a given $k$, fit a model not on $Y_i$‘s but on $Y_i^{(k)}=Y_i+k$, use that model to make predictions, and then translate those predictions, $\widehat{Y}_i^{(k)}-k$.
2. Do it for several $k$‘s.
3. Use it to extrapolate when $k$ is $0$ (which is the case we are interested in).

In the context of loss reserving, the idea is extremely simple. Consider a triangle with incremental payments

> source("http://perso.univ-rennes1.fr/arthur.charpentier/bases.R")
> Y=T=PAID
> n=ncol(T)
> Y[,2:n]=T[,2:n]-T[,1:(n-1)]
> Y
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3209 1163   39   17    7   21
[2,] 3367 1292   37   24   10   NA
[3,] 3871 1474   53   22   NA   NA
[4,] 4239 1678  103   NA   NA   NA
[5,] 4929 1865   NA   NA   NA   NA
[6,] 5217   NA   NA   NA   NA   NA

Now, we do not have negative values, here, but we can still see is translation techniques can be used. The benchmark is the Poisson regression, since we can run it :

> y=as.vector(as.matrix(Y))
> base=data.frame(y,ai=rep(2000:2005,n),bj=rep(0:(n-1),each=n))
> reg=glm(y~as.factor(ai)+as.factor(bj),data=base,family=poisson)

Here, the amount is reserve is the sum of predicted values in the lower part of the triangle,

> py=predict(reg,newdata=base,type="response")
> sum(py[is.na(base$y)]) [1] 2426.985 which is exactly Chain Ladder’s estimate. Now, let us use a translation technique to compute the amount of reserves. The code will be > decal=function(k){ + reg=glm(y+k~as.factor(ai)+as.factor(bj),data=base,family=poisson) + py=predict(reg,newdata=base,type="response") + return(sum(py[is.na(base$y)]-k))

For instance, if we translate of +5, we would get

> decal(5)
[1] 2454.713

while a translation of +10 would return

> decal(10)
[1] 2482.29

Clearly, translations do have an impact on the estimation. Here, just to check, if we do not translate, we do have Chain Ladder’s estimate,

> decal(0)
[1] 2426.985

The idea mentioned in the previous post was to try several translations, and then extrapolate, to get the value in 0. Here, translations will give the following estimates

> K=10:20
> (V=Vectorize(decal)(K))
[1] 2482.290 2487.788 2493.279 2498.765 2504.245 2509.719 2515.187 2520.649
[9] 2526.106 2531.557 2537.001

We can plot those values, and run a regression

> plot(K,V,xlim=c(0,20),ylim=c(2425,2540))
> abline(h=decal(0),col="red",lty=2)

the dotted horizontal line is Chain Ladder. Now, let us extrapolate

> b=data.frame(K=K,D=V)
> rk=lm(D~K,data=b)
> predict(rk,newdata=data.frame(K=0))
1
2427.623

On has to admit that it is not that bad. But yesterday evening, Karim asked me why I did use a linear regression, for my extrapolation. And to be honest, I do not know. I mean, the only answer might be that points are almost on a straight line. So the first time I saw it, I was exited, and I ran a linear regression.

Now, let us see if we can do better. Because here, we do use a translation of +10 or +20 (which might be rather small). What if we use much larger values ? (because we might have large negative incremental values). With the following code, we try, each time 11 consecutive values, the smallest one going from 0 to 50,

> hausse=1:50; res=rep(NA,50)
> for(k in hausse){
+ VK=k:(10+k)
+ b=data.frame(K=VK,D=Vectorize(decal)(VK))
+ rk=lm(D~K,data=b)
+ res[k]=predict(rk,newdata=data.frame(K=0))
+ }
> plot(hausse,res,type="l",col="red",ylim=c(2422,2440))
> abline(rk,col="blue")

Here, we compute reserves when extrapolations were done after 11 translations, from $k$ to $k+10$.  With different values of $k$. The case where $k$ is ten was the one mentioned above,

> res[hausse==10]
[1] 2427.623

Actually, it might also be possible to consider not 11 translations, but 26, from $k$ to $k+25$. Here, we get

> hausse=1:50; res=rep(NA,50)
> for(k in hausse){
+ VK=k:(25+k)
+ b=data.frame(K=VK,D=Vectorize(decal)(VK))
+ rk=lm(D~K,data=b)
+ res[k]=predict(rk,newdata=data.frame(K=0))
+ }
> lines(hausse,res,type="l",col="blue",lty=2)

We now have the dotted line

Here, it is getting worst. So let us keep here 11 translations. Perhaps, we can try something different. For instance a Poisson regression, with a log like (i.e. we consider an exponential extrapolation),

> hausse=1:50; res=rep(NA,50)
> for(k in hausse){
+ VK=k:(10+k)
+ b=data.frame(K=VK,D=Vectorize(decal)(VK))
+ rk=glm(D~K,data=b,family=poisson)
+ res[k]=predict(rk,newdata=data.frame(K=0),type="response")
+ }
> lines(hausse,res,type="l",col="purple")

The purple line will be a Poisson model, with a log link. Perhaps we can try another link function, like a quadratic one

> hausse=1:50; res=rep(NA,50)
> for(k in hausse){
+ VK=k:(10+k)
+ b=data.frame(K=VK,D=Vectorize(decal)(VK))
+ power(lambda = 2)))
+ res[k]=predict(rk,newdata=data.frame(K=0),type="response")
+ }
> lines(hausse,res,type="l",col="orange")

That would be the orange line,

Here, we need a link function between identity (the linear model, the blue line) and the quadratic one (the orange one), for instance a power function 3/2,

> hausse=1:50; res=rep(NA,50)
> for(k in hausse){
+ VK=k:(10+k)
+ b=data.frame(K=VK,D=Vectorize(decal)(VK))
+ power(lambda = 1.5)))
+ res[k]=predict(rk,newdata=data.frame(K=0),type="response")
+ }
> lines(hausse,res,type="l",col="green")

Here, it looks like we can use that model for any kind of translation, from +10 till +50, even +100 ! But I do not have any intuition about the use of this power function…

### Arthur Charpentier

Arthur Charpentier, professor in Montréal, in Actuarial Science. Former professor-assistant at ENSAE Paristech, associate professor at Ecole Polytechnique and assistant professor in Economics at Université de Rennes 1.  Graduated from ENSAE, Master in Mathematical Economics (Paris Dauphine), PhD in Mathematics (KU Leuven), and Fellow of the French Institute of Actuaries.