# Poisson regression on non-integers

May 7, 2013
By

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

In the course on claims reserving techniques, I did mention the use of Poisson regression, even if incremental payments were not integers. For instance, we did consider incremental triangles

>  source("http://perso.univ-rennes1.fr/arthur.charpentier/bases.R")
>  INC=PAID
>  INC[,2:6]=PAID[,2:6]-PAID[,1:5]
>  INC
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3209 1163   39   17    7   21
[2,] 3367 1292   37   24   10   NA
[3,] 3871 1474   53   22   NA   NA
[4,] 4239 1678  103   NA   NA   NA
[5,] 4929 1865   NA   NA   NA   NA
[6,] 5217   NA   NA   NA   NA   NA

On those payments, it is natural to use a Poisson regression, to predict future payments

>  Y=as.vector(INC)
>  D=rep(1:6,each=6)
>  A=rep(2001:2006,6)
>  base=data.frame(Y,D,A)
>  Yp=predict(reg,type="response",newdata=base)
>  matrix(Yp,6,6)
[,1]   [,2] [,3] [,4] [,5] [,6]
[1,] 3155.6 1202.1 49.8 19.1  8.2 21.0
[2,] 3365.6 1282.0 53.1 20.4  8.7 22.3
[3,] 3863.7 1471.8 60.9 23.4 10.0 25.7
[4,] 4310.0 1641.8 68.0 26.1 11.2 28.6
[5,] 4919.8 1874.1 77.6 29.8 12.8 32.7
[6,] 5217.0 1987.3 82.3 31.6 13.5 34.7

and the total amount of reserves would be

>  sum(Yp[is.na(Y)==TRUE])
[1] 2426.985

Here, payments were in ’000 euros. What if they were in ’000’000 euros ?

> a=1000
> INC/a
[,1]  [,2]  [,3]  [,4]  [,5]  [,6]
[1,] 3.209 1.163 0.039 0.017 0.007 0.021
[2,] 3.367 1.292 0.037 0.024 0.010    NA
[3,] 3.871 1.474 0.053 0.022    NA    NA
[4,] 4.239 1.678 0.103    NA    NA    NA
[5,] 4.929 1.865    NA    NA    NA    NA
[6,] 5.217    NA    NA    NA    NA    NA

We can still run a regression here

> reg=glm((Y/a)~as.factor(D)+as.factor(A),data=base,family=poisson(link="log"))
> Yp=predict(reg,type="response",newdata=base)
> sum(Yp[is.na(Y)==TRUE])*a
[1] 2426.985

and the prediction is exactly the same. Actually, it is possible to change currency, and multiply by any kind of constant, the Poisson regression will return always the same prediction, if we use a log link function,

>  homogeneity=function(a=1){
+  Yp=predict(reg,type="response",newdata=base)
+  return(sum(Yp[is.na(Y)==TRUE])*a)
+  }
>  Vectorize(homogeneity)(10^(seq(-3,5)))
[1] 2426.985 2426.985 2426.985 2426.985 2426.985 2426.985 2426.985 2426.985 2426.985

The trick here come from the fact that we do like the Poisson interpretation. But GLMs simply mean that we do want to solve a first order condition. It is possible to solve explicitly the first order condition, which was obtained without any condition such that values should be integers. To run a simple code, the intercept should be related to the last value of the matrix, not the first one.

> base$D=relevel(as.factor(base$D),"6")
> base$A=relevel(as.factor(base$A),"2006")
> summary(reg)

Call:
glm(formula = Y ~ as.factor(D) + as.factor(A), family = poisson(link = "log"),
data = base)

Deviance Residuals:
Min       1Q   Median       3Q      Max
-2.3426  -0.4996   0.0000   0.2770   3.9355

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)       3.54723    0.21921  16.182  < 2e-16 ***
as.factor(D)1     5.01244    0.21877  22.912  < 2e-16 ***
as.factor(D)2     4.04731    0.21896  18.484  < 2e-16 ***
as.factor(D)3     0.86391    0.22827   3.785 0.000154 ***
as.factor(D)4    -0.09254    0.25229  -0.367 0.713754
as.factor(D)5    -0.93717    0.32643  -2.871 0.004092 **
as.factor(A)2001 -0.50271    0.02079 -24.179  < 2e-16 ***
as.factor(A)2002 -0.43831    0.02045 -21.433  < 2e-16 ***
as.factor(A)2003 -0.30029    0.01978 -15.184  < 2e-16 ***
as.factor(A)2004 -0.19096    0.01930  -9.895  < 2e-16 ***
as.factor(A)2005 -0.05864    0.01879  -3.121 0.001799 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 46695.269  on 20  degrees of freedom
Residual deviance:    30.214  on 10  degrees of freedom
(15 observations deleted due to missingness)
AIC: 209.52

The first idea is to run a gradient descent, as follows (the starting point will be coefficients from a linear regression on the log of the observations),

> YNA <- Y
> XNA=matrix(0,length(Y),1+5+5)
> XNA[,1]=rep(1,length(Y))
>   for(k in 1:5) XNA[(k-1)*6+1:6,k+1]=k
>   u=(1:(length(Y))%%6); u[u==0]=6
>   for(k in 1:5) XNA[u==k,k+6]=k
>     YnoNA=YNA[is.na(YNA)==FALSE]
>     XnoNA=XNA[is.na(YNA)==FALSE,]
>     beta=lm(log(YnoNA)~0+XnoNA)$coefficients > for(s in 1:50){ + Ypred=exp(XnoNA%*%beta) + gradient=t(XnoNA)%*%(YnoNA-Ypred) + omega=matrix(0,nrow(XnoNA),nrow(XnoNA));diag(omega)=exp(XnoNA%*%beta) + hessienne=-t(XnoNA)%*%omega%*%XnoNA + beta=beta-solve(hessienne)%*%gradient} > beta [,1] [1,] 3.54723486 [2,] 5.01244294 [3,] 2.02365553 [4,] 0.28796945 [5,] -0.02313601 [6,] -0.18743467 [7,] -0.50271242 [8,] -0.21915742 [9,] -0.10009587 [10,] -0.04774056 [11,] -0.01172840 We are not too far away from the values given by R. Actually, it is just fine if we focus on the predictions > matrix(exp(XNA%*%beta),6,6)) [,1] [,2] [,3] [,4] [,5] [,6] [1,] 3155.6 1202.1 49.8 19.1 8.2 21.0 [2,] 3365.6 1282.0 53.1 20.4 8.7 22.3 [3,] 3863.7 1471.8 60.9 23.4 10.0 25.7 [4,] 4310.0 1641.8 68.0 26.1 11.2 28.6 [5,] 4919.8 1874.1 77.6 29.8 12.8 32.7 [6,] 5217.0 1987.3 82.3 31.6 13.5 34.7 which are exactly the one obtained above. And here, we clearly see that there is no assumption such as “explained variate should be an integer“. It is also possible to remember that the first order condition is the same as the one we had with a weighted least square model. The problem is that the weights are function of the prediction. But using an iterative algorithm, we should converge, > beta=lm(log(YnoNA)~0+XnoNA)$coefficients
>  for(i in 1:50){
+ Ypred=exp(XnoNA%*%beta)
+  z=XnoNA%*%beta+(YnoNA-Ypred)/Ypred
+  REG=lm(z~0+XnoNA,weights=Ypred)
+  beta=REG\$coefficients
+ }
>
> beta
XnoNA1      XnoNA2      XnoNA3      XnoNA4      XnoNA5      XnoNA6
3.54723486  5.01244294  2.02365553  0.28796945 -0.02313601 -0.18743467
     XnoNA7      XnoNA8      XnoNA9     XnoNA10     XnoNA11
-0.50271242 -0.21915742 -0.10009587 -0.04774056 -0.01172840

which are the same values as the one we got previously. Here again, the prediction is the same as the one we got from this so-called Poisson regression,

> matrix(exp(XNA%*%beta),6,6)
[,1]   [,2] [,3] [,4] [,5] [,6]
[1,] 3155.6 1202.1 49.8 19.1  8.2 20.9
[2,] 3365.6 1282.0 53.1 20.4  8.7 22.3
[3,] 3863.7 1471.8 60.9 23.4 10.0 25.7
[4,] 4310.0 1641.8 68.0 26.1 11.2 28.6
[5,] 4919.8 1874.1 77.6 29.8 12.8 32.7
[6,] 5217.0 1987.3 82.3 31.6 13.5 34.7

Again, it works just fine because GLMs are mainly conditions on the first two moments, and numerical computations are based on the first order condition, which has less constraints than the interpretation in terms of a Poisson model.

### Arthur Charpentier

Arthur Charpentier, professor in Montréal, in Actuarial Science. Former professor-assistant at ENSAE Paristech, associate professor at Ecole Polytechnique and assistant professor in Economics at Université de Rennes 1.  Graduated from ENSAE, Master in Mathematical Economics (Paris Dauphine), PhD in Mathematics (KU Leuven), and Fellow of the French Institute of Actuaries.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...