Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last week I posted a biological example of fitting a non-linear growth curve with Stan/RStan. Today, I want to apply a similar approach to insurance data using ideas by David Clark [1] and James Guszcza [2].

Instead of predicting the growth of dugongs (sea cows), I would like to predict the growth of cumulative insurance loss payments over time, originated from different origin years. Loss payments of younger accident years are just like a new generation of dugongs, they will be small in size initially, grow as they get older, until the losses are fully settled.

Here is my example data set:

Following the articles cited above I will assume that the growth can be explained by a Weibull curve, with two parameters (theta) (scale) and (omega) (shape):
[
G(dev|omega, theta) = 1 – expleft(-left(frac{dev}{theta}right)^omegaright)
]Inspired by the classical over-dispersed Poisson GLM in actuarial science, Guszcza [2] assumes a power variance structure for the process error as well. For the purpose of this post I will assume that claims cost follow a Normal distribution, to make a comparison with a the classical least square regression easier. With a prior estimate of the ultimate claims cost the cumulative loss can be modelled as:
[
begin{align}
CL_{AY, dev} & sim mathcal{N}(mu_{dev}, sigma^2_{dev}) \
mu_{dev} & = Ult cdot G(dev|omega, theta)\
sigma_{dev} & = sigma sqrt{mu_{dev}}
end{align}
]Perhaps, the above formula suggests a hierarchical modelling approach, with different ultimate loss costs by accident year. Indeed, this is the focus of [2] and I will endeavour to reproduce the results with Stan in my next post, but today I will stick to a pooled model that assumes a constant ultimate loss across all accident years, i.e. (Ult_{AY} = Ult ;forall; AY).

To prepare my analysis I read in the data as a long data frame instead of the matrix structure above. Additionally, I compose another data frame that I will use later to predict payments two years beyond the first 10 years. Furthermore, to align the output with [2] I relabelled the development periods from years to months, so that year 1 becomes month 6, year 2 becomes month 18, etc. The accident years run from 1991 to 2000, while the variable origin maps those years from 1 to 10.

To get a reference point for Stan I start with a non-linear least square regression:

The output above doesn’t look unreasonable, apart from the accident years 1991, 1992 and 1998. The output of gnls gives me an opportunity to provide my prior distributions with good starting points. I will use an inverse Gamma as a prior for (sigma), constrained Normals for the parameters (theta, omega) and (Ult) as well. If you have better ideas, then please get in touch.

The Stan code below is very similar to last week. Again, I am interested here in the posterior distributions, hence I add a block to generate quantities from those. Note, Stan comes with a build-in function for the cumulative Weibull distribution weibull_cdf.

I store the Stan code in a separate text file, which I read into R to compile the model. The compilation takes a little time. The sampling itself is done in a few seconds.

Let’s take a look at the output:

The parameters are a little different to the output of gnls, but well within the standard error. From the plots I notice that the samples for the ultimate loss as well as for (theta) are a little skewed to the right. Well, assuming cumulative losses to follow a Normal distribution was a bit a of stretch to start with. Still, the samples seem to converge.

Finally, I can plot the 90% credible intervals of the posterior distributions.

The 90% prediction credible interval captures most of the data and although this model might not be suitable for reserving individual accident years, it could provide an initial starting point for further investigations. Additionally, thanks to the Bayesian model I have access to the full distribution, not just point estimators and standard errors.

My next post will continue with this data set and the ideas of James Guszcza by allowing the ultimate loss cost to vary by accident year, treating it as a random effect. Here is a teaser of what the output will look like:

### References

[1] David R. Clark. LDF Curve-Fitting and Stochastic Reserving: A Maximum Likelihood Approach. Casualty Actuarial Society, 2003. CAS Fall Forum.

[2] Guszcza, J. Hierarchical Growth Curve Models for Loss Reserving, 2008, CAS Fall Forum,
pp. 146–173.

### Session Info

R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.1 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] rstan_2.8.0    ggplot2_1.0.1     Rcpp_0.12.1

loaded via a namespace (and not attached):
[1] nloptr_1.0.4       plyr_1.8.3         tools_3.2.2
[4] digest_0.6.8       lme4_1.1-10        statmod_1.4.21
[7] gtable_0.1.2       mgcv_1.8-8         Matrix_1.2-2
[10] parallel_3.2.2     biglm_0.9-1        SparseM_1.7
[13] proto_0.3-10       gridExtra_2.0.0    coda_0.18-1
[16] stringr_1.0.0      MatrixModels_0.4-1 stats4_3.2.2
[19] lmtest_0.9-34      grid_3.2.2         nnet_7.3-11
[22] inline_0.3.14      tweedie_2.2.1      cplm_0.7-4
[25] minqa_1.2.4        reshape2_1.4.1     car_2.1-0
[28] actuar_1.1-10      magrittr_1.5       codetools_0.2-14
[31] scales_0.3.0       MASS_7.3-44        splines_3.2.2
[34] systemfit_1.1-18   rsconnect_0.3.79   pbkrtest_0.4-2
[37] colorspace_1.2-6   labeling_0.3       quantreg_5.19
[40] sandwich_2.3-4     stringi_1.0-1      munsell_0.4.2
[43] zoo_1.7-12