Posterior predictive output with Stan

May 18, 2015

(This article was first published on mages' blog, and kindly contributed to R-bloggers)

I continue my Stan experiments with another insurance example. Here I am particular interested in the posterior predictive distribution from only three data points. Or, to put it differently I have a customer of three years and I’d like to predict the expected claims cost for the next year to set or adjust the premium.

The example is taken from section 16.17 in Loss Models: From Data to Decisions [1]. Some time ago I used the same example to get my head around a Bayesian credibility model.

Suppose the claims likelihood distribution is believed to follow an exponential distribution for a given parameter (Theta). The prior parameter distribution on (Theta) is assumed to be a gamma distribution with parameters (alpha=4, beta=1000):
[begin{aligned}Theta & sim mbox{Gamma}(alpha, beta)\
ell_i & sim mbox{Exp}(Theta) , ; forall i in N
end{aligned}]In this case the predictive distribution is a Pareto II distribution with density (f(x) = frac{alpha beta^alpha}{(x+beta)^{alpha+1}}) and a mean of (frac{beta}{alpha-1}=,)333.33.

I have three independent observations, namely losses of $100, $950 and $450. The posterior predictive expected loss is $416.67 and can be derived analytical, as shown in my previous post. Now let me reproduce the answer with Stan as well.

Implementing the model in Stan is straightforward and I follow the same steps as in my simple example of last week. However, here I am also interested in the posterior predictive distribution, hence I add a generated quantities code block.

The output shows a simulated predictive mean of $416.86, close to the analytical answer. I can also read out that the 75%ile of the posterior predictive distribution is a loss of $542 vs. $414 from the prior predictive. That means every four years I shouldn’t be surprised to observe a loss in excess of $500. Further I can read of that 90% of losses are expected to be less than $950, or in other words the observation in my data may reflect the outcome of an event with a 1 in 10 return period.

Comparing the sampling output from Stan with the analytical output gives me some confidence that I am doing the ‘right thing’.


[1] Klugman, S. A., Panjer, H. H. & Willmot, G. E. (2004), Loss Models: From Data to Decisions, Wiley Series in Probability and Statistics.

Session Info

R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.3 (Yosemite)

[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] lattice_0.20-31 actuar_1.1-8 rstan_2.6.0 inline_0.3.14
[5] Rcpp_0.11.6

loaded via a namespace (and not attached):
[1] tools_3.2.0 codetools_0.2-11 grid_3.2.0 stats4_3.2.0

To leave a comment for the author, please follow the link and comment on their blog: mages' blog. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)