# Posterior predictive output with Stan

[This article was first published on

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I continue my Stan experiments with another insurance example. Here I am particular interested in the posterior predictive distribution from only three data points. Or, to put it differently I have a customer of three years and I’d like to predict the expected claims cost for the next year to set or adjust the premium.**mages' blog**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The example is taken from section 16.17 in

*Loss Models: From Data to Decisions*[1]. Some time ago I used the same example to get my head around a Bayesian credibility model.

Suppose the claims likelihood distribution is believed to follow an exponential distribution for a given parameter (Theta). The prior parameter distribution on (Theta) is assumed to be a gamma distribution with parameters (alpha=4, beta=1000):

[begin{aligned}Theta & sim mbox{Gamma}(alpha, beta)\

ell_i & sim mbox{Exp}(Theta) , ; forall i in N

end{aligned}]In this case the predictive distribution is a Pareto II distribution with density (f(x) = frac{alpha beta^alpha}{(x+beta)^{alpha+1}}) and a mean of (frac{beta}{alpha-1}=,)333.33.

I have three independent observations, namely losses of $100, $950 and $450. The posterior predictive expected loss is $416.67 and can be derived analytical, as shown in my previous post. Now let me reproduce the answer with Stan as well.

Implementing the model in Stan is straightforward and I follow the same steps as in my simple example of last week. However, here I am also interested in the posterior predictive distribution, hence I add a

`generated quantities`

code block. The output shows a simulated predictive mean of $416.86, close to the analytical answer. I can also read out that the 75%ile of the posterior predictive distribution is a loss of $542 vs. $414 from the prior predictive. That means every four years I shouldn’t be surprised to observe a loss in excess of $500. Further I can read of that 90% of losses are expected to be less than $950, or in other words the observation in my data may reflect the outcome of an event with a 1 in 10 return period.

Comparing the sampling output from Stan with the analytical output gives me some confidence that I am doing the ‘right thing’.

### References

[1] Klugman, S. A., Panjer, H. H. & Willmot, G. E. (2004), Loss Models: From Data to Decisions, Wiley Series in Probability and Statistics.### Session Info

R version 3.2.0 (2015-04-16) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X 10.10.3 (Yosemite) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] lattice_0.20-31 actuar_1.1-8 rstan_2.6.0 inline_0.3.14 [5] Rcpp_0.11.6 loaded via a namespace (and not attached): [1] tools_3.2.0 codetools_0.2-11 grid_3.2.0 stats4_3.2.0

To

**leave a comment**for the author, please follow the link and comment on their blog:**mages' blog**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.