Predicting events, when they haven’t happened yet

[This article was first published on mages' blog, and kindly contributed to R-bloggers.]

Suppose you have to predict the probabilities of events which haven’t happened yet. How do you do this?

Here is an example from the 1950s, when Longley-Cook, an actuary at an insurance company, was asked to price the risk of a mid-air collision of two planes, an event which, as far as he knew, hadn’t happened before. The civilian airline industry was still very young but growing rapidly, and all Longley-Cook knew was that there had been no collisions in the previous 5 years [1].

Where do you start?

Although the probability of a mid-air collision should be very low for any given plane, the probability that a collision occurs somewhere in a given year will be higher.

Let’s think of the years as a series of Bernoulli trials with unknown probability \(p\). That gives the likelihood. If I start with an uninformative prior, such as a Beta\((\alpha, \beta)\) with \(\alpha=1, \beta=1\), then I can use the concept of Bayesian conjugates to update my prior belief.

In this case the posterior parameter distribution is again a Beta, with hyper-parameters \(\alpha' = \alpha + \sum_{i=1}^n x_i\) and \(\beta' = \beta + n - \sum_{i=1}^n x_i\), where \(x_i=1\) if the event occurred in year \(i\) and 0 otherwise, and \(n\) is the number of years.
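For readers who want to see where the update rule comes from, here is a short sketch of the standard Beta-Bernoulli derivation. It uses only the symbols defined above and drops the normalising constants:

```latex
\begin{aligned}
p(p \mid x_1, \dots, x_n)
  &\propto \underbrace{\prod_{i=1}^n p^{x_i} (1-p)^{1-x_i}}_{\text{likelihood}}
     \;\times\; \underbrace{p^{\alpha-1} (1-p)^{\beta-1}}_{\text{Beta prior}} \\
  &= p^{\alpha + \sum_{i=1}^n x_i - 1} \, (1-p)^{\beta + n - \sum_{i=1}^n x_i - 1}
\end{aligned}
```

The last line is the kernel of a Beta distribution with parameters \(\alpha + \sum_{i=1}^n x_i\) and \(\beta + n - \sum_{i=1}^n x_i\).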

Thus, with \(n=5\) years and no collisions, the updated parameters are \(\alpha'=1, \beta'=6\), with a posterior predictive mean of \(\alpha'/(\alpha'+\beta')=1/7\). That is a 14.3% chance of a mid-air collision in the next year, with a one-sided 95% credible interval of [0, 39%]. Or, in other words, if I round 39% to 40%, a return period of 2.5 years (1/0.4), i.e. up to 4 incidents in 10 years should be allowed for. That’s what Longley-Cook predicted.
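The calculation above can be reproduced in a few lines of base R. This is a minimal sketch rather than the original code of the post; the variable names (`alpha0`, `beta0`, `upper95`, and so on) are my own, and the interval is taken to be one-sided, from 0 to the 95% quantile of the posterior.

```r
# Prior: Beta(1, 1), i.e. uniform on [0, 1] (assumed names alpha0, beta0)
alpha0 <- 1
beta0 <- 1

# Data: 5 years observed, no collision in any of them
x <- rep(0, 5)
n <- length(x)

# Conjugate update of the Beta hyper-parameters
alpha_post <- alpha0 + sum(x)
beta_post <- beta0 + n - sum(x)

# Posterior (predictive) mean for a collision in the next year
post_mean <- alpha_post / (alpha_post + beta_post)

# Upper end of the one-sided 95% credible interval
upper95 <- qbeta(0.95, alpha_post, beta_post)

print(round(c(mean = post_mean, upper95 = upper95), 3))
```

The printed values should be roughly 0.143 for the mean and 0.393 for the upper bound, matching the 14.3% and 39% quoted above.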

Tragically, 128 people died over the Grand Canyon in 1956, and 4 years after that, 134 people died over New York City.

Wikipedia lists 51 notable civilian mid-air collisions since 1922, including helicopters and spacecraft. Since 1955 there have been 11 incidents with more than 100 fatalities, the last one in 2006.

So, what would this mean to Mr. Longley-Cook today? Well, first of all, that his prediction wasn’t too bad at all. Perhaps he would set the probability at \((1+11)/((1+11)+(1+60-11)) \approx 20\%\) today. He may have argued that \(2 \times 20\% = 40\%\) of the average plane value should be included in the worldwide premium for airline hull to cover mid-air collisions.
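The updated figure follows from the same conjugate rule. The sketch below simply mirrors the arithmetic in the text, treating the 11 incidents as 11 "event" years out of 60; that is an assumption of the illustration, and the variable names are hypothetical.

```r
# Same uniform Beta(1, 1) prior as before (assumed names)
alpha0 <- 1
beta0 <- 1

# Figures quoted in the text: 11 large incidents over 60 years
events <- 11
years <- 60

# Conjugate update, counting each incident as one event year
alpha_post <- alpha0 + events
beta_post <- beta0 + years - events

# Posterior mean probability of an event in the next year
post_mean <- alpha_post / (alpha_post + beta_post)
print(round(post_mean, 3))
```

The result is 12/62, or about 19%, which the text rounds to 20%.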

R code


Interested in the application of R in insurance? Join us at the 3rd R in Insurance conference in Amsterdam, 29 June 2015.

References

[1] Computational Actuarial Science with R, Edited by Arthur Charpentier, Chapman and Hall/CRC Reference – 656 Pages

Session Info

R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.3 (Yosemite)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] XML_3.98-1.1

loaded via a namespace (and not attached):
[1] tools_3.2.0
