The law of small numbers

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In insurance, the law of large numbers (named loi des grands nombres initially by Siméon Poisson, see e.g. http://en.wikipedia.org/…) is usually mentioned to legitimate large portfolios, because of pooling and diversification: the larger the pool, the more ‘predictable’ the losses will be (in a given period). Of course, under standard statistical assumption, namely finite expected value, and independence (see http://freakonometrics.blog.free.fr/…. for a discussion, in French). Since in insurance, catastrophes are usually rare – and extremely costly – and actuaries might be interested to model occurrence of that small number of events (see e.g. Aldous’ book on that specific topic, that can be downloaded from http://stat.berkeley.edu/…). The theorem behind is sometimes called the law of small numbers (from the book published by Ladislaus Bortkiewicz, but we’ll get back to that story later on, see also Whitaker (1914) http://biomet.oxfordjournals.org/… or the book recently published by Michael Falk, Jürg Hüsler and Rolf-Dieter Reiss).

  • The Poisson distribution

The so-called Poisson distribution (see http://en.wikipedia.org/…) was introduced by Siméon Poisson in 1837 (in Recherches sur la Probabilité des Jugements en Matière Criminelle et en Matière Civile, Précédées des Règles Générales du Calcul des Probabilités, see http://gallica.bnf.fr/…). But it had been defined more than a century before, by Abraham De Moivre, in 17111, in De Mensura Sortis seu; de Probabilitate Eventuum in Ludis a Casu Fortuito Pendentibus (see e.g. the review in http://www.jstor.org/…). Let http://latex.codecogs.com/gif.latex?N denote a counting random variable, then it said to be Poisson distributed if there is http://latex.codecogs.com/gif.latex?\lambda\in(0,\infty) such that

http://latex.codecogs.com/gif.latex?\mathbb{P}(N=k)=e^{-\lambda}\frac{\lambda^k}{k!},\forall%20k\in\mathbb{N}

De Moivre obtained that distribution from an approximation of the binomial distribution. Recall that the binomial distribution is a standard distribution in actuarial science, for instance to model the number of deaths among http://latex.codecogs.com/gif.latex?n insured. If individual death probabilities are identical, say http://latex.codecogs.com/gif.latex?p, and if deaths are independent events, then

http://latex.codecogs.com/gif.latex?\mathbb{P}(N=k)=\binom{n}{k}p^k(1-p)^{n-k},\forall%20k\in\{0,1,\cdots,n\}
And if http://latex.codecogs.com/gif.latex?n\rightarrow\infty and  http://latex.codecogs.com/gif.latex?np\rightarrow%20\lambda, then

http://latex.codecogs.com/gif.latex?\mathbb{P}(N=k)\rightarrow%20e^{-\lambda}\frac{\lambda^k}{k!}Again, this is an asymptotic theorem, which is valid when we have a lot of observations (http://latex.codecogs.com/gif.latex?n\rightarrow\infty), but also, the probability of occurrence should be extremely small (since http://latex.codecogs.com/gif.latex?p\sim\lambda/n), which is why to use the term small numbers. Siméon Poisson was not interested by mathematical approximations: his main point was to get a distribution with nice goodness of fit properties for the data he was working on. He wanted to get a better understanding of cours d’assises (jury panel, might be a valid translation of the French term). A jury consists of 12 jurors who voted to determine whether a defendant was guilty. When guilt was predominant, with at least 8 votes against 4, the defendant was convicted (which was 47% of criminal cases). 5 with 7 votes against, the opinion of professional judges was requested (11% of criminal trials again). Using these statistics we can demonstrate that a defendant brought before an assize court is guilty of the order of 68%, and the probability that a juror is not wrong by voting (condemning an innocent or releasing a culprit) was about 54%. He sought to calculate the probability that a defendant is wrongfully convicted, and gets 2%. And 28% of exonerated defendants are in fact guilty. Siméon Poisson introduced this law to get probabilities easily. But the law he considered is central in probability….

  • The law of small numbers

The heuristic of the main theorem, related to the Poisson distribution is the following: let http://latex.codecogs.com/gif.latex?X_1,%20\cdots,X_n denote i.i.d random variables taking values in http://latex.codecogs.com/gif.latex?%20\mathbb{R}^d (in a general setting). Let http://latex.codecogs.com/gif.latex?\mathcal{A}_n\subset\mathbb{R}^d. If  http://latex.codecogs.com/gif.latex?\mathbb{P}(X_i%20\in%20\mathcal{A}_n)\rightarrow%200 as http://latex.codecogs.com/gif.latex?n\rightarrow\infty (or http://latex.codecogs.com/gif.latex?\mathbb{P}(X_i%20\in%20\mathcal{A}_n)=O(n^{-1}) to be a little bit more specific about the assumptions), let http://latex.codecogs.com/gif.latex?N denote the (random variable characterizing) count of events http://latex.codecogs.com/gif.latex?\{X_i%20\in%20\mathcal{A}_n\}, then http://latex.codecogs.com/gif.latex?N can be approximated by a Poisson distribution with parameter http://latex.codecogs.com/gif.latex?\lambda%20=n%20\times%20\mathbb%20P(X_i%20\in%20\mathcal{A}_n).
The heuristic is that if we consider a large number of observations, and if we count how many are in a given (small) region, then the number of such observations is Poisson distributed.

n=1000
X=runif(n)*10-1.5
Y=runif(n)*10-1.5
plot(X,Y,axis=FALSE,cex=.6)
u=seq(-1,1,by=.01)
v=sqrt(1-u^2)
polygon(c(u,rev(u)),c(v,rev(-v)),col="yellow",border=NA)
I=(X^2+Y^2)<1
points(X[I],Y[I],cex=.6,pch=19,col="red")

If we run some simulations,

>  n=1000
>  ns=100000
>  N=rep(NA,ns)
> for(s in 1:ns){
+ X=runif(n)*10-1.5
+ Y=runif(n)*10-1.5
+ I=(X^2+Y^2)<1
+ N[s]=sum(I)
+ }
> hist(N,breaks=0:60,probability=TRUE,col="yellow")
> mean(N)
[1] 31.41257

The parameter of the Poisson distribution is the area of the yellow disk, over the area of the square, i.e.

> (lambda=10*pi)
[1] 31.41593
> lines(0:60-.5,dpois(0:60,lambda),type="b",col="red")

https://i2.wp.com/f.hypotheses.org/wp-content/blogs.dir/253/files/2013/01/Capture-d%E2%80%99e%CC%81cran-2013-01-28-a%CC%80-11.14.21.png?resize=403%2C198

To get an interpretation related to insurance modeling, let http://latex.codecogs.com/gif.latex?\mathcal{A} denote an upper layer in a reinsurance contract, i.e. http://latex.codecogs.com/gif.latex?\mathcal{A}=\{x%3Ed\} for some deductible http://latex.codecogs.com/gif.latex?d. Let http://latex.codecogs.com/gif.latex?X_i‘s denote individual losses. Then the number of claims that hit this upper layer can be modeled with a Poisson distribution. More precisely, if deductible http://latex.codecogs.com/gif.latex?d becomes extremely large (and http://latex.codecogs.com/gif.latex?\mathbb{P}(X_i%20\in%20\mathcal{A})\rightarrow%200), we obtain the point-over-threshold model in extreme value theory (see e.g. http://brale.math.hr/~iugrina/… or  http://fire.nist.gov/bfrlpubs/…): if http://latex.codecogs.com/gif.latex?N has a Poisson distribution and, conditionally on http://latex.codecogs.com/gif.latex?Nhttp://latex.codecogs.com/gif.latex?X_1,\cdots,X_N are independent identically distributed generalized Pareto random variables, then http://latex.codecogs.com/gif.latex?\max\{X_1,\cdots,X_N\} has the generalized extreme value distribution. Thus, exceedances models (for rare events) are closely related to Poisson processes.

  • The Poisson process

As mentioned above, the Poisson distribution appears when events occur somehow randomly and independently, over time. It is then natural to study the time between two occurences (or two claims, in an insurance context).

  • Poisson distribution, and claims occurrence

It is neither Siméon Poisson nor De Moivre, but Ladislaus Von Bortkiewicz who first mentioned the Poisson distribution as the law of small numbers. In 1898 (see http://archive.org/…), he studied the number number of soldiers killed by being kicked by a horse, from 1875 till 1894, in 200 corps (more precisely 10 corps over 20 ans).

He did obtain the following distribution (here, the parameter of the Poisson distribution is 0.61, i.e. the average number of death per year)

number of
death per
year
Empirical
counts
Poisson
distribution
0 109 108.67
1 65 66.21
2 22 20.22
3 3 4.11
4 1 0.63
5 and more 0 0.08

It is possible to find a lot of cases where the Poisson distribution fits extremely well. For instance, if we consider the number of hurricanes, that landed in Florida after 1850,

number of
hurricanes
per year
empirical
frequency
Poisson
frequency
0 30 27.16
1 48 47.99
2 37 42.41
3 29 24.98
4 8 11.03
5 3 3.90
6 3 1.15
7 1 0.29
8 and more 0 0.08
  • Poisson distribution, and return period

The return period was introduced by Emil Gumbel, in hydrology, to link probabilities and durations (see e.g. http://freakonometrics.blog.free.fr/…). A decennial event has an occurence probability of 1/10. 10 is then the average waiting time before occurence. This does not mean that the event will not occur before 10 years, or has to occur before 10 years. Consider a return period http://latex.codecogs.com/gif.latex?T (in years), then the yearly probability of non-occurrence is http://latex.codecogs.com/gif.latex?1-(1/T).

And the probability of non-occurence over http://latex.codecogs.com/gif.latex?n years is then http://latex.codecogs.com/gif.latex?1-[1-(1/T)]^n. It is standard to summarize this property with the following table,

return period http://latex.codecogs.com/gif.latex?T

Number of years (http://latex.codecogs.com/gif.latex?n) without catastrophes

10 20 50 100 200
10 65.1% 40.1% 18.3% 9.6% 4.9%
20 87.8% 64.2% 33.2% 18.2% 9.5%
50 99.5% 92.3% 63.6% 39.5% 22.5%
100 99.9% 99.4% 86.7% 63.4% 39.5%
200 99.9% 99.9% 98.2% 86.6% 63.3%

The diagonal in the table above is extremely interesting. It looks like there is some kind of convergence towards a limiting value (here  63.2%). Indeed, the number of events observed over n years have a Binomial distribution, with probability http://latex.codecogs.com/gif.latex?1/T=1/n, which will converge towards the Poisson distribution with parameter 1. The probability of not having a catastrophe is then http://latex.codecogs.com/gif.latex?1-\exp(-1), which is equal to 0.632.

  • Rare probabilities and the Poisson distribution

The Poisson distribution keeps appearing when computing probabilies of rare events. For instance, the probability to have at least one incident in a nuclear plant in France, over a 50 year period. Assume that the annual probability of an incident in a reactor http://latex.codecogs.com/gif.latex?p is small, e.g. 0.05%. Assume further that reactors are independent among them, and in time. The probability to have no incident over 80 reactors in 50 years is (exactly)

http://latex.codecogs.com/gif.latex?\mathbb{P}(N=0)=1-(1-p)^{50%20\times%2080}

Of course, a linear approximation is not correct (even if it was mentioned in some French newspaper, as explained in an old post http://freakonometrics.blog.free.fr/…)

http://latex.codecogs.com/gif.latex?\mathbb%20P(N=0)\neq%2050\times%2080\times%20p

On the other hand

http://latex.codecogs.com/gif.latex?\mathbb%20P(N=0)=1-(1-p)^{50\times80%20}%20\sim1-\exp\left(-50\times80\times%20p%20\right)

> p=0.00005
> 1-(1-p)^(50*80)
[1] 0.1812733
> 1-exp(-50*80*p)
[1] 0.1812692

which is the probability that http://latex.codecogs.com/gif.latex?N is null when http://latex.codecogs.com/gif.latex?N has a Poisson distribution with parameter http://latex.codecogs.com/gif.latex?\lambda=50\times80\times%20p. We clearly see here an application of De Moivre’s approximation in risk management.

Another way of looking at this problem is based on the following idea: given the fact that in 45 years of observations on 450 reactors worldwide (roughly), three major accidents were observed including Three Mile Island (1979) and Fukushima (2011), i.e. the average time between accidents can be estimated at 16 years. For a single reactor, we can assume that the average time to wait before an incident is 450 times 16 years, i.e 7200 years. Or the probability to have one incident, over one year, for one reactor is 1 over 7200 (this is the idea behind the return period concept). If we assume that the arrival of accidents occurs randomly and independently of each other (as defined above) then the number of major accidents observed over a period of 50 years in France follows a Poisson distribution with parameter 50 / (7200/80). Also, the probability of having no major accident over 50 years, with 80 reactors can be estimated by

http://latex.codecogs.com/gif.latex?1-\exp(-50\times%2080/7200)

i.e.

> 1-exp(-50*80/7200)
[1] 0.4262466

(keeping in mind all the uncertainty around the estimated waiting time before a major accident to a single reactor!).

Arthur Charpentier

Arthur Charpentier, professor at UQaM in Actuarial Science. Former professor-assistant at ENSAE Paristech, associate professor at Ecole Polytechnique and assistant professor in Economics at Université de Rennes 1. Graduated from ENSAE, Master in Mathematical Economics (Paris Dauphine), PhD in Mathematics (KU Leuven), and Fellow of the French Institute of Actuaries.

More Posts - Website

Follow Me:
TwitterLinkedInGoogle Plus

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)