The law of small numbers

Posted on January 28, 2013 by Arthur Charpentier

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers].

In insurance, the law of large numbers (initially named loi des grands nombres by Siméon Poisson, see e.g. http://en.wikipedia.org/…) is usually invoked to justify large portfolios, through pooling and diversification: the larger the pool, the more ‘predictable’ the losses will be (in a given period). This holds, of course, under standard statistical assumptions, namely finite expected value and independence (see http://freakonometrics.blog.free.fr/… for a discussion, in French). But in insurance, catastrophes are usually rare (and extremely costly), and actuaries might be interested in modeling the occurrence of that small number of events (see e.g. Aldous’ book on that specific topic, which can be downloaded from http://stat.berkeley.edu/…). The underlying theorem is sometimes called the law of small numbers (from the book published by Ladislaus Bortkiewicz, but we’ll get back to that story later on; see also Whitaker (1914) http://biomet.oxfordjournals.org/… or the book recently published by Michael Falk, Jürg Hüsler and Rolf-Dieter Reiss).

The Poisson distribution

The so-called Poisson distribution (see http://en.wikipedia.org/…) was introduced by Siméon Poisson in 1837 (in Recherches sur la Probabilité des Jugements en Matière Criminelle et en Matière Civile, Précédées des Règles Générales du Calcul des Probabilités, see http://gallica.bnf.fr/…). But it had been defined more than a century before, by Abraham De Moivre, in 1711, in De Mensura Sortis seu; de Probabilitate Eventuum in Ludis a Casu Fortuito Pendentibus (see e.g. the review in http://www.jstor.org/…). Let $N$ denote a counting random variable; it is said to be Poisson distributed if there is $\lambda\in(0,\infty)$ such that

$\mathbb{P}(N=k)=e^{-\lambda}\frac{\lambda^k}{k!},\forall k\in\mathbb{N}$

De Moivre obtained that distribution from an approximation of the binomial distribution. Recall that the binomial distribution is a standard distribution in actuarial science, for instance to model the number of deaths among $n$ insured. If individual death probabilities are identical, say $p$, and if deaths are independent events, then

$\mathbb{P}(N=k)=\binom{n}{k}p^k(1-p)^{n-k},\forall k\in\{0,1,\cdots,n\}$

And if $n\rightarrow\infty$ and $np\rightarrow \lambda$, then

$\mathbb{P}(N=k)\rightarrow e^{-\lambda}\frac{\lambda^k}{k!}$

Again, this is an asymptotic theorem, valid when we have a lot of observations ($n\rightarrow\infty$), but also when the probability of occurrence is extremely small (since $p\sim\lambda/n$), which is why the term small numbers is used.

Siméon Poisson was not interested in mathematical approximations: his main point was to get a distribution with nice goodness of fit properties for the data he was working on. He wanted to get a better understanding of the cours d’assises (the French criminal courts that sit with a jury; assize court might be a valid translation of the French term). A jury consisted of 12 jurors who voted to determine whether a defendant was guilty. When guilt was predominant, with at least 8 votes against 4, the defendant was convicted (which happened in 47% of criminal cases). With 7 votes against 5, the opinion of professional judges was requested (in 11% of criminal trials). Using these statistics, one can show that the probability that a defendant brought before an assize court is guilty is of the order of 68%, and that the probability that a juror does not err when voting (by condemning an innocent or releasing a culprit) was about 54%. He sought to calculate the probability that a defendant is wrongfully convicted, and got 2%. And 28% of acquitted defendants are in fact guilty. Siméon Poisson introduced this law to get probabilities easily. But the law he considered is central in probability.
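De Moivre's approximation of the binomial distribution can be checked numerically. The sketch below is only an illustration: the value of lambda and the grid of values for n are arbitrary choices, not taken from the text. It reports the largest gap between the two probability functions over k = 0, ..., 10.

```r
# Compare Binomial(n, lambda/n) with Poisson(lambda) as n grows.
# lambda = 2 and the values of n are arbitrary, illustrative choices.
lambda <- 2
k <- 0:10
n_values <- c(10, 100, 1000, 10000)

# largest absolute difference between the two probability functions over k
max_gap <- sapply(n_values, function(n) {
  max(abs(dbinom(k, size = n, prob = lambda / n) - dpois(k, lambda)))
})
names(max_gap) <- as.character(n_values)
max_gap
```

The gap should shrink as n increases while np is held equal to lambda, which is the regime described above.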

The law of small numbers

The heuristic of the main theorem related to the Poisson distribution is the following: let $X_1, \cdots,X_n$ denote i.i.d. random variables taking values in $\mathbb{R}^d$ (in a general setting). Let $\mathcal{A}_n\subset\mathbb{R}^d$. If $\mathbb{P}(X_i \in \mathcal{A}_n)\rightarrow 0$ as $n\rightarrow\infty$ (or $\mathbb{P}(X_i \in \mathcal{A}_n)=O(n^{-1})$, to be a little more specific about the assumptions), and if $N$ denotes the (random) count of events $\{X_i \in \mathcal{A}_n\}$, then $N$ can be approximated by a Poisson distribution with parameter $\lambda =n \times \mathbb P(X_i \in \mathcal{A}_n)$.

In other words, if we consider a large number of observations and count how many fall in a given (small) region, then that count is approximately Poisson distributed.

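# n uniform points on a square of area 100
n=1000
X=runif(n)*10-1.5
Y=runif(n)*10-1.5
plot(X,Y,axes=FALSE,cex=.6)
# unit disk centered at the origin, in yellow
u=seq(-1,1,by=.01)
v=sqrt(1-u^2)
polygon(c(u,rev(u)),c(v,rev(-v)),col="yellow",border=NA)
# points falling inside the disk, in red
I=(X^2+Y^2)<1
points(X[I],Y[I],cex=.6,pch=19,col="red")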

If we run some simulations,

>  n=1000
>  ns=100000
>  N=rep(NA,ns)
> for(s in 1:ns){
+ X=runif(n)*10-1.5
+ Y=runif(n)*10-1.5
+ I=(X^2+Y^2)<1
+ N[s]=sum(I)
+ }
> hist(N,breaks=0:60,probability=TRUE,col="yellow")
> mean(N)
[1] 31.41257

The parameter of the Poisson distribution is $n$ times the area of the yellow disk over the area of the square, here $1000\times\pi/100$, i.e.

> (lambda=10*pi)
[1] 31.41593
> lines(0:60-.5,dpois(0:60,lambda),type="b",col="red")
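A quick additional check, sketched below, is to compare the empirical mean and variance of the counts with the Poisson parameter, since a Poisson variable has equal mean and variance. This is only a sketch: the seed and the reduced number of samples are arbitrary choices made to keep the run short, and the simulation is vectorized rather than looped as above.

```r
set.seed(123)    # arbitrary seed, for reproducibility
n  <- 1000       # points per simulated sample, as above
ns <- 2000       # number of samples (smaller than above, to run quickly)

# one row per simulated sample of n points in the square
X <- matrix(runif(n * ns) * 10 - 1.5, nrow = ns)
Y <- matrix(runif(n * ns) * 10 - 1.5, nrow = ns)
N <- rowSums(X^2 + Y^2 < 1)   # number of points inside the unit disk, per sample

lambda <- n * pi / 100
c(mean = mean(N), variance = var(N), lambda = lambda)
```

The count is exactly binomial here, so its variance is np(1-p), slightly below lambda; the two get closer as p gets smaller.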

To get an interpretation related to insurance modeling, let $\mathcal{A}$ denote an upper layer in a reinsurance contract, i.e. $\mathcal{A}=\{x>d\}$ for some deductible $d$. Let the $X_i$’s denote individual losses. Then the number of claims that hit this upper layer can be modeled with a Poisson distribution. More precisely, if the deductible $d$ becomes extremely large (and $\mathbb{P}(X_i \in \mathcal{A})\rightarrow 0$), we obtain the peaks-over-threshold model in extreme value theory (see e.g. http://brale.math.hr/~iugrina/… or http://fire.nist.gov/bfrlpubs/…): if $N$ has a Poisson distribution and, conditionally on $N$, $X_1,\cdots,X_N$ are independent identically distributed generalized Pareto random variables, then $\max\{X_1,\cdots,X_N\}$ has the generalized extreme value distribution. Thus, exceedance models (for rare events) are closely related to Poisson processes.
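The link between a Poisson number of generalized Pareto exceedances and the generalized extreme value distribution can be illustrated by simulation. The following is a minimal sketch under stated assumptions: the intensity lambda, the shape xi and the scale sigma are hypothetical values, rgpd_sketch is a hand-written inverse-CDF sampler (base R has no generalized Pareto generator), and the maximum is set to 0 when there is no exceedance. With generalized Pareto survival function $(1+\xi x/\sigma)^{-1/\xi}$, the maximum has distribution function $\exp(-\lambda(1+\xi x/\sigma)^{-1/\xi})$, which is of generalized extreme value form.

```r
set.seed(2013)                        # arbitrary seed
lambda <- 5; xi <- 0.3; sigma <- 1    # hypothetical intensity, GPD shape and scale

# inverse-CDF sampler for the generalized Pareto distribution (xi != 0)
rgpd_sketch <- function(m) sigma * (runif(m)^(-xi) - 1) / xi

ns <- 20000
M <- replicate(ns, {
  N <- rpois(1, lambda)                     # number of exceedances
  if (N == 0) 0 else max(rgpd_sketch(N))    # convention: maximum is 0 if no exceedance
})

x <- c(0.5, 1, 2, 5, 10)
empirical   <- sapply(x, function(t) mean(M <= t))
theoretical <- exp(-lambda * (1 + xi * x / sigma)^(-1 / xi))
round(rbind(x, empirical, theoretical), 3)
```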

The Poisson process

As mentioned above, the Poisson distribution appears when events occur somehow randomly and independently, over time. It is then natural to study the time between two occurrences (or two claims, in an insurance context).
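For a homogeneous Poisson process, the times between two occurrences are independent and exponentially distributed. The sketch below goes the other way round: it draws exponential waiting times and checks that the number of occurrences within one unit of time looks Poisson. The intensity (3 claims per year) is a hypothetical value, and 50 waiting times per year is simply a bound taken large enough that the year is always covered in practice.

```r
set.seed(42)     # arbitrary seed
lambda <- 3      # hypothetical intensity: 3 claims per year on average
ns <- 10000      # number of simulated years

count_one_year <- function() {
  # cumulated exponential waiting times give the claim arrival dates
  arrivals <- cumsum(rexp(50, rate = lambda))
  sum(arrivals <= 1)   # claims occurring within the first year
}
N <- replicate(ns, count_one_year())

# empirical frequencies of 0, 1, ..., 7 claims against Poisson probabilities
round(rbind(empirical = tabulate(N + 1, nbins = 8) / ns,
            poisson   = dpois(0:7, lambda)), 4)
```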

Poisson distribution, and claims occurrence

It is neither Siméon Poisson nor De Moivre, but Ladislaus von Bortkiewicz who first referred to the Poisson distribution as the law of small numbers. In 1898 (see http://archive.org/…), he studied the number of soldiers killed by horse kicks, from 1875 to 1894, over 200 corps-years (more precisely, 10 corps over 20 years).

He obtained the following distribution (here, the parameter of the Poisson distribution is 0.61, i.e. the average number of deaths per corps per year)

Number of deaths per corps per year	Empirical counts	Poisson counts
0	109	108.67
1	65	66.29
2	22	20.22
3	3	4.11
4	1	0.63
5 and more	0	0.08
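The Poisson column of the table can be recomputed from the empirical counts alone. A minimal sketch, where the variable names are only illustrative:

```r
deaths   <- 0:4
observed <- c(109, 65, 22, 3, 1)    # empirical counts from the table above

# Poisson parameter: average number of deaths per corps per year (122 / 200)
lambda <- sum(deaths * observed) / sum(observed)

# expected counts for 0, 1, ..., 4 deaths, then for "5 and more"
expected <- sum(observed) *
  c(dpois(deaths, lambda), ppois(4, lambda, lower.tail = FALSE))
names(expected) <- c(deaths, "5 and more")
round(expected, 2)
```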

It is possible to find many cases where the Poisson distribution fits extremely well. For instance, consider the number of hurricanes per year that made landfall in Florida after 1850:

Number of hurricanes per year	Empirical frequency (years)	Poisson frequency (years)
0	30	27.16
1	48	47.99
2	37	42.41
3	29	24.98
4	8	11.03
5	3	3.90
6	3	1.15
7	1	0.29
8 and more	0	0.08
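Here the Poisson parameter is not given in the text, so the sketch below assumes that it was estimated by the sample mean of the table (281 hurricanes over 159 years), which appears to reproduce the fitted column.

```r
hurricanes <- 0:7
years      <- c(30, 48, 37, 29, 8, 3, 3, 1)   # empirical frequencies from the table above

# assumption: lambda is estimated by the average number of hurricanes per year
lambda_hat <- sum(hurricanes * years) / sum(years)

# fitted frequencies for 0, 1, ..., 7 hurricanes, then for "8 and more"
fitted <- sum(years) *
  c(dpois(hurricanes, lambda_hat), ppois(7, lambda_hat, lower.tail = FALSE))

data.frame(hurricanes = c(hurricanes, "8 and more"),
           empirical  = c(years, 0),
           poisson    = round(fitted, 2))
```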

Poisson distribution, and return period

The return period was introduced by Emil Gumbel, in hydrology, to link probabilities and durations (see e.g. http://freakonometrics.blog.free.fr/…). A decennial event has an annual occurrence probability of 1/10; 10 years is then the average waiting time before an occurrence. This does not mean that the event cannot occur before 10 years, nor that it has to occur within 10 years. Consider a return period $T$ (in years); the yearly probability of non-occurrence is then $1-(1/T)$.

And the probability of at least one occurrence over $n$ years is then $1-[1-(1/T)]^n$. It is standard to summarize this property with the following table,

Probability of at least one catastrophe over $n$ years, by return period $T$
Number of years ($n$)	$T=10$	$T=20$	$T=50$	$T=100$	$T=200$
10	65.1%	40.1%	18.3%	9.6%	4.9%
20	87.8%	64.2%	33.2%	18.2%	9.5%
50	99.5%	92.3%	63.6%	39.5%	22.2%
100	99.9%	99.4%	86.7%	63.4%	39.4%
200	99.9%	99.9%	98.2%	86.6%	63.3%

The diagonal of the table above is extremely interesting. There seems to be some kind of convergence towards a limiting value (here 63.2%). Indeed, the number of events observed over $n$ years has a binomial distribution with probability $1/T=1/n$, which converges towards the Poisson distribution with parameter 1. The probability of having at least one catastrophe is then $1-\exp(-1)$, which is equal to 0.632.
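The table and its diagonal can be recomputed directly from the formula. A minimal sketch, where the object names are only illustrative:

```r
years   <- c(10, 20, 50, 100, 200)   # number of years n (rows)
periods <- c(10, 20, 50, 100, 200)   # return period T (columns)

# probability of at least one event over n years, for a return period ret
prob <- outer(years, periods, function(n, ret) 1 - (1 - 1 / ret)^n)
dimnames(prob) <- list(n = as.character(years), T = as.character(periods))
round(100 * prob, 1)

# diagonal (n = T) against the Poisson limit 1 - exp(-1)
rbind(diagonal = diag(prob), limit = 1 - exp(-1))
```

The very large entries (n much larger than T) are extremely close to 100%, so how they are displayed depends on whether one rounds or truncates.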

Rare probabilities and the Poisson distribution

The Poisson distribution keeps appearing when computing probabilities of rare events. Consider, for instance, the probability of having at least one incident in a nuclear plant in France over a 50-year period. Assume that the annual probability $p$ of an incident in a reactor is small, e.g. 0.005%. Assume further that reactors are independent of each other, and over time. The probability of having at least one incident over 80 reactors in 50 years is (exactly)

$\mathbb{P}(N\geq 1)=1-(1-p)^{50 \times 80}$

Of course, a linear approximation is not correct (even if it was mentioned in some French newspaper, as explained in an old post http://freakonometrics.blog.free.fr/…)

$\mathbb P(N\geq 1)\neq 50\times 80\times p$

On the other hand

$\mathbb P(N\geq 1)=1-(1-p)^{50\times80 } \sim1-\exp\left(-50\times80\times p \right)$

> p=0.00005
> 1-(1-p)^(50*80)
[1] 0.1812733
> 1-exp(-50*80*p)
[1] 0.1812692

which is the probability that $N$ is not null (i.e. that there is at least one incident) when $N$ has a Poisson distribution with parameter $\lambda=50\times80\times p$. We clearly see here an application of De Moivre’s approximation in risk management.
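To see how the approximations behave, the sketch below compares the exact probability of at least one incident with the Poisson and linear approximations. The value 0.00005 is the one used in the R session above; the two other values of p are purely illustrative.

```r
m <- 50 * 80                          # number of reactor-years
p <- c(0.000005, 0.00005, 0.0005)     # 0.00005 as above, the others are illustrative

comparison <- cbind(
  exact   = 1 - (1 - p)^m,      # exact probability of at least one incident
  poisson = 1 - exp(-m * p),    # Poisson approximation
  linear  = m * p               # linear approximation
)
rownames(comparison) <- format(p, scientific = FALSE)
round(comparison, 4)
```

The linear term stays close only while the product of m and p is small, and for the largest value of p it exceeds 1, which cannot be a probability.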

Another way of looking at this problem is based on the following idea: given that in 45 years of observations on (roughly) 450 reactors worldwide, three major accidents were observed, including Three Mile Island (1979) and Fukushima (2011), the average time between accidents can be estimated at 16 years. For a single reactor, we can assume that the average time to wait before an incident is 450 times 16 years, i.e. 7200 years. Or, equivalently, the probability of having one incident, over one year, for one reactor is 1 over 7200 (this is the idea behind the return period concept). If we assume that accidents occur randomly and independently of each other (as defined above), then the number of major accidents observed over a period of 50 years in France follows a Poisson distribution with parameter 50 / (7200/80). Hence, the probability of having at least one major accident over 50 years, with 80 reactors, can be estimated by

$1-\exp(-50\times 80/7200)$

i.e.

> 1-exp(-50*80/7200)
[1] 0.4262466

(keeping in mind all the uncertainty around the estimated waiting time before a major accident to a single reactor!).

Arthur Charpentier

Arthur Charpentier, professor at UQaM in Actuarial Science. Former professor-assistant at ENSAE Paristech, associate professor at Ecole Polytechnique and assistant professor in Economics at Université de Rennes 1. Graduated from ENSAE, Master in Mathematical Economics (Paris Dauphine), PhD in Mathematics (KU Leuven), and Fellow of the French Institute of Actuaries.