The law of small numbers

January 28, 2013

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

In insurance, the law of large numbers (named loi des grands nombres initially by Siméon Poisson, see e.g.…) is usually mentioned to legitimate large portfolios, because of pooling and diversification: the larger the pool, the more ‘predictable’ the losses will be (in a given period). Of course, under standard statistical assumption, namely finite expected value, and independence (see…. for a discussion, in French). Since in insurance, catastrophes are usually rare – and extremely costly – and actuaries might be interested to model occurrence of that small number of events (see e.g. Aldous’ book on that specific topic, that can be downloaded from…). The theorem behind is sometimes called the law of small numbers (from the book published by Ladislaus Bortkiewicz, but we’ll get back to that story later on, see also Whitaker (1914)… or the book recently published by Michael Falk, Jürg Hüsler and Rolf-Dieter Reiss).

  • The Poisson distribution

The so-called Poisson distribution (see…) was introduced by Siméon Poisson in 1837 (in Recherches sur la Probabilité des Jugements en Matière Criminelle et en Matière Civile, Précédées des Règles Générales du Calcul des Probabilités, see…). But it had been defined more than a century before, by Abraham De Moivre, in 17111, in De Mensura Sortis seu; de Probabilitate Eventuum in Ludis a Casu Fortuito Pendentibus (see e.g. the review in…). Let denote a counting random variable, then it said to be Poisson distributed if there is\lambda\in(0,\infty) such that\mathbb{P}(N=k)=e^{-\lambda}\frac{\lambda^k}{k!},\forall%20k\in\mathbb{N}

De Moivre obtained that distribution from an approximation of the binomial distribution. Recall that the binomial distribution is a standard distribution in actuarial science, for instance to model the number of deaths among insured. If individual death probabilities are identical, say, and if deaths are independent events, then\mathbb{P}(N=k)=\binom{n}{k}p^k(1-p)^{n-k},\forall%20k\in\{0,1,\cdots,n\}
And if\rightarrow\infty and\rightarrow%20\lambda, then\mathbb{P}(N=k)\rightarrow%20e^{-\lambda}\frac{\lambda^k}{k!}Again, this is an asymptotic theorem, which is valid when we have a lot of observations (\rightarrow\infty), but also, the probability of occurrence should be extremely small (since\sim\lambda/n), which is why to use the term small numbers. Siméon Poisson was not interested by mathematical approximations: his main point was to get a distribution with nice goodness of fit properties for the data he was working on. He wanted to get a better understanding of cours d’assises (jury panel, might be a valid translation of the French term). A jury consists of 12 jurors who voted to determine whether a defendant was guilty. When guilt was predominant, with at least 8 votes against 4, the defendant was convicted (which was 47% of criminal cases). 5 with 7 votes against, the opinion of professional judges was requested (11% of criminal trials again). Using these statistics we can demonstrate that a defendant brought before an assize court is guilty of the order of 68%, and the probability that a juror is not wrong by voting (condemning an innocent or releasing a culprit) was about 54%. He sought to calculate the probability that a defendant is wrongfully convicted, and gets 2%. And 28% of exonerated defendants are in fact guilty. Siméon Poisson introduced this law to get probabilities easily. But the law he considered is central in probability….

  • The law of small numbers

The heuristic of the main theorem, related to the Poisson distribution is the following: let,%20\cdots,X_n denote i.i.d random variables taking values in\mathbb{R}^d (in a general setting). Let\mathcal{A}_n\subset\mathbb{R}^d. If\mathbb{P}(X_i%20\in%20\mathcal{A}_n)\rightarrow%200 as\rightarrow\infty (or\mathbb{P}(X_i%20\in%20\mathcal{A}_n)=O(n^{-1}) to be a little bit more specific about the assumptions), let denote the (random variable characterizing) count of events\{X_i%20\in%20\mathcal{A}_n\}, then can be approximated by a Poisson distribution with parameter\lambda%20=n%20\times%20\mathbb%20P(X_i%20\in%20\mathcal{A}_n).
The heuristic is that if we consider a large number of observations, and if we count how many are in a given (small) region, then the number of such observations is Poisson distributed.


If we run some simulations,

>  n=1000
>  ns=100000
>  N=rep(NA,ns)
> for(s in 1:ns){
+ X=runif(n)*10-1.5
+ Y=runif(n)*10-1.5
+ I=(X^2+Y^2)<1
+ N[s]=sum(I)
+ }
> hist(N,breaks=0:60,probability=TRUE,col="yellow")
> mean(N)
[1] 31.41257

The parameter of the Poisson distribution is the area of the yellow disk, over the area of the square, i.e.

> (lambda=10*pi)
[1] 31.41593
> lines(0:60-.5,dpois(0:60,lambda),type="b",col="red")

To get an interpretation related to insurance modeling, let\mathcal{A} denote an upper layer in a reinsurance contract, i.e.\mathcal{A}=\{x%3Ed\} for some deductible Let‘s denote individual losses. Then the number of claims that hit this upper layer can be modeled with a Poisson distribution. More precisely, if deductible becomes extremely large (and\mathbb{P}(X_i%20\in%20\mathcal{A})\rightarrow%200), we obtain the point-over-threshold model in extreme value theory (see e.g.… or…): if has a Poisson distribution and, conditionally on,\cdots,X_N are independent identically distributed generalized Pareto random variables, then\max\{X_1,\cdots,X_N\} has the generalized extreme value distribution. Thus, exceedances models (for rare events) are closely related to Poisson processes.

  • The Poisson process

As mentioned above, the Poisson distribution appears when events occur somehow randomly and independently, over time. It is then natural to study the time between two occurences (or two claims, in an insurance context).

  • Poisson distribution, and claims occurrence

It is neither Siméon Poisson nor De Moivre, but Ladislaus Von Bortkiewicz who first mentioned the Poisson distribution as the law of small numbers. In 1898 (see…), he studied the number number of soldiers killed by being kicked by a horse, from 1875 till 1894, in 200 corps (more precisely 10 corps over 20 ans).

He did obtain the following distribution (here, the parameter of the Poisson distribution is 0.61, i.e. the average number of death per year)

number of
death per
0 109 108.67
1 65 66.21
2 22 20.22
3 3 4.11
4 1 0.63
5 and more 0 0.08

It is possible to find a lot of cases where the Poisson distribution fits extremely well. For instance, if we consider the number of hurricanes, that landed in Florida after 1850,

number of
per year
0 30 27.16
1 48 47.99
2 37 42.41
3 29 24.98
4 8 11.03
5 3 3.90
6 3 1.15
7 1 0.29
8 and more 0 0.08
  • Poisson distribution, and return period

The return period was introduced by Emil Gumbel, in hydrology, to link probabilities and durations (see e.g.…). A decennial event has an occurence probability of 1/10. 10 is then the average waiting time before occurence. This does not mean that the event will not occur before 10 years, or has to occur before 10 years. Consider a return period (in years), then the yearly probability of non-occurrence is

And the probability of non-occurence over years is then[1-(1/T)]^n. It is standard to summarize this property with the following table,

return period

Number of years ( without catastrophes

10 20 50 100 200
10 65.1% 40.1% 18.3% 9.6% 4.9%
20 87.8% 64.2% 33.2% 18.2% 9.5%
50 99.5% 92.3% 63.6% 39.5% 22.5%
100 99.9% 99.4% 86.7% 63.4% 39.5%
200 99.9% 99.9% 98.2% 86.6% 63.3%

The diagonal in the table above is extremely interesting. It looks like there is some kind of convergence towards a limiting value (here  63.2%). Indeed, the number of events observed over n years have a Binomial distribution, with probability, which will converge towards the Poisson distribution with parameter 1. The probability of not having a catastrophe is then\exp(-1), which is equal to 0.632.

  • Rare probabilities and the Poisson distribution

The Poisson distribution keeps appearing when computing probabilies of rare events. For instance, the probability to have at least one incident in a nuclear plant in France, over a 50 year period. Assume that the annual probability of an incident in a reactor is small, e.g. 0.05%. Assume further that reactors are independent among them, and in time. The probability to have no incident over 80 reactors in 50 years is (exactly)\mathbb{P}(N=0)=1-(1-p)^{50%20\times%2080}

Of course, a linear approximation is not correct (even if it was mentioned in some French newspaper, as explained in an old post…)\mathbb%20P(N=0)\neq%2050\times%2080\times%20p

On the other hand\mathbb%20P(N=0)=1-(1-p)^{50\times80%20}%20\sim1-\exp\left(-50\times80\times%20p%20\right)

> p=0.00005
> 1-(1-p)^(50*80)
[1] 0.1812733
> 1-exp(-50*80*p)
[1] 0.1812692

which is the probability that is null when has a Poisson distribution with parameter\lambda=50\times80\times%20p. We clearly see here an application of De Moivre’s approximation in risk management.

Another way of looking at this problem is based on the following idea: given the fact that in 45 years of observations on 450 reactors worldwide (roughly), three major accidents were observed including Three Mile Island (1979) and Fukushima (2011), i.e. the average time between accidents can be estimated at 16 years. For a single reactor, we can assume that the average time to wait before an incident is 450 times 16 years, i.e 7200 years. Or the probability to have one incident, over one year, for one reactor is 1 over 7200 (this is the idea behind the return period concept). If we assume that the arrival of accidents occurs randomly and independently of each other (as defined above) then the number of major accidents observed over a period of 50 years in France follows a Poisson distribution with parameter 50 / (7200/80). Also, the probability of having no major accident over 50 years, with 80 reactors can be estimated by\exp(-50\times%2080/7200)


> 1-exp(-50*80/7200)
[1] 0.4262466

(keeping in mind all the uncertainty around the estimated waiting time before a major accident to a single reactor!).

Arthur Charpentier

Arthur Charpentier, professor at UQaM in Actuarial Science. Former professor-assistant at ENSAE Paristech, associate professor at Ecole Polytechnique and assistant professor in Economics at Université de Rennes 1. Graduated from ENSAE, Master in Mathematical Economics (Paris Dauphine), PhD in Mathematics (KU Leuven), and Fellow of the French Institute of Actuaries.

More PostsWebsite

Follow Me:
TwitterLinkedInGoogle Plus

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training


CRC R books series

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)