(This article was first published on **Freakonometrics - Tag - R-english**, and kindly contributed to R-bloggers)

The only thing I remember from the probability courses I took a
few years ago is that we also have to *clearly* define the
event whose probability we want to calculate. On the Freakonomics blog,
last week, the Israeli lottery was mentioned (here, see also there
where I mentioned that, and odds facts from the French lottery),

*probability error*... Well, since Andrew is really a statistician (and a good one... while I am barely an economist), I tried to do the maths... and to understand where the *error* was coming from...

Since 6 numbers are drawn out of a pool of numbers from 1 to 37, the total number of possible combinations at each draw is

> (n=choose(37,6))

[1] 2324784
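
For those who prefer to see where that number comes from, here is a small sketch of the same count done by hand, i.e. 37×36×...×32 ordered picks divided by the 6! orderings of the six numbers. The name `n_by_hand` is mine, only for illustration.

```r
# Number of ways to choose 6 numbers out of 37, without using choose():
# ordered picks (37 * 36 * ... * 32) divided by the 6! possible orderings
n_by_hand <- prod(37:32) / factorial(6)

# Should agree with the built-in binomial coefficient
n_by_hand == choose(37, 6)
```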

Over 8 draws (since there are two draws per week, we can assume there are 8 draws per month), the probability of no identical draws is

$$\frac{n(n-1)(n-2)\cdots(n-7)}{n^8}$$

Here is the R code, again for those who want to check,

> prod(n-0:7)/n^8

[1] 0.999988
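
As a rough cross-check (a sketch, not part of the original computation), the usual "birthday problem" approximation says that the probability of no repetition over k draws is close to exp(-k(k-1)/(2n)) when k is small compared with n. The names `q_exact` and `q_approx` are mine.

```r
n <- choose(37, 6)   # number of possible 6-number combinations
k <- 8               # draws per month, as assumed in the text

# Exact probability of no repeated combination over k draws
q_exact <- prod(n - 0:(k - 1)) / n^k

# Birthday-problem approximation, valid when k is much smaller than n
q_approx <- exp(-k * (k - 1) / (2 * n))

c(exact = q_exact, approx = q_approx)
```

With n above two million and only 8 draws, the two values should be practically indistinguishable.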

Each month, the probability of a "coincidence" (I define a "*coincidence*" as the event "*over 8 draws, at least two times, we obtained the same 6-uplet*" or, more precisely (as mentioned here), "*over one calendar month, at least two times, we obtained the same 6-uplet*") is p=1.204407e-05.

> (p=1-(prod(n-0:7)/n^8))

[1] 1.204407e-05
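
With p that small, a direct simulation of the real lottery would need far too many replications. A possible sanity check of the formula itself, sketched here on a hypothetical toy lottery (2 numbers out of 10, so only 45 combinations; the names `n_small`, `hits` and `simulated` are mine), is to compare the exact value with a Monte Carlo estimate.

```r
set.seed(1)

n_small <- choose(10, 2)   # toy lottery (an assumption, not the Israeli one)
k <- 8                     # draws per month

# Exact probability of at least one repeated combination over k draws
exact <- 1 - prod(n_small - 0:(k - 1)) / n_small^k

# Each draw is equally likely to be any of the n_small combinations,
# so a month is k labels sampled with replacement; check for duplicates
hits <- replicate(20000, anyDuplicated(sample.int(n_small, k, replace = TRUE)) > 0)
simulated <- mean(hits)

c(exact = exact, simulated = simulated)
```

The two numbers should agree up to simulation noise, which suggests the same formula can be trusted with n=choose(37,6).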

The number of months we wait until a coincidence occurs has a geometric distribution, with probability p. And it is classical, following Gumbel's definition (here), to consider 1/p, called the "*return period*", i.e. the expected number of months we have to wait until we observe a coincidence (i.e. a repetition in the same month), since a geometric distribution with probability p has mean $1/p$. Converted to years,

> 1/p/(12)

[1] 6919.034

Here, the (expected) return period is 6919 *years*.

From my point of view, this is “*the incident of six numbers repeating themselves within a calendar month*”, and this is an event that occurs once every 6919 years on average. On the other hand, the median of a geometric distribution is $-\log(2)/\log(1-p)$, i.e., in years,

> -log(2)/log(1-p)/(12)

[1] 4795.88

which means that we have a 50% chance of getting such a coincidence within 4796 years.
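
The median can also be cross-checked with R's built-in geometric quantile function, as a small sketch. Note that `qgeom` and `pgeom` count the number of failures *before* the first success, so one month has to be added to get the waiting time; `median_months` is a name I introduce here.

```r
n <- choose(37, 6)
p <- 1 - prod(n - 0:7) / n^8     # monthly probability of a coincidence

# qgeom() counts failures before the first success, hence the + 1
median_months <- qgeom(0.5, p) + 1
median_months / 12               # median waiting time, in years

# Probability that the first coincidence occurs within median_months months
pgeom(median_months - 1, p)
```

Because the waiting time is an integer number of months, this value can differ slightly from the closed-form expression, by less than one month.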

Of course, we can instead look at a longer period, say 100 draws, i.e. about one year (here I define a "*coincidence*" as the event "*over 100 draws, at least two times, we obtained the same 6-uplet*"). As a function of the number of draws in the period (from 2 to 100), we have in red the expected return period, and in blue the median of the geometric distribution,

> M=E=rep(NA,100)

> for(i in 2:100){

+ p=1-exp((sum(log(n-0:(i-1)))-i*log(n)))

+ E[i]=1/p/(100/i)

+ M[i]=-log(2)/log(1-p)/(100/i)

+ }

> plot(1:100,E,ylim=c(0,10000),type="l",col="red",lwd=2)

> lines(1:100,M,col="blue",lwd=2)

> abline(v=8,lty=2)

> points(8,E[8],pch=19,col="red")

> points(8,M[8],pch=19,col="blue")

The same curves can also be drawn on a log scale.
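
One possible way to get the log-scaled graph, sketched here with a vectorized version of the loop above (the names `log_q` and `p_k` and the axis labels are mine), is to use `cumsum` on the logs and pass `log="y"` to `plot`.

```r
n <- choose(37, 6)
k <- 2:100                       # number of draws in the period

# log P(no repeat over k draws) = sum_{j=0}^{k-1} log(n - j) - k * log(n)
log_q <- cumsum(log(n - 0:99))[k] - k * log(n)
p_k <- -expm1(log_q)             # P(at least one repeat over k draws)

E <- 1 / p_k / (100 / k)                 # expected return period, in years
M <- -log(2) / log1p(-p_k) / (100 / k)   # median, in years

plot(k, E, log = "y", type = "l", col = "red", lwd = 2,
     xlab = "draws in the period", ylab = "years (log scale)")
lines(k, M, col = "blue", lwd = 2)
abline(v = 8, lty = 2)
```

Using `expm1` and `log1p` is only a precaution against rounding when the probabilities are tiny; the values should match those of the loop.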

As Xi'an did (here), assume now that there are lotteries in 100 countries. Here I define a "*coincidence*" as the event "*over k draws in each of 100 lotteries around the world, at least two times, we obtained the same 6-uplet*", and then the previous graph becomes (with, on the *x* axis, the value of *k*)

Here I get about a 12% chance of seeing identical numbers within a month...
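
A minimal sketch of that monthly figure, assuming (my assumption) that all 100 lotteries use the same 6-out-of-37 format with 8 draws per month each, and that the 800 draws are simply pooled; the names `lotteries`, `draws` and `p_pooled` are mine.

```r
n <- choose(37, 6)
lotteries <- 100    # assumption: identical 6-out-of-37 lotteries
k <- 8              # draws per lottery in a month
draws <- lotteries * k

# Probability that at least two of the pooled draws give the same combination,
# computed on the log scale to avoid overflow in the product
p_pooled <- 1 - exp(sum(log(n - 0:(draws - 1))) - draws * log(n))
p_pooled
```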

But here, we can have one 6-uplet in Israel, and the other one in Egypt, say... If we want to get the same 6-uplet in the same country, the graph is now

i.e. each month there is about one chance in a thousand...

> i=8

> p=1-exp((sum(log(n-0:(i-1)))-i*log(n)))

> 1-(1-p)^100

[1] 0.001203689

**Note**: actually, Xi'an mentioned that the probability that this coincidence [of two identical draws over 188 draws] occurred in at least one out of 100 lotteries (there are hundreds of similar lotteries across the world) is 53%! And I got the same (with P[188] the probability of at least one repeated 6-uplet over 188 draws, computed like p above),

> 1-(1-P[188])^100

[1] 0.5305219
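
The vector `P` is not defined in the code shown in this post. A plausible reconstruction (an assumption on my side, using the same formula as for p) is the probability of at least one repetition over i draws, for i from 1 to 188:

```r
n <- choose(37, 6)

# P[i]: probability of at least one repeated combination over i draws
# (reconstructed for illustration; not shown in the original code)
P <- sapply(1:188, function(i) 1 - exp(sum(log(n - 0:(i - 1))) - i * log(n)))

# Probability that this happens in at least one of 100 independent lotteries
1 - (1 - P[188])^100
```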
