Comments on probabilities

November 2, 2010

(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers)

The only thing I remember from the probability courses I took a few years ago is that we have to clearly define the event whose probability we want to calculate. Last week, the Freakonomics blog mentioned the Israeli lottery (here, see also there, where I mentioned it along with odd facts from the French lottery).

Yesterday, Andrew Gelman claimed (here) that there was a probability … Well, since Andrew is really a statistician (and a good one… while I am barely an economist), I tried to do the maths… and to understand where the error was coming from…

Since 6 numbers are drawn out of a pool of numbers from 1 to 37, the total number of combinations at each lottery draw is

> (n=choose(37,6))
[1] 2324784
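As a quick sanity check on that count, here is a minimal sketch that writes the binomial coefficient out by hand and compares it with `choose()`. The variable names are only illustrative, and the last line assumes that every 6-number combination is equally likely.

```r
# Number of ways to choose 6 numbers out of 37, written out explicitly:
# 37*36*35*34*33*32 ordered draws, divided by the 6! orderings of each set.
n_explicit <- prod(37:32) / factorial(6)

# Same quantity with the built-in binomial coefficient
n_builtin <- choose(37, 6)
print(c(n_explicit, n_builtin))

# Probability that a single draw hits one fixed combination
# (assumption: all combinations are equally likely)
p_single <- 1 / n_builtin
print(p_single)
```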

Over 8 draws (since there are two draws per week, we can assume there are 8 draws per month), the probability of no identical draws is

$$\frac{n(n-1)(n-2)\cdots(n-7)}{n^8}$$

Here again is the R code for those who want to check,
> prod(n-0:7)/n^8
[1] 0.999988
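This is the classical birthday problem with n "days" and 8 "people". The sketch below wraps the same product in a small helper (`prob_all_distinct` is a name made up for this illustration) and compares it with the first-order approximation choose(8,2)/n for the complementary probability. It assumes independent draws that are uniform over the n combinations.

```r
# Probability that k independent draws, each uniform over n combinations,
# are all different: prod_{j=0}^{k-1} (1 - j/n), computed on the log scale.
prob_all_distinct <- function(k, n) {
  j <- 0:(k - 1)
  exp(sum(log1p(-j / n)))
}

n <- choose(37, 6)
q_helper <- prob_all_distinct(8, n)   # via the helper
q_direct <- prod(n - 0:7) / n^8       # direct product, as in the text

# First-order approximation of the probability of at least one repetition:
# the number of pairs of draws divided by the number of combinations.
p_approx <- choose(8, 2) / n

print(c(q_helper, q_direct))
print(c(1 - q_helper, p_approx))
```

The approximation is very close here because 8 is tiny compared with n.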

Each month, the probability of a “coincidence” (I define a “coincidence” as the event “over 8 draws, the same 6-tuple is obtained at least twice”, or more precisely (as mentioned here) “over one calendar month, the same 6-tuple is obtained at least twice”) is
> (p=1-(prod(n-0:7)/n^8))
[1] 1.204407e-05

The waiting time until a coincidence, counted in months, follows a geometric distribution with probability p. It is classical, following Gumbel’s definition (here), to consider 1/p, called the “return period”, i.e. the expected number of months we have to wait until we observe a coincidence (i.e. a repetition in the same month), since the mean of a geometric distribution with parameter p is 1/p. Expressed in years,

> 1/p/12
[1] 6919.034

Here, the (expected) return period is 6919 years. From my point of view, this is “the incident of six numbers repeating themselves within a calendar month”, and it is a once-in-6919-years event. On the other hand, the median of a geometric distribution is

> -log(2)/log(1-p)/(12)
[1] 4795.88

which means that we have a 50% chance of observing such a coincidence within 4796 years.
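The two summaries of the waiting time can be cross-checked against R's built-in geometric quantile function. This is only a sketch: it recomputes p as in the text, and it relies on the fact that `qgeom()` counts the failures before the first success, so one has to be added to get a number of months.

```r
n <- choose(37, 6)
p <- 1 - prod(n - 0:7) / n^8     # monthly probability of a coincidence

# Mean waiting time is 1/p months; divide by 12 to get years.
return_period_years <- 1 / p / 12

# Median waiting time: smallest m such that 1 - (1 - p)^m >= 1/2,
# here with the continuous formula used in the text.
median_months_formula <- -log(2) / log(1 - p)

# Same median from the built-in quantile function. qgeom() returns the
# number of failures before the first success, hence the + 1.
median_months_qgeom <- qgeom(0.5, p) + 1

print(return_period_years)
print(median_months_formula / 12)
print(median_months_qgeom / 12)
```

The mean is larger than the median because the geometric distribution is strongly right-skewed.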

Of course, we can instead look at a longer period, say 100 draws, i.e. one year. If I define a “coincidence” as the event “over 100 draws, the same 6-tuple is obtained at least twice”, and more generally do the same over i draws, we have in red the expected return period (in years), and in blue the median of the geometric distribution, as functions of the number of draws,

> M=E=rep(NA,100)
> for(i in 2:100){
+ p=1-exp((sum(log(n-0:(i-1)))-i*log(n)))
+ E[i]=1/p/(100/i)
+ M[i]=-log(2)/log(1-p)/(100/i)
+ }
> plot(1:100,E,ylim=c(0,10000),type="l",col="red",lwd=2)
> lines(1:100,M,col="blue",lwd=2)
> abline(v=8,lty=2)
> points(8,E[8],pch=19,col="red")
> points(8,M[8],pch=19,col="blue")

or, below, a log-scaled version of the same graph.
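The two curves can also be computed without an explicit loop. The following is a minimal sketch, not the code used for the graphs: it builds the probabilities cumulatively on the log scale with `log1p()` and `expm1()`, and it keeps the same assumption of 100 draws per year. The vector name `P` is chosen here for illustration.

```r
n <- choose(37, 6)
k <- 1:100                              # number of draws in the window

# log of P(no repetition over k draws), accumulated term by term
log_q <- cumsum(log1p(-(k - 1) / n))

# P[k] = probability of at least one repetition within k draws
P <- -expm1(log_q)

# Number of windows of k draws per year (assumption: 100 draws per year)
windows_per_year <- 100 / k

E <- 1 / P / windows_per_year                    # expected return period, in years
M <- -log(2) / log1p(-P) / windows_per_year      # median waiting time, in years

# A single draw cannot repeat itself, so the first entry is undefined.
E[1] <- NA
M[1] <- NA

print(c(E[8], M[8]))
```

Passing `log="y"` to `plot()` with these vectors would give a log-scaled vertical axis.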

As Xi’an did (here), assume now that there are lotteries in 100 countries. Here I define a “coincidence” as the event “over k draws in each of 100 lotteries around the world, the same 6-tuple is obtained at least twice”, and the previous graph becomes (with k on the x axis)

Here there is a 12% chance of seeing identical numbers within a month… But in that case, one 6-tuple can come from Israel and the other one from Egypt, say… If we want the same 6-tuple twice in the same country, the graph is now

i.e. each month there is about one chance in a thousand…
> i=8
> p=1-exp((sum(log(n-0:(i-1)))-i*log(n)))
> 1-(1-p)^100
[1] 0.001203689
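The difference between the two definitions is easier to see side by side. The sketch below is an illustration under stated assumptions: 100 independent lotteries, all with the same 6-out-of-37 format and 8 draws per month. The helper `repeat_prob` is a hypothetical name, not something from the original code.

```r
n <- choose(37, 6)

# Probability of at least one repeated combination among k uniform draws
repeat_prob <- function(k, n) -expm1(sum(log1p(-(0:(k - 1)) / n)))

draws_per_month <- 8
n_lotteries <- 100

# (a) Pooled definition: all draws of all lotteries are put together,
#     so a match between two different countries counts as a coincidence.
p_pooled <- repeat_prob(draws_per_month * n_lotteries, n)

# (b) Same-country definition: the repetition must happen within one
#     lottery, and we ask that at least one of the lotteries shows it.
p_one <- repeat_prob(draws_per_month, n)
p_same_lottery <- 1 - (1 - p_one)^n_lotteries

print(c(pooled = p_pooled, same_lottery = p_same_lottery))
```

The pooled probability is about a hundred times larger because the number of pairs of draws grows quadratically with the number of pooled draws.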

Actually, Xi’an mentioned that the probability that this coincidence [of two identical draws over 188 draws] occurred in at least one out of 100 lotteries (there are hundreds of similar lotteries across the world) is 53%! And I got the same,
> i=188
> p=1-exp((sum(log(n-0:(i-1)))-i*log(n)))
> 1-(1-p)^100
[1] 0.5305219
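For readers who prefer simulation to formulas, here is a rough Monte Carlo check of that figure. It is a sketch under simplifying assumptions: each draw is represented by an integer index between 1 and n, all combinations are equally likely, and the 100 lotteries are independent and identical. The number of simulated "worlds" is an arbitrary choice, so the estimate is noisy.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- choose(37, 6)
n_draws <- 188
n_lotteries <- 100
n_sims <- 1000                   # number of simulated "worlds"

# TRUE if one lottery produces the same combination twice within n_draws draws
has_repeat <- function() {
  anyDuplicated(sample.int(n, n_draws, replace = TRUE)) > 0
}

# TRUE if at least one of the lotteries shows such a repetition
world_has_repeat <- function() any(replicate(n_lotteries, has_repeat()))

estimate <- mean(replicate(n_sims, world_has_repeat()))

# Closed-form value under the same assumptions, for comparison
p188 <- -expm1(sum(log1p(-(0:(n_draws - 1)) / n)))
exact <- 1 - (1 - p188)^n_lotteries

print(c(simulated = estimate, exact = exact))
```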
