# Consecutive numbers and the lottery

October 25, 2011

(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers)

Recently, I have been reading odd things about strategies for winning the lottery.

I wrote something on this topic a long time ago, but it may be worth writing another post. First, it is easy to get data on the French lottery, including the draws, the number of winners and the gains:

```r
# French Loto history: draws, numbers of winners and gains per rank
loto=read.table("http://freakonometrics.blog.free.fr/public/data/loto.csv",sep=";",header=TRUE)
balls=loto[,c("boule_1","boule_2","boule_3","boule_4","boule_5","boule_6")]
# sort the six balls of each draw in increasing order
sortballs=t(apply(balls,1,sort))
# gaps between successive sorted balls (a gap of 1 means two consecutive numbers)
consec=sortballs[,2:6]-sortballs[,1:5]
# sort the five gaps of each draw in increasing order
sortconsec=t(apply(consec,1,sort))
# number of winners and gain per winner, for ranks 1 to 5
winner1=loto[,"nombre_de_gagnant_au_rang1"]
gain1=as.numeric(as.character(loto[,"rapport_du_rang1"]))
winner2=loto[,"nombre_de_gagnant_au_rang2"]
gain2=as.numeric(as.character(loto[,"rapport_du_rang2"]))
winner3=loto[,"nombre_de_gagnant_au_rang3"]
gain3=as.numeric(as.character(loto[,"rapport_du_rang3"]))
winner4=loto[,"nombre_de_gagnant_au_rang4"]
gain4=as.numeric(as.character(loto[,"rapport_du_rang4"]))
winner5=loto[,"nombre_de_gagnant_au_rang5"]
gain5=as.numeric(as.character(loto[,"rapport_du_rang5"]))
# whichk is TRUE when at least k gaps equal 1: which1 = at least one pair of
# consecutive numbers, which2 = three numbers in a row or two separate pairs, etc.
which1=(sortconsec[,1]==1)
which2=(sortconsec[,2]==1)
which3=(sortconsec[,3]==1)
which4=(sortconsec[,4]==1)
which5=(sortconsec[,5]==1)
```
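To see what the sorted gaps encode, here is the same computation on three winning combinations that appear in the post (2-8-10-12-14-16, 26-27-28-35-36-37 and 7-11-15-27-33-44), typed in by hand rather than read from the file. It is a small sketch of the logic above, not a replacement for it. Note that the second draw has four gaps equal to 1, yet it is made of two triplets, not four numbers in a row: counting gaps of 1 is one definition of "consecutive numbers" among several.

```r
# three winning combinations, entered by hand (one draw per row)
draws = rbind(c(2, 8, 10, 12, 14, 16),
              c(26, 27, 28, 35, 36, 37),
              c(7, 11, 15, 27, 33, 44))
# sort each draw, then take differences between successive balls
sorted = t(apply(draws, 1, sort))
gaps = sorted[, 2:6] - sorted[, 1:5]
# sort the five gaps of each draw in increasing order
sortedgaps = t(apply(gaps, 1, sort))
# column k is TRUE when the draw has at least k gaps equal to 1
atleast = sapply(1:5, function(k) sortedgaps[, k] == 1)
atleast
```

Only the second row contains TRUE values (for k = 1 to 4), since its gaps are 1, 1, 7, 1, 1.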

There are several ways to define "winning the lottery" (2 out of 6, 3 out of 6, 4 out of 6, etc.) and to define "having consecutive numbers" (2 consecutive numbers out of 6, 3 out of 6, etc.).
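To put these definitions in perspective, it helps to know how often a purely random draw contains at least one pair of consecutive numbers. The sketch below assumes draws of 6 distinct balls out of 49, all combinations equally likely (which I believe matches the French Loto over most of the period covered, but this is an assumption). It uses the classical count of $k$-subsets of $\{1,\dots,n\}$ with no two consecutive elements, `choose(n-k+1,k)`, and checks it by simulation.

```r
# assumption: 6 distinct balls out of 49, all combinations equally likely
n = 49; k = 6
# number of 6-subsets of 1..n with no two consecutive numbers: choose(n-k+1, k)
p_none = choose(n - k + 1, k) / choose(n, k)
p_pair = 1 - p_none
# Monte Carlo check: simulate draws and look for a gap equal to 1
set.seed(1)
sim = replicate(50000, any(diff(sort(sample(n, k))) == 1))
c(exact = p_pair, simulated = mean(sim))
```

Under this assumption, about 49.5% of draws contain at least one pair of consecutive numbers, so `which1` should split the draws roughly in half.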

It is also possible to compare the average number of medium winners (3 out of 6, the so-called rank-4 winners, *vainqueurs de rang 4*) between draws with at least 2 consecutive numbers and draws without:

```
> t.test(winner4[which1==TRUE],winner4[which1==FALSE])

Welch Two Sample t-test

data:  winner4[which1 == TRUE] and winner4[which1 == FALSE]
t = -3.2132, df = 4792.491, p-value = 0.001321
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6864.430 -1662.123
sample estimates:
mean of x mean of y
33887.82  38151.10 
```
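By default, `t.test` runs Welch's version of the test, which does not assume equal variances in the two groups: the statistic divides the difference in means by the unpooled standard error, and the degrees of freedom come from the Welch–Satterthwaite approximation (hence the non-integer `df` above). The sketch below recomputes these quantities by hand on simulated placeholder data (not the lottery data) and compares them with `t.test`.

```r
# simulated placeholders, not the lottery data
set.seed(123)
x = rnorm(200, mean = 10, sd = 2)
y = rnorm(300, mean = 11, sd = 3)
vx = var(x) / length(x)   # squared standard error of mean(x)
vy = var(y) / length(y)   # squared standard error of mean(y)
# statistic: difference in means over the unpooled standard error
tstat = (mean(x) - mean(y)) / sqrt(vx + vy)
# Welch-Satterthwaite approximation of the degrees of freedom
dof = (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
# two-sided p-value
pval = 2 * pt(-abs(tstat), dof)
# default var.equal = FALSE, i.e. Welch's test
tt = t.test(x, y)
c(t = tstat, df = dof, p = pval)
```

The three values match `tt$statistic`, `tt$parameter` and `tt$p.value`.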

A simple comparison of means shows a significant difference in the average number of winners depending on whether at least 2 of the 6 balls drawn were consecutive. Actually, the average number of winners is lower when there are consecutive numbers (33,888 versus 38,151). The average gain also differs significantly:

```
> t.test(gain4[which1==TRUE],gain4[which1==FALSE])

Welch Two Sample t-test

data:  gain4[which1 == TRUE] and gain4[which1 == FALSE]
t = 5.8926, df = 3675.361, p-value = 4.143e-09
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
11.06189 22.09337
sample estimates:
mean of x mean of y
173.9788  157.4012 
```

Here we see that when the winning combination contains consecutive numbers, the average gain is larger (173.98 versus 157.40), which is consistent with fewer winners sharing the prize. It is perhaps easier to see all this on graphs.

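```r
# average number of winners for ranks 5 down to 1, for draws with
# consecutive numbers (first value of each pair) and without (second value)
which0=which1
WIN=c(mean(winner5[which0==TRUE]),mean(winner5[which0==FALSE]),
      mean(winner4[which0==TRUE]),mean(winner4[which0==FALSE]),
      mean(winner3[which0==TRUE]),mean(winner3[which0==FALSE]),
      mean(winner2[which0==TRUE]),mean(winner2[which0==FALSE]),
      mean(winner1[which0==TRUE]),mean(winner1[which0==FALSE]))
# row 1: with consecutive numbers, row 2: without; columns: ranks 5 to 1
MWIN=matrix(WIN,2,5)
# red: draws with consecutive numbers, blue: draws without (log scale on y)
plot(1:5,MWIN[1,],type="b",col="red",log="y",
     ylim=c(1,1000000),xlab="two consecutive numbers",ylab="number of winners (log scale)")
lines(1:5,MWIN[2,],type="b",col="blue",pch=4)
```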
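Since the same table has to be rebuilt for `which2`, `which3`, etc., it may be convenient to wrap it in a small function. The helper below, `mean_by_flag`, is hypothetical (it is not part of the original code); it returns the same layout as `MWIN`, with row 1 for draws with the property and row 2 for draws without, one column per series. It is shown here on toy data.

```r
# hypothetical helper, not part of the original post
mean_by_flag = function(series, flag) {
  # series: list of numeric vectors (one per rank), each of length length(flag)
  # flag: logical vector, TRUE for draws with the property of interest
  sapply(series, function(s) c(with = mean(s[flag]), without = mean(s[!flag])))
}
# toy data standing in for the winner5, winner4, ... series
toy = list(rank5 = c(10, 20, 30, 40), rank4 = c(1, 2, 3, 4))
toyflag = c(TRUE, FALSE, TRUE, FALSE)
M = mean_by_flag(toy, toyflag)
M
```

With the objects defined earlier, `MWIN=mean_by_flag(list(winner5,winner4,winner3,winner2,winner1),which2)` would give the matrix for "at least two gaps equal to 1", and the `plot`/`lines` calls can be reused as they are. As with the original code, `mean` returns `NA` if a series contains missing values.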

If "having consecutive numbers" means at least two consecutive numbers, the graph shows the average number of winners for each rank: first rank (6 out of 6), second rank (5 out of 6), etc. (in the code above, rank 5 is plotted at x = 1 and rank 1 at x = 5).

Note that the y-axis is on a log scale; draws with consecutive balls are in red, and draws without consecutive balls are in blue. For average gains, the curves are in the opposite order.

But if we consider the case of three consecutive balls, we have, for the number of winners,

or for average gains

Here things start to change: there are more "big winners" when there are at least three consecutive numbers. With four consecutive numbers, it is clearly the opposite.

Here we see that there are many more winners with four consecutive numbers (actually, it might be a triplet and a pair). So I have to confess that I am not convinced by the conclusion: a lot of people actually pick consecutive numbers. If we look at the draws with the most first-rank winners, we can clearly see that many players like consecutive numbers (though perhaps less than birthdays, since most of these numbers are lower than 31):

```
loto[loto$nombre_de_gagnant_au_rang1>50,
     c("combinaison_gagnante_en_ordre_croissant",
       "nombre_de_gagnant_au_rang1","date_de_tirage")]
     combinaison_gagnante gagnant_au_rang1 date_de_tirage
3189      2-4-13-16-28-31              103       19930626
3475       1-5-9-10-12-25               64       19920212
4018       4-5-7-14-15-17               63       19880504
4396    26-27-28-35-36-37               64       19840919
4477     7-11-15-27-33-44               53       19830914
4546      2-9-12-14-19-24               60       19820519
4685      2-8-10-12-14-16               96       19790919
```

In September 1979, the winning combination 2-8-10-12-14-16 contained 5 consecutive even numbers (admittedly, they are not consecutive numbers in the strict sense), and there were 96 winners with 6 numbers out of 6! The other combinations are not strictly consecutive either, but they show a lot of regularity. So I believe that picking consecutive numbers might not be a great strategy if you want to win a lot of money: the more players pick the same combination, the more winners there are to share the prize.
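The "regularity" mentioned above can be measured in several ways. One option, sketched below, is the length of the longest run of numbers with a constant spacing: `step = 1` for consecutive numbers, `step = 2` for runs such as 8-10-12-14-16. The function `longest_run` is a hypothetical helper (not from the original post), applied to three combinations from the table above.

```r
# hypothetical helper: length of the longest run of numbers spaced by `step`
longest_run = function(draw, step = 1) {
  hits = diff(sort(draw)) == step
  r = rle(hits)
  # a run of m gaps equal to `step` covers m + 1 numbers; otherwise 1 number
  if (any(r$values)) max(r$lengths[r$values]) + 1 else 1
}
combos = c("2-8-10-12-14-16", "26-27-28-35-36-37", "4-5-7-14-15-17")
# split each combination into its six numbers
draws = lapply(strsplit(combos, "-"), as.numeric)
sapply(draws, longest_run, step = 1)
sapply(draws, longest_run, step = 2)
```

With `step = 1`, the 1984 draw (26-27-28-35-36-37) gives a run of 3, since it is made of two triplets; with `step = 2`, the September 1979 draw gives a run of 5.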
