(This article was first published on **Freakonometrics - Tag - R-english**, and kindly contributed to R-bloggers)

Recently, I have been reading odd things about strategies to win at the lottery.

I wrote something a long time ago, but maybe it would be better to write another post. First, it is easy to get data on the French lotteries, including draws, number of winners and gains,

    # French lottery history: draws, number of winners and gains per rank
    loto=read.table("http://freakonometrics.blog.free.fr/public/data/loto.csv",
                    sep=";",header=TRUE)
    balls=loto[,c("boule_1","boule_2","boule_3","boule_4","boule_5","boule_6")]
    # with 6 values, the quantiles at 0, 1/5, ..., 1 are the sorted values
    q=function(x){quantile(x,(0:5)/5)}
    sortballs=balls
    consec=balls[,-1]
    sortconsec=consec
    for(i in 1:nrow(balls)){
      # unlist() turns the one-row data frame into a plain numeric vector
      sortballs[i,]=q(unlist(balls[i,]))                # balls in increasing order
      consec[i,]=sortballs[i,2:6]-sortballs[i,1:5]      # gaps between successive balls
      sortconsec[i,]=sort(unlist(consec[i,]))           # gaps in increasing order
    }
    winner1=loto[,"nombre_de_gagnant_au_rang1"]
    gain1=as.numeric(as.character(loto[,"rapport_du_rang1"]))
    winner2=loto[,"nombre_de_gagnant_au_rang2"]
    gain2=as.numeric(as.character(loto[,"rapport_du_rang2"]))
    winner3=loto[,"nombre_de_gagnant_au_rang3"]
    gain3=as.numeric(as.character(loto[,"rapport_du_rang3"]))
    winner4=loto[,"nombre_de_gagnant_au_rang4"]
    gain4=as.numeric(as.character(loto[,"rapport_du_rang4"]))
    winner5=loto[,"nombre_de_gagnant_au_rang5"]
    gain5=as.numeric(as.character(loto[,"rapport_du_rang5"]))
    # whichk is TRUE when the k-th smallest gap equals 1
    which1=(sortconsec[,1]==1)
    which2=(sortconsec[,2]==1)
    which3=(sortconsec[,3]==1)
    which4=(sortconsec[,4]==1)
    which5=(sortconsec[,5]==1)

There are several ways to define "*winning at the lottery*" (2 out of 6, 3 out of 6, 4 out of 6, etc) and to define "*having consecutive numbers*" (it can be 2 out of 6, 3 out of 6, etc).
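To make the definition of "*having consecutive numbers*" concrete, here is a minimal sketch of the sorted-gap encoding used in the code above, applied to four winning combinations quoted in the table near the end of the post. It is a vectorised alternative to the loop, not the code used for the results. It also computes the chance that a uniformly random draw contains at least one pair of consecutive numbers. That part assumes a 6-out-of-49 game: the pool size is not stated in the post, so `n_balls` is an assumption.

```r
# Four winning combinations quoted in the table near the end of the post
draws <- rbind(c(26, 27, 28, 35, 36, 37),
               c( 2,  4, 13, 16, 28, 31),
               c( 4,  5,  7, 14, 15, 17),
               c( 2,  8, 10, 12, 14, 16))

# Gaps between successive sorted balls, themselves sorted, one row per draw
sorted_gaps <- t(apply(draws, 1, function(b) sort(diff(sort(b)))))

# Column k is TRUE when the k-th smallest gap is 1, i.e. at least k unit gaps
unit_gap_flags <- sorted_gaps == 1
n_unit_gaps <- rowSums(unit_gap_flags)

# Baseline under a uniformly random draw.
# ASSUMPTION: 6 balls out of 49 (the pool size is not given in the post).
n_balls <- 49
k <- 6
# k-subsets of 1..n with no two adjacent numbers: choose(n - k + 1, k)
p_consec <- 1 - choose(n_balls - k + 1, k) / choose(n_balls, k)

# Monte Carlo check of the closed form
set.seed(123)
sim <- replicate(20000, any(diff(sort(sample(n_balls, k))) == 1))
c(exact = p_consec, simulated = mean(sim))
```

Here `n_unit_gaps` is 4, 0, 2, 0: the draw 26-27-28-35-36-37 has four unit gaps (two triplets), while 2-8-10-12-14-16 has none even though it looks very regular. Under the 49-ball assumption, about 49.5% of all draws contain at least one pair of consecutive numbers, so both groups compared in the tests are large.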

One can compare the number of medium winners (3 out of 6, the so-called *vainqueur de rang 4*, i.e. rank-4 winner) between draws with and without 2 consecutive numbers:

    > t.test(winner4[which1==TRUE],winner4[which1==FALSE])

            Welch Two Sample t-test

    data:  winner4[which1 == TRUE] and winner4[which1 == FALSE]
    t = -3.2132, df = 4792.491, p-value = 0.001321
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -6864.430 -1662.123
    sample estimates:
    mean of x mean of y
     33887.82  38151.10

With a simple mean comparison test, we see a significant difference in the average number of winners between draws with at least 2 consecutive numbers among the 6 balls and draws without. The average number of winners is actually lower when there are consecutive numbers (about 33,888 against 38,151). If we look at the average gain, the difference is also significant:

    > t.test(gain4[which1==TRUE],gain4[which1==FALSE])

            Welch Two Sample t-test

    data:  gain4[which1 == TRUE] and gain4[which1 == FALSE]
    t = 5.8926, df = 3675.361, p-value = 4.143e-09
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     11.06189 22.09337
    sample estimates:
    mean of x mean of y
     173.9788  157.4012

Here we see that, on average, the gain is larger when the draw contains consecutive numbers (about 174 against 157). Perhaps it would be better to look at this on graphs.
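Before turning to the graphs, the quantities reported by `t.test` above (the statistic `t`, the fractional `df` and the p-value) can be reproduced by hand. The sketch below does this on simulated samples, which are purely hypothetical stand-ins for `winner4[which1==TRUE]` and `winner4[which1==FALSE]`; the sample sizes, means and standard deviations are made up for illustration.

```r
# HYPOTHETICAL samples (simulated, not taken from loto.csv)
set.seed(2011)
x <- rnorm(300, mean = 34000, sd = 40000)
y <- rnorm(500, mean = 38000, sd = 50000)

# Squared standard error of each sample mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch statistic: difference in means over the unpooled standard error
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation of the degrees of freedom
df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value from the Student t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# Compare with the built-in test (var.equal = FALSE is the default)
ref <- t.test(x, y)
c(by_hand = t_stat, t.test = unname(ref$statistic))
```

The non-integer degrees of freedom in the outputs above (4792.491 and 3675.361) come from this approximation, which does not assume equal variances in the two groups.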

    which0=which1
    WIN=c(mean(winner5[which0==TRUE]),mean(winner5[which0==FALSE]),
          mean(winner4[which0==TRUE]),mean(winner4[which0==FALSE]),
          mean(winner3[which0==TRUE]),mean(winner3[which0==FALSE]),
          mean(winner2[which0==TRUE]),mean(winner2[which0==FALSE]),
          mean(winner1[which0==TRUE]),mean(winner1[which0==FALSE]))
    MWIN=matrix(WIN,2,5)
    plot(1:5,MWIN[1,],type="b",col="red",log="y",
         ylim=c(1,1000000),xlab="two consecutive numbers",
         ylab="number of winners (log scale)")
    lines(1:5,MWIN[2,],type="b",col="blue",pch=4)
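The same table of means is needed again for the other definitions of consecutive numbers and for the gains. A small helper avoids repeating the long `c(mean(...), ...)` call each time. This is only a sketch: `mean_by_flag` is a hypothetical name, and the toy vectors are made up so that the result can be checked by eye.

```r
# Hypothetical helper: mean of each series, split by a logical flag.
# Returns a 2 x length(series) matrix: row 1 = flag TRUE, row 2 = flag FALSE,
# which is the same layout as MWIN.
mean_by_flag <- function(series, flag) {
  sapply(series, function(v) c(with = mean(v[flag]), without = mean(v[!flag])))
}

# Toy data (made up): two "ranks" observed over four draws
toy_series <- list(rank5 = c(100, 120, 80, 60),
                   rank4 = c(10, 14, 6, 2))
toy_flag <- c(TRUE, TRUE, FALSE, FALSE)

M <- mean_by_flag(toy_series, toy_flag)
M
```

With the objects defined earlier, `mean_by_flag(list(winner5,winner4,winner3,winner2,winner1),which2)` would give the analogue of `MWIN` for `which2`, and its two rows can be passed to `plot` and `lines` exactly as above. Replacing the `winner` vectors by the `gain` vectors gives the average gains.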

If we focus on the case where "*having consecutive numbers*" means two consecutive numbers, we have below the number of winners, with first rank (6 out of 6), then second rank (5 out of 6), etc,

Note that the y-axis is on a log scale, and that draws with consecutive balls are in red, and no consecutive balls are in blue. If we focus on average gains, curves are in opposite order,

But if we consider the case of three consecutive balls, we have, for the number of winners,

or for average gains

Here it starts to get slightly different: there are more "big winners" when there are at least three consecutive numbers. And with four consecutive numbers, it is clearly the opposite

Here we see that there are many more winners with four consecutive numbers (actually, it might be a triplet and a pair). So I have to confess that I am not convinced by the conclusion: a lot of people do pick consecutive numbers. If we look at the draws with the most winners, we can clearly see that a lot of players like consecutive numbers (perhaps not as much as playing birthdays, since most numbers are lower than 31),

    loto[loto$"nombre_de_gagnant_au_rang1">50,
         c("combinaison_gagnante_en_ordre_croissant",
           "nombre_de_gagnant_au_rang1","date_de_tirage")]

         combinaison_gagnante gagnant_au_rang1 date_de_tirage
    3189      2-4-13-16-28-31              103       19930626
    3475       1-5-9-10-12-25               64       19920212
    4018      4-5-7-14-15-17               63       19880504
    4396    26-27-28-35-36-37               64       19840919
    4477     7-11-15-27-33-44               53       19830914
    4546      2-9-12-14-19-24               60       19820519
    4685      2-8-10-12-14-16               96       19790919

In September 1979, there were five consecutive even numbers (ok, they cannot strictly be considered consecutive numbers) and 96 winners with 6 numbers out of 6! And if we look at the others, even if they are not strictly consecutive, there is a lot of regularity. So I believe that picking consecutive numbers might not be a great strategy if you want to win a lot of money!
