(This article was first published on **factbased**, and kindly contributed to R-bloggers)

A historian, a data scientist, a programmer, a mathematician, and a philosopher discuss how likely it is that a lottery draw (`6` out of `49`) contains two consecutive numbers.

## The historian

The historian argues that from 1955 up to 2011, there were 5026 lottery draws in Germany, every Saturday, and from 2000 on, two draws every Wednesday. In `2557` cases, there were consecutive numbers among the drawn numbers. Hence, the likelihood is around `2557/5026=51%`.

By the way, he adds, did you know how Casanova got rich when he introduced the lottery in Paris?

## The junior data scientist

The data scientist argues that a lottery draw is a repeatable experiment that can easily be simulated. He fires up his R console and first creates a function that checks whether a draw contains consecutive numbers:

```
> has.consecutives <- function(draw) any(diff(sort(draw))==1)
```

The function first sorts the numbers drawn, and then calculates the differences between neighboring numbers. If there are consecutive numbers, then there must be a `1` among these differences.
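To make the intermediate steps visible, here is a small sketch that applies the function to two made-up draws (the numbers are illustrative, not taken from any real lottery):

```r
has.consecutives <- function(draw) any(diff(sort(draw)) == 1)

# an illustrative draw that contains the neighbours 17 and 18
draw <- c(17, 3, 42, 18, 29, 8)
sorted <- sort(draw)    # 3 8 17 18 29 42
gaps <- diff(sorted)    # 5 9 1 11 13
print(gaps)
print(has.consecutives(draw))                        # TRUE

# an illustrative draw without neighbours
print(has.consecutives(c(2, 11, 23, 30, 41, 49)))    # FALSE
```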

Now one lottery draw can be simulated with `sample.int(49,6)`. To get an estimate for the probability of the occurrence of consecutive numbers, he repeats the simulated draw many times:

```
> N <- 100000
> counts <- replicate(N, has.consecutives(sample.int(49,6)))
> sum(counts)/N
[1] 0.49498
```

The simulation suggests the probability is slightly smaller than `0.5`. To rule out that this result was produced by chance, he calculates a confidence interval for the wanted proportion. There is (and that is charming about the statistical computing environment R) a package for that, which implements several methods for this task:

```
> library(binom)
> with(
    binom.confint(
      sum(counts), N,
      method="wilson"),
    c(lower, upper)
)
[1] 0.4918814 0.4980790
```

Now the theory states that if the simulation is repeated very often, in about `95%` of the repetitions the calculated interval contains the true parameter. Hence the data scientist is quite certain about the first two digits of the probability.
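For readers who want to see what the Wilson method computes, here is a minimal base-R sketch of the Wilson score interval. The helper name `wilson.interval` is made up for this illustration and is not part of the `binom` package; the count `49498` is the number of hits implied by the simulation run above.

```r
# Wilson score interval for a binomial proportion, written out in base R.
# wilson.interval is a hypothetical helper name used only for this sketch.
wilson.interval <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)   # normal quantile, about 1.96 for 95%
  p <- x / n                             # observed proportion
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(lower = centre - half, upper = centre + half)
}

# 49498 hits in 100000 simulated draws, as in the run shown above
ci <- wilson.interval(49498, 100000)
print(ci)
```

For these counts the result should agree with the interval printed by `binom.confint` to the digits shown.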

## The programmer

The programmer takes a recursive approach. If we find a function `f(n,m)` that returns the number of ways to choose `m` out of `n` numbers without consecutive numbers, then we can calculate the wanted probability by

```
> 1 - f(49,6)/choose(49,6)
```

Some corner cases for `f` are obvious: if `m` is `1`, there are `n` ways. For `n=3` and `m=2`, there is exactly one way. By the pigeonhole principle, when `m>ceiling(n/2)`, there is no way.

In the general `f(n,m)` case, we can distinguish between the ways where neither the first nor the last number is chosen (`f(n-2,m)`) and the ways where the first and/or the last number is chosen. If the first number is chosen, its neighbor is excluded, which leaves `f(n-2, m-1)` ways; the same holds for the last number. If both are chosen, there are `f(n-4, m-2)` ways, and these are counted twice. Hence, by the inclusion/exclusion principle, the number of ways in the latter case is `2*f(n-2, m-1) - f(n-4, m-2)`. In case `m=2`, the subtracted term must be `1`, which requires another hard-coded return value, `f(n,0)=1`. Altogether, the function is defined as

```
> f <- function(n,m) {
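  # base cases: empty selection, single number, and 2 out of 3
  if(m==0) return(1)
  if(m==1) return(n)
  if(n==3 && m==2) return(1)
  # pigeonhole principle
  if(m>ceiling(n/2)) return(0)
  # inclusion/exclusion principle
  return(
    f(n-2,m)         # neither the first nor the last number is chosen
    + 2*f(n-2, m-1)  # the first or the last number is chosen
    - f(n-4, m-2)    # both are chosen, counted twice in the line above
  )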
}
> 1 - f(49,6)/choose(49,6)
[1] 0.4951984
```
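The corner cases and the recurrence can be cross-checked by brute force on small inputs. The sketch below enumerates all combinations with `combn` and counts those without neighbors; `count.by.enumeration` is a hypothetical helper name introduced for this check, and the approach is only feasible for small `n`.

```r
# count.by.enumeration is a hypothetical helper, meant for small n and m only
count.by.enumeration <- function(n, m) {
  if (m == 0) return(1)   # the empty selection
  if (m > n) return(0)    # cannot choose more numbers than there are
  # combn() yields each combination in increasing order, so diff() gives the gaps
  sum(combn(n, m, FUN = function(s) all(diff(s) > 1)))
}

print(count.by.enumeration(3, 2))   # 1: only {1, 3}
print(count.by.enumeration(7, 5))   # 0: 5 > ceiling(7/2)

# check the inclusion/exclusion recurrence on a small grid
recurrence.holds <- TRUE
for (n in 4:10) for (m in 2:4) {
  lhs <- count.by.enumeration(n, m)
  rhs <- count.by.enumeration(n - 2, m) +
    2 * count.by.enumeration(n - 2, m - 1) -
    count.by.enumeration(n - 4, m - 2)
  recurrence.holds <- recurrence.holds && (lhs == rhs)
}
print(recurrence.holds)
```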

## The mathematician

After some thinking, the mathematician comes up with an equivalent problem formulation. Instead of counting the draws without consecutive numbers, what if we count the number of ways to draw the blanks immediately after the drawn numbers?

Given a draw `N1, ..., N6` without consecutive numbers, the numbers `N1+1, ..., N6+1` after each drawn number are not drawn. We have to allow the number `50` to make this work, though, because `49` itself may be drawn.

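Each drawn number and the blank after it can be treated as a single item, which reduces the `50` numbers to `(50-6)` items. Hence, the number of draws `6` out of `49` without consecutives corresponds to the number of ways to draw `6` blanks from `(50-6)` numbers. So the solution boils down to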

```
> 1 - choose(50-6, 6)/choose(49,6)
[1] 0.4951984
```
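The same count can be written as an explicit bijection. The following is a sketch of the standard shift argument, a close relative of the blanks argument above; the symbols $M_i$ are introduced here for illustration only.

```latex
\[
N_1 < N_2 < \dots < N_6,\quad N_{i+1} - N_i \ge 2
\;\longleftrightarrow\;
M_i = N_i - (i-1),\quad 1 \le M_1 < M_2 < \dots < M_6 \le 44
\]
\[
P(\text{consecutive numbers})
= 1 - \frac{\binom{44}{6}}{\binom{49}{6}}
= 1 - \frac{7\,059\,052}{13\,983\,816}
\approx 0.4951984
\]
```

Subtracting $i-1$ from the $i$-th smallest number closes every gap by one, so draws without neighbors correspond one-to-one to arbitrary draws of 6 out of 44.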

## The existentialist philosopher

Finally, the philosopher points out that it is not the result, but the solution approaches that are important. The objective truth about consecutive numbers in lottery draws is a fact. Now that it is known, it might be tempting to disregard the ways used to find it. But there is value in subjective reflection, in appreciating the individual thoughts and approaches, as this reflection deepens the understanding of reality.
