**Christophe Ladroue » R**, and kindly contributed to R-bloggers)

Over in my previous post, I’m giving away 3 copies of my video course on **ggplot2** and **shiny**. To win a copy, you just need to leave a comment and I will select 3 winners among the *n* participants at random after a deadline.

But how do I pick 3 winners such that:

- all players are equally likely to win.
- no-one can contest the fairness of the selection.

The first thing that comes to mind is to run `sample(n,3, replace = FALSE)`

and report the winners. But how do you know I actually ran this code? I could have decided on the winners well in advance and just pretended to run the code.

A way to approach this issue could be to set the random seed to some value so that anyone suspecting foul play can run the code themselves and get the same answer as me:

I see at least two problems with it: 1) I could still have selected a seed that gives me the sample I eventually want, and 2) even using a function (*e.g.* of *n* the number of participants) as the seed doesn’t guarantee a uniform distribution for each player.

I came up with a plan which I think addresses both the uniform distribution over the players, and the incontestability of the selection.

First, I simplify the problem of selecting 3 winners among *n* participants to selecting 1 integer from a uniform distribution. This is easy: instead of choosing 3 items among *n*, I’m selecting 1 of the `choose(n,3) `

possible combinations. Once I’ve sampled 1 number *i*, I simply use `combn(n,3)`

to generate all the combinations and pick the *i*th item:

`combn(n,3, simplify=FALSE)[[i]]`

.

Second, I have a way to pick a number from a uniform distribution that’s completely transparent. To do this, I pick up a number uniformly at random in a much bigger set (1:N, with `N>>choose(n,3)`

) and project this number back to the interval I’m interested in (`1:choose(n,3)`

). That is, once I have a number *j* between 1 and N, I use

to find a random value uniformly distributed from 1 to `choose(n,3)`

Ideally you’d want N to be a multiple of choose(n,3) for every outcome to be exactly equally likely, but if N is much larger than choose(n,3), the slight difference in probability for each outcome is negligible (of the order of `1/N`

).

Now, how do I pick up a number in a bigger set in a transparent manner? We saw that using the seed is fraught with difficulty, so I need something else. I’m going to use something which neither I nor the players have any control over: the UK national lottery, which is a combination of 7 integers from the set {1,…,49}. More precisely, I’m doing this:

- declare in advance which future lottery draw I’m using.
- use the
*lexicographic index*of the combination drawn as my number*j*;*j*comes from a uniform distribution between 1 and`N=choose(49,7)`

, which is a relatively big number.

And this is it: this way, I pick up 3 winners among *n* and there is no way for me (or the players) to rig the selection. Here is the code I’ll run:

1 2 3 4 5 6 |
lotteryResult <- c( , , , , , , ) # to be filled in by the actual lottery draw nPlayers <- 200 # to be updated with the number of participants nWinners <- 3 index <- ceiling(choose(nPlayers, nWinners) * lexicographicIndex(lotteryResult, 49) / choose(49,7)) winners <- combn(nPlayers,nWinners,simplify = FALSE)[[index]] cat("\n The winners are:", winners) |

The deadline for the competition is Wednesday 09th July at midnight UK time. The next lottery (UK lotto) draw after that is on Saturday 12th July, and I’ll use that draw to decide on the winners, using the code I’ve presented here.

What do you think? Can you poke holes in my strategy? Can you come up with something simpler?

**Note about the lexicographic index**

It is not terribly difficult to find the index of a combination without generating them all. All you need to do is to count the number of combinations that appeared before. For example, if the combination starts with 3, you know it comes after all the combinations that start with 1 and 2. Here is the code I wrote to go from a combination to its lexicographic index. There’s also a test function after it.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
# gives the index of the combination if sorted in lexicographic order, starting at 1 # lexicographicIndex(c(2,3,4), 5) = 7 # because [2,3,4] is the 7th item in the lexicographically ordered combinations of length 3 using 5 letters: # 1 2 3 # 1 2 4 # 1 2 5 # 1 3 4 # 1 3 5 # 1 4 5 # 2 3 4 # 2 3 5 # 2 4 5 # 3 4 5 # C. Ladroue # combination is a sequence of unique integers between 1 and alphabetSize lexicographicIndex <- function(combination, alphabetSize){ combination <- sort(combination) combinationLength <- length(combination) index <- 1 for(p in 1:combinationLength){ starting <- ifelse(p == 1, 1 , combination[p-1] + 1) finishing <- combination[p] - 1 if(starting <= finishing) index <- index + sum(sapply(starting:finishing, function(j) choose(alphabetSize - j, combinationLength - p))) } index } lexicographicIndexTest <- function(){ alphabetSize <- 10 combinationLength <- 3 x <- combn(alphabetSize, combinationLength, simplify = FALSE) cat("\n test all combinations with alphabet size = ",alphabetSize,"and combination length = ",combinationLength,": ", all(sapply(1:length(x), function(index) lexicographicIndex(x[[index]], alphabetSize) == index )) ) cat("\n") } |

**leave a comment**for the author, please follow the link and comment on their blog:

**Christophe Ladroue » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...