Secret Santa – again

Posted on November 18, 2012 by Wingfeet in Uncategorized | 0 Comments

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Based on comments by cellocgw I decided to look at last week’s Secret Santa again. This time, the moment a person, whoever that is, draws his/her own name, the drawing starts again at the first person.

Introduction

A group of n persons draws sequentially names for Secret Santa. Each person may not draw his/her own name. If a person draws his/her own name then all names are returned and the drawing starts again. Questions are such as: How often do you draw names?

Simulation

To understand the problem a simulation is build. The simulation draws the names, with restarts if needed. These are placed into outcome. On top of that it counts where a person self-draws.
selsan1 <- function(persons) {
startoutcome <- rep(0,persons)
countstop <- rep(0,persons)
outcome <- startoutcome
done <- FALSE
possible <- 1:persons
who <- 1
while(!done) {
remaining <- possible[!(possible %in% outcome)]
if (length(remaining)==1) {
if (who==remaining) {
countstop[who] <- countstop[who]+1
outcome <- startoutcome
who <- 1
} else {
outcome[who] <- remaining
who <- who+1
done <- TRUE
}
} else {
select <- sample(possible[!(possible %in% outcome)],1)
if (who==select) {
countstop[who] <- countstop[who]+1
outcome <- startoutcome
who <- 1
} else {
outcome[who] <- select
who <- who+1
}
}
}
return(list(outcome=outcome,countstop=countstop))
}
In an unlucky draw for five persons there were four restarts, at person 1, 2, 3 and 5.

selsan1(5)

$outcome

[1] 2 1 5 3 4

$countstop

[1] 1 1 1 0 1

The aim of the simulation is to do this a lot of times and draw conclusions from there.

nrep=1e5

simulations <- lapply(1:nrep,function(x) selsan1(5))

The first question concerns the outcomes. There are 44 allowed results. They are obtained about equally often.

outcomes <- sapply(simulations,function(x) paste(x$outcome,collapse=' '))

as.data.frame(table(outcomes))

outcomes Freq

1 2 1 4 5 3 2301

2 2 1 5 3 4 2263

3 2 3 1 5 4 2192

4 2 3 4 5 1 2323

5 2 3 5 1 4 2268

6 2 4 1 5 3 2230

7 2 4 5 1 3 2280

8 2 4 5 3 1 2242

9 2 5 1 3 4 2203

10 2 5 4 1 3 2247

11 2 5 4 3 1 2327

12 3 1 2 5 4 2321

13 3 1 4 5 2 2243

14 3 1 5 2 4 2183

15 3 4 1 5 2 2346

16 3 4 2 5 1 2264

17 3 4 5 1 2 2216

18 3 4 5 2 1 2269

19 3 5 1 2 4 2301

20 3 5 2 1 4 2359

21 3 5 4 1 2 2301

22 3 5 4 2 1 2206

23 4 1 2 5 3 2291

24 4 1 5 2 3 2249

25 4 1 5 3 2 2263

26 4 3 1 5 2 2321

27 4 3 2 5 1 2265

28 4 3 5 1 2 2318

29 4 3 5 2 1 2329

30 4 5 1 2 3 2220

31 4 5 1 3 2 2309

32 4 5 2 1 3 2228

33 4 5 2 3 1 2291

34 5 1 2 3 4 2284

35 5 1 4 2 3 2314

36 5 1 4 3 2 2304

37 5 3 1 2 4 2339

38 5 3 2 1 4 2213

39 5 3 4 1 2 2285

40 5 3 4 2 1 2194

41 5 4 1 2 3 2255

42 5 4 1 3 2 2336

43 5 4 2 1 3 2218

44 5 4 2 3 1 2289

The number of times a redraw is taken can also be obtained.

countstop <- t(sapply(simulations,function(x) x$countstop))

table(rowSums(countstop))/nrep

0 1 2 3 4 5 6 7 8 9

0.36938 0.23036 0.14788 0.09337 0.05929 0.03673 0.02220 0.01440 0.00988 0.00603

10 11 12 13 14 15 16 17 18 19

0.00396 0.00240 0.00168 0.00089 0.00053 0.00041 0.00021 0.00014 0.00012 0.00005

20 21 22 28

0.00004 0.00003 0.00001 0.00001

We can also extract where the redraws occur. In general there is 0.5 redraw because of the first person, 0.4 because of the second, etc. The numbers do not add to 1, they are not chances.

colSums(countstop)/nrep

[1] 0.54276 0.40735 0.31383 0.24834 0.20415

Calculations

The question is, can we achieve the same with a calculation. For this we obtain the chances of various results. For this we build three matrices. All permutations in pp. Continuation of a sequence is in permitted. Finally, redraw contains the person where a person causes a new draw. Trick here is that if person 2 causes a redraw, then no subsequent persons cause a redraw, hence only 2 is marked in redraw.

pp <- randtoolbox::permut(nn)

redraw <- matrix(FALSE,nrow(pp),nn)

permitted <- redraw

redraw[,1] <- pp[,1] ==1

permitted[,1] <- pp[,1]!=1

for(i in 2:nn) {

permitted[,i] <- pp[,i]!=i & permitted[,i-1]

redraw[,i] <- pp[,i] == i & permitted[,i-1]

}

The sequences start like this.

head(pp)

i i i

[1,] 5 4 3 1 2

[2,] 5 4 3 2 1

[3,] 5 4 1 3 2

[4,] 5 4 2 3 1

[5,] 5 4 1 2 3

[6,] 5 4 2 1 3

head(permitted)

[,1] [,2] [,3] [,4] [,5]

[1,] TRUE TRUE FALSE FALSE FALSE

[2,] TRUE TRUE FALSE FALSE FALSE

[3,] TRUE TRUE TRUE TRUE TRUE

[4,] TRUE TRUE TRUE TRUE TRUE

[5,] TRUE TRUE TRUE TRUE TRUE

[6,] TRUE TRUE TRUE TRUE TRUE

head(redraw)

[,1] [,2] [,3] [,4] [,5]

[1,] FALSE FALSE TRUE FALSE FALSE

[2,] FALSE FALSE TRUE FALSE FALSE

[3,] FALSE FALSE FALSE FALSE FALSE

[4,] FALSE FALSE FALSE FALSE FALSE

[5,] FALSE FALSE FALSE FALSE FALSE

[6,] FALSE FALSE FALSE FALSE FALSE

Chance of succes

The chance of a success drawing is the mean of the last column in permitted. Below a comparison with the simulation result. First the observed proportions.

byrow <- as.data.frame(table(rowSums(countstop))/nrep)

head(byrow)

Var1 Freq

1 0 0.36938

2 1 0.23036

3 2 0.14788

4 3 0.09337

5 4 0.05929

6 5 0.03673

Now the matching calculation. The numbers can be calculated easily.

(p.succes <- mean(permitted[,nn]))

[1] 0.3666667

byrow$n <- as.numeric(levels(byrow$Var1)[byrow$Var1])

byrow$p <- sapply(byrow$n,function(x) p.succes*(1-p.succes)^x)

byrow[,c(1,3,4,2)]

Var1 n p Freq

1 0 0 3.666667e-01 0.36938

2 1 1 2.322222e-01 0.23036

3 2 2 1.470741e-01 0.14788

4 3 3 9.314691e-02 0.09337

5 4 4 5.899305e-02 0.05929

6 5 5 3.736226e-02 0.03673

7 6 6 2.366277e-02 0.02220

8 7 7 1.498642e-02 0.01440

9 8 8 9.491398e-03 0.00988

10 9 9 6.011219e-03 0.00603

11 10 10 3.807105e-03 0.00396

12 11 11 2.411167e-03 0.00240

13 12 12 1.527072e-03 0.00168

14 13 13 9.671458e-04 0.00089

15 14 14 6.125256e-04 0.00053

16 15 15 3.879329e-04 0.00041

17 16 16 2.456908e-04 0.00021

18 17 17 1.556042e-04 0.00014

19 18 18 9.854933e-05 0.00012

20 19 19 6.241457e-05 0.00005

21 20 20 3.952923e-05 0.00004

22 21 21 2.503518e-05 0.00003

23 22 22 1.585561e-05 0.00001

24 28 28 1.023239e-06 0.00001

Where fall the redraws

This is actually a more difficult calculation (or I forgot too much probability). Luckily a bit of brute force comes in handy. To reiterate, in general simulated data shows 0.54 redraws because of the first person etc.

colSums(countstop)/nrep

[1] 0.54276 0.40735 0.31383 0.24834 0.20415

So, what happens in a drawing? The outcomes follow from the matrix redraw. There is 20% chance the first person draws 1, 25% chance the second person draws a 2 etc. Finally, as established, the chance is 36% to have a good draw.

(p.onedraw <- c(colSums(redraw)/nrow(redraw),p.succes) )

[1] 0.20000000 0.15000000 0.11666667 0.09166667 0.07500000 0.36666667

The function below takes these numbers and a locator vector to return a data frame with chances, location of fail and success status in column finish

one.draw <- function(status.now,p.now,p.onedraw) {

la <- lapply(1:(nn+1),function(x) {

status <- status.now

if (x>nn) finish=TRUE

else {

status[x] <- status[x] +1

finish=FALSE}

list(status=status,p=p.onedraw[x],finish=finish)

})

res <- as.data.frame(do.call(rbind, lapply(la,function(x) x$status)))

res$p <- sapply(la,function(x) x$p*p.now)

res$finish <- sapply(la,function(x) x$finish)

res

}

status.now <- rep(0,nn)

names(status.now) <- paste('person',1:5,sep='')

od <- one.draw(status.now,1,p.onedraw)

person1 person2 person3 person4 person5 p finish

1 1 0 0 0 0 0.20000000 FALSE

2 0 1 0 0 0 0.15000000 FALSE

3 0 0 1 0 0 0.11666667 FALSE

4 0 0 0 1 0 0.09166667 FALSE

5 0 0 0 0 1 0.07500000 FALSE

6 0 0 0 0 0 0.36666667 TRUE

Where the sequence is not finished, the same chances apply again. For this a second function is build. Same outcomes are combined to restrict the number of outcomes.

one.draw.wrap <- function(x,p.pnedraw) {

if (x$finish) return(x)

one.draw(x[grep(‘person’,names(x))],x$p,p.onedraw)

}

new.draw <- function(od,p.onedraw) {

todo <- od[!od$finish,]

done <- od[od$finish,]

la <- lapply(1:nrow(todo),function(x) one.draw.wrap(todo[x,],p.onedraw))

snd <- do.call(rbind,la)

snd <- snd[do.call(order,snd),]

i <- 1

while(i

if(all(snd[i,!(colnames(snd)==’p’)] == snd[i+1,!(colnames(snd)==’p’)])) {

snd$p[i] <- snd$p[i] + snd$p[i+1]

snd <- snd[-(i+1),]

}

else i=i+1

}

snd <- rbind(snd, done)

snd[order(-snd$p),]

}

head(new.draw(od))

   person1 person2 person3 person4 person5          p finish

61       0       0       0       0       0 0.36666667   TRUE

6        1       0       0       0       0 0.07333333   TRUE

2        1       1       0       0       0 0.06000000  FALSE

25       0       1       0       0       0 0.05500000   TRUE

3        1       0       1       0       0 0.04666667  FALSE

35       0       0       1       0       0 0.04277778   TRUE

There are still quite some unfinished drawings, so we can redo that a few times. This is where a fast processor is practical.

myfin <- list(od=od)

for (i in 1:15) myfin[[i+1]] <- new.draw(myfin[[i]])
Even after 15 steps there are some unfinished sequences. Nevertheless I stop here.

xtabs(p ~finish,data=myfin[[15]])

finish

FALSE TRUE

0.001057999 0.998942001

In the finished series there are in on average 0.54 redraws at person 1. That is pretty close to 0.54 from the simulation.

with(myfin[[15]],sum(person1[finish]*p[finish]))

[1] 0.5398659

However, the 0.1% unfinished sequences would contribute a lot. Even if they were finished at step 15.

with(myfin[[15]],sum(person1*p))

0.5448775

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Secret Santa – again

Introduction

Simulation

Calculations

Chance of succes

Where fall the redraws

Related

Introduction

Simulation

Calculations

Chance of succes

Where fall the redraws

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)