Secret Santa – again
[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Based on comments by cellocgw I decided to look at last week’s Secret Santa again. This time, the moment a person, whoever that is, draws his/her own name, the drawing starts again at the first person.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
A group of n persons draws sequentially names for Secret Santa. Each person may not draw his/her own name. If a person draws his/her own name then all names are returned and the drawing starts again. Questions are such as: How often do you draw names?Simulation
To understand the problem a simulation is build. The simulation draws the names, with restarts if needed. These are placed into outcome. On top of that it counts where a person self-draws.selsan1 <- function(persons) {
startoutcome <- rep(0,persons)
countstop <- rep(0,persons)
outcome <- startoutcome
done <- FALSE
possible <- 1:persons
who <- 1
while(!done) {
remaining <- possible[!(possible %in% outcome)]
if (length(remaining)==1) {
if (who==remaining) {
countstop[who] <- countstop[who]+1
outcome <- startoutcome
who <- 1
} else {
outcome[who] <- remaining
who <- who+1
done <- TRUE
}
} else {
select <- sample(possible[!(possible %in% outcome)],1)
if (who==select) {
countstop[who] <- countstop[who]+1
outcome <- startoutcome
who <- 1
} else {
outcome[who] <- select
who <- who+1
}
}
}
return(list(outcome=outcome,countstop=countstop))
}
In an unlucky draw for five persons there were four restarts, at person 1, 2, 3 and 5.
selsan1(5)
$outcome
[1] 2 1 5 3 4
$countstop
[1] 1 1 1 0 1
The aim of the simulation is to do this a lot of times and draw conclusions from there.
nrep=1e5
simulations <- lapply(1:nrep,function(x) selsan1(5))
The first question concerns the outcomes. There are 44 allowed results. They are obtained about equally often.
outcomes <- sapply(simulations,function(x) paste(x$outcome,collapse=' '))
as.data.frame(table(outcomes))
outcomes Freq
1 2 1 4 5 3 2301
2 2 1 5 3 4 2263
3 2 3 1 5 4 2192
4 2 3 4 5 1 2323
5 2 3 5 1 4 2268
6 2 4 1 5 3 2230
7 2 4 5 1 3 2280
8 2 4 5 3 1 2242
9 2 5 1 3 4 2203
10 2 5 4 1 3 2247
11 2 5 4 3 1 2327
12 3 1 2 5 4 2321
13 3 1 4 5 2 2243
14 3 1 5 2 4 2183
15 3 4 1 5 2 2346
16 3 4 2 5 1 2264
17 3 4 5 1 2 2216
18 3 4 5 2 1 2269
19 3 5 1 2 4 2301
20 3 5 2 1 4 2359
21 3 5 4 1 2 2301
22 3 5 4 2 1 2206
23 4 1 2 5 3 2291
24 4 1 5 2 3 2249
25 4 1 5 3 2 2263
26 4 3 1 5 2 2321
27 4 3 2 5 1 2265
28 4 3 5 1 2 2318
29 4 3 5 2 1 2329
30 4 5 1 2 3 2220
31 4 5 1 3 2 2309
32 4 5 2 1 3 2228
33 4 5 2 3 1 2291
34 5 1 2 3 4 2284
35 5 1 4 2 3 2314
36 5 1 4 3 2 2304
37 5 3 1 2 4 2339
38 5 3 2 1 4 2213
39 5 3 4 1 2 2285
40 5 3 4 2 1 2194
41 5 4 1 2 3 2255
42 5 4 1 3 2 2336
43 5 4 2 1 3 2218
44 5 4 2 3 1 2289
The number of times a redraw is taken can also be obtained.
countstop <- t(sapply(simulations,function(x) x$countstop))
table(rowSums(countstop))/nrep
0 1 2 3 4 5 6 7 8 9
0.36938 0.23036 0.14788 0.09337 0.05929 0.03673 0.02220 0.01440 0.00988 0.00603
10 11 12 13 14 15 16 17 18 19
0.00396 0.00240 0.00168 0.00089 0.00053 0.00041 0.00021 0.00014 0.00012 0.00005
20 21 22 28
0.00004 0.00003 0.00001 0.00001
We can also extract where the redraws occur. In general there is 0.5 redraw because of the first person, 0.4 because of the second, etc. The numbers do not add to 1, they are not chances.
colSums(countstop)/nrep
[1] 0.54276 0.40735 0.31383 0.24834 0.20415
Calculations
The question is, can we achieve the same with a calculation. For this we obtain the chances of various results. For this we build three matrices. All permutations in pp. Continuation of a sequence is in permitted. Finally, redraw contains the person where a person causes a new draw. Trick here is that if person 2 causes a redraw, then no subsequent persons cause a redraw, hence only 2 is marked in redraw.
pp <- randtoolbox::permut(nn)
redraw <- matrix(FALSE,nrow(pp),nn)
permitted <- redraw
redraw[,1] <- pp[,1] ==1
permitted[,1] <- pp[,1]!=1
for(i in 2:nn) {
permitted[,i] <- pp[,i]!=i & permitted[,i-1]
redraw[,i] <- pp[,i] == i & permitted[,i-1]
}
The sequences start like this.
head(pp)
i i i
[1,] 5 4 3 1 2
[2,] 5 4 3 2 1
[3,] 5 4 1 3 2
[4,] 5 4 2 3 1
[5,] 5 4 1 2 3
[6,] 5 4 2 1 3
head(permitted)
[,1] [,2] [,3] [,4] [,5]
[1,] TRUE TRUE FALSE FALSE FALSE
[2,] TRUE TRUE FALSE FALSE FALSE
[3,] TRUE TRUE TRUE TRUE TRUE
[4,] TRUE TRUE TRUE TRUE TRUE
[5,] TRUE TRUE TRUE TRUE TRUE
[6,] TRUE TRUE TRUE TRUE TRUE
head(redraw)
[,1] [,2] [,3] [,4] [,5]
[1,] FALSE FALSE TRUE FALSE FALSE
[2,] FALSE FALSE TRUE FALSE FALSE
[3,] FALSE FALSE FALSE FALSE FALSE
[4,] FALSE FALSE FALSE FALSE FALSE
[5,] FALSE FALSE FALSE FALSE FALSE
[6,] FALSE FALSE FALSE FALSE FALSE
Chance of succes
The chance of a success drawing is the mean of the last column in permitted. Below a comparison with the simulation result. First the observed proportions.
byrow <- as.data.frame(table(rowSums(countstop))/nrep)
head(byrow)
Var1 Freq
1 0 0.36938
2 1 0.23036
3 2 0.14788
4 3 0.09337
5 4 0.05929
6 5 0.03673
Now the matching calculation. The numbers can be calculated easily.
(p.succes <- mean(permitted[,nn]))
[1] 0.3666667
byrow$n <- as.numeric(levels(byrow$Var1)[byrow$Var1])
byrow$p <- sapply(byrow$n,function(x) p.succes*(1-p.succes)^x)
byrow[,c(1,3,4,2)]
Var1 n p Freq
1 0 0 3.666667e-01 0.36938
2 1 1 2.322222e-01 0.23036
3 2 2 1.470741e-01 0.14788
4 3 3 9.314691e-02 0.09337
5 4 4 5.899305e-02 0.05929
6 5 5 3.736226e-02 0.03673
7 6 6 2.366277e-02 0.02220
8 7 7 1.498642e-02 0.01440
9 8 8 9.491398e-03 0.00988
10 9 9 6.011219e-03 0.00603
11 10 10 3.807105e-03 0.00396
12 11 11 2.411167e-03 0.00240
13 12 12 1.527072e-03 0.00168
14 13 13 9.671458e-04 0.00089
15 14 14 6.125256e-04 0.00053
16 15 15 3.879329e-04 0.00041
17 16 16 2.456908e-04 0.00021
18 17 17 1.556042e-04 0.00014
19 18 18 9.854933e-05 0.00012
20 19 19 6.241457e-05 0.00005
21 20 20 3.952923e-05 0.00004
22 21 21 2.503518e-05 0.00003
23 22 22 1.585561e-05 0.00001
24 28 28 1.023239e-06 0.00001
Where fall the redraws
This is actually a more difficult calculation (or I forgot too much probability). Luckily a bit of brute force comes in handy. To reiterate, in general simulated data shows 0.54 redraws because of the first person etc.
colSums(countstop)/nrep
[1] 0.54276 0.40735 0.31383 0.24834 0.20415
So, what happens in a drawing? The outcomes follow from the matrix redraw. There is 20% chance the first person draws 1, 25% chance the second person draws a 2 etc. Finally, as established, the chance is 36% to have a good draw.
(p.onedraw <- c(colSums(redraw)/nrow(redraw),p.succes) )
[1] 0.20000000 0.15000000 0.11666667 0.09166667 0.07500000 0.36666667
The function below takes these numbers and a locator vector to return a data frame with chances, location of fail and success status in column finish
one.draw <- function(status.now,p.now,p.onedraw) {
la <- lapply(1:(nn+1),function(x) {
status <- status.now
if (x>nn) finish=TRUE
else {
status[x] <- status[x] +1
finish=FALSE}
list(status=status,p=p.onedraw[x],finish=finish)
})
res <- as.data.frame(do.call(rbind, lapply(la,function(x) x$status)))
res$p <- sapply(la,function(x) x$p*p.now)
res$finish <- sapply(la,function(x) x$finish)
res
}
status.now <- rep(0,nn)
names(status.now) <- paste('person',1:5,sep='')
od <- one.draw(status.now,1,p.onedraw)
od
person1 person2 person3 person4 person5 p finish
1 1 0 0 0 0 0.20000000 FALSE
2 0 1 0 0 0 0.15000000 FALSE
3 0 0 1 0 0 0.11666667 FALSE
4 0 0 0 1 0 0.09166667 FALSE
5 0 0 0 0 1 0.07500000 FALSE
6 0 0 0 0 0 0.36666667 TRUE
Where the sequence is not finished, the same chances apply again. For this a second function is build. Same outcomes are combined to restrict the number of outcomes.
one.draw.wrap <- function(x,p.pnedraw) {
if (x$finish) return(x)
one.draw(x[grep(‘person’,names(x))],x$p,p.onedraw)
}
new.draw <- function(od,p.onedraw) {
todo <- od[!od$finish,]
done <- od[od$finish,]
la <- lapply(1:nrow(todo),function(x) one.draw.wrap(todo[x,],p.onedraw))
snd <- do.call(rbind,la)
snd <- snd[do.call(order,snd),]
i <- 1
while(i
if(all(snd[i,!(colnames(snd)==’p’)] == snd[i+1,!(colnames(snd)==’p’)])) {
snd$p[i] <- snd$p[i] + snd$p[i+1]
snd <- snd[-(i+1),]
}
else i=i+1
}
snd <- rbind(snd, done)
snd[order(-snd$p),]
}
head(new.draw(od))
person1 person2 person3 person4 person5 p finish
61 0 0 0 0 0 0.36666667 TRUE
6 1 0 0 0 0 0.07333333 TRUE
2 1 1 0 0 0 0.06000000 FALSE
25 0 1 0 0 0 0.05500000 TRUE
3 1 0 1 0 0 0.04666667 FALSE
35 0 0 1 0 0 0.04277778 TRUE
There are still quite some unfinished drawings, so we can redo that a few times. This is where a fast processor is practical.
myfin <- list(od=od)
for (i in 1:15) myfin[[i+1]] <- new.draw(myfin[[i]])
Even after 15 steps there are some unfinished sequences. Nevertheless I stop here.
Even after 15 steps there are some unfinished sequences. Nevertheless I stop here.
xtabs(p ~finish,data=myfin[[15]])
finish
FALSE TRUE
0.001057999 0.998942001
In the finished series there are in on average 0.54 redraws at person 1. That is pretty close to 0.54 from the simulation.
with(myfin[[15]],sum(person1[finish]*p[finish]))
[1] 0.5398659
However, the 0.1% unfinished sequences would contribute a lot. Even if they were finished at step 15.
with(myfin[[15]],sum(person1*p))
0.5448775
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.