Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

On reddit somebody asked For n individuals, what’s the probability that the last person to pick during a round of Secret Santa name picking, will pick their own name.. “With each person picking in turn, and re-picking if they pick out their own name.‘ Two simulations were proposed. After some thinking I think it can be calculated rather easily.

### Problem consideration

I think, it is possible to estimate the chance five people together draw any sequence. It is also possible to enumerate all sequences. Categorize the sequences, add their chances. Problem solved.

So, for example the sequence 2 5 1 3 4.
The first person has has 1/4 chance to select #2, as 2, 3, 4 and 5 can be selected with equal chances.
The second person has 1/4 chance to select #5, as 1, 3, 4 and 5 can be selected with equal chances.
The third person has 1/2 chance to select #1, as 1 or 4 can be selected.
The fourth person selects 3, as there is no alternative (4 results in a repick).
The fifth person selects 4, being the remaining number.
Chance to get this sequence: 1/32.
It is possible to recurse through all these trees, but why? After all we can easily enumerate all these sequences.
nn <-5
pp <- randtoolbox::permut(nn)
for(i in 1:(nn-1)) pp<- pp[!(pp[,i]==i),]
i i i
[1,] 5 4 1 3 2
[2,] 5 4 2 3 1
[3,] 5 4 1 2 3
[4,] 5 4 2 1 3
[5,] 5 3 4 1 2
[6,] 5 3 4 2 1
Then, the chances. For each column it is just 1/#possible_selections. That is one(!) statement.
la <- lapply(1:(nn-1),function(x) 1/rowSums(pp[,x:nn]!=x))
str(la)

List of 4
\$ : num [1:53] 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 …
\$ : num [1:53] 0.333 0.333 0.333 0.333 0.333 …
\$ : num [1:53] 0.5 0.5 0.5 0.5 0.333 …
\$ : num [1:53] 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 1 1 …
Having these chances, they must be multiplied
prob <- Reduce('*',la)
sum(prob)
[1] 1
Luckily they add to 1. So, what is the chance the last person draws a 5? 0.13, exactly the same as a simulation number found on reddit.
sum(prob[pp[,nn]==nn])
[1] 0.1319444

To get a bit more results, we can put it in a function and get the answers up to 9 persons:

secsan1 <- function(nn) {
pp <- randtoolbox::permut(nn)
for(i in 1:(nn-1)) pp<- pp[!(pp[,i]==i),]
la <- lapply(1:(nn-1),function(x) 1/rowSums(pp[,x:nn]!=x))
prob <- Reduce('*',la)
sum(prob[pp[,nn]==nn])
}
data.frame(n=3:9,sapply(3:9,secsan1))
n sapply.3.9..secsan1.
1 3           0.25000000
2 4           0.13888889
3 5           0.13194444
4 6           0.11277778
5 7           0.10053241
6 8           0.09049461
7 9           0.08238237
These numbers are close to one of the simulations.

The second simulation shows the chances the last person has to draw a person.
secsan2 <- function(nn) {
pp <- randtoolbox::permut(nn)
for(i in 1:(nn-1)) pp<- pp[!(pp[,i]==i),]
la <- lapply(1:(nn-1),function(x) 1/rowSums(pp[,x:nn]!=x))
prob <- Reduce('*',la)
xtabs(prob ~ pp[,nn])
}
as.data.frame(secsan2(8))
pp…nn.       Freq
1        1 0.10485639
2        2 0.10729167
3        3 0.11072279
4        4 0.11591789
5        5 0.12471088
6        6 0.14283588
7        7 0.20316988
8        8 0.09049461
These numbers are not equal to simulation numbers on Reddit. This is because this simulation contains an error. It reads:
```selsan <- function(who,persons) {
if (length(persons)==1) return(persons)
sel <- sample(persons[persons!=who],1)
return(c(sel,selsan(who+1,persons[persons!=sel])))
}
#selsan(1,1:5)
finselsan <- function(n){
selsan(1,1:n)[n]
}
nrep=1e4
sa <- sapply(1:nrep,function(x) finselsan(8))
table(sa)/nrep```
What happens is that sample() is pulling a trick on us. Most of the time sample() gets some values and selects one of those. It is possible however, that the second last person has also himself to draw. At which point sample thinks only one number is available. To quote the manual: If `x` has length 1, is numeric (in the sense of `is.numeric`) and `x >= 1`, sampling via `sample` takes place from `1:x`. Note that this convenience feature may lead to undesired behaviour when `x` is of varying length in calls such as `sample(x)`. See the examples. It is even documented. But who reads these all the time? A corrected simulation avoids this:
selsan <- function(who,persons) {

if (length(persons)==1) return(persons)
if (length(persons[persons!=who])==1) return(c(persons[persons!=who],who))
sel <- sample(persons[persons!=who],1)
return(c(sel,selsan(who+1,persons[persons!=sel])))
}
finselsan <- function(n){
selsan(1,1:n)[n]
}
nrep=1e4
sa <- sapply(1:nrep,function(x) finselsan(8))
as.data.frame(table(sa)/nrep)
sa   Freq
1  1 0.1033
2  2 0.1024
3  3 0.1108
4  4 0.1123
5  5 0.1270
6  6 0.1498
7  7 0.2024
8  8 0.0920
This shows the exact calculation gives similar results as two simulations.

#### Extensions

What are the chances for any person to get anybody? It is simple now.
nn <- 6
pp <- randtoolbox::permut(nn)
for(i in 1:(nn-1)) pp<- pp[!(pp[,i]==i),]
la <- lapply(1:(nn-1),function(x) 1/rowSums(pp[,x:nn]!=x))
prob <- Reduce('*',la)
sapply(1:nn,function(x) xtabs(prob ~ factor(pp[,x],levels=1:nn)))
[,1] [,2]      [,3]      [,4]      [,5]      [,6]
1  0.0 0.24 0.2250000 0.2083333 0.1875000 0.1391667
2  0.2 0.00 0.2375000 0.2194444 0.1972222 0.1458333
3  0.2 0.19 0.0000000 0.2388889 0.2138889 0.1572222
4  0.2 0.19 0.1791667 0.0000000 0.2500000 0.1808333
5  0.2 0.19 0.1791667 0.1666667 0.0000000 0.2641667
6  0.2 0.19 0.1791667 0.1666667 0.1513889 0.1127778
The persons are columns, the rows are presents. Obviously 0 on the diagonal, except the last person. Closer to the last person, the chances get more unequal.

#### Conclusion

Don't do this. Get everybody to draw, then determine if a redraw is needed.