Secret Santa – unfinished business

November 24, 2012
By

(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

Last week I wrote:
This is actually a more difficult calculation (or I forgot too much probability). Luckily a bit of brute force comes in handy. To reiterate, in general simulated data shows 0.54 redraws because of the first person etc.
colSums(countstop)/nrep
[1] 0.54276 0.40735 0.31383 0.24834 0.20415
In this last week I figured out how to finish this more elegantly/without too much brute force.
To reiterate, the chance within a particular round of drawing names are:
(p.onedraw <- c(colSums(redraw)/nrow(redraw),p.succes) )
[1] 0.200 0.150 0.117 0.0917 0.075 0.367
This means that a 'round' of secret santa with 5 persons can have 6 outcomes: The first person draws self, the second draws self, same for person three , four and five. Sixth outcome: a valid drawing of names. If a person self draws, a next round occurs. This means an infinite number of rounds and drawings may occur. The brute force solution was to follow a lot of solutions and do this again and again. But, we can calculate the probability of of a finish after twice a fail at person 1 and three times fail at person 3. It is the chance of this happening: 0.22*0.1173*0.367 times the number of ways this can happen: (2+3)!/(2!*3!). Obviously, this is a generalization of some of the formulas for binomial. 
Now, to do this for all reasonable possibilities of results. The latter may be obtained e.g. by 
ncalc <- 8xx <- expand.grid(0:ncalc,0:ncalc,0:ncalc,0:ncalc,0:ncalc)
nrow(xx)
[1] 59049

Implementation phase 1

There are quite a number of probabilities to calculate, and as this will be wholesale, every shortcut is welcome. The main approach is to do the whole calculation on a logarithmic scale and transform back at the end. 
With a bit of preparation log(factorial) is just getting a number out of a vector.
lnumb <- c(0,log(1:40))
clnumb <- cumsum(lnumb)
lfac <- function(x) clnumb[x+1]
exp(lfac(4))
[1] 24
to store logs for chances: 
lp1d <- log(p.onedraw)
Combine it all into functions
lpoccur <- function(x,lpvals) {
  x <- as.numeric(x)
  crossprod(c(x,1),lpvals) + lfac(sum(x))-sum(sapply(x,lfac))
}
poccur <- function(x,lpvals) exp(lpoccur(x,lpvals))
And call it over the matrix xx
chances <- sapply(1:nrow(xx),function(x) poccur(xx[x,],lp1d))
newcalc.res <- as.data.frame(cbind(xx,chances))
newcalc.res <- newcalc.res[order(-newcalc.res$chances),]
head(newcalc.res)
     Var1 Var2 Var3 Var4 Var5    chances
1       0    0    0    0    0 0.36666667
2       1    0    0    0    0 0.07333333
10      0    1    0    0    0 0.05500000
82      0    0    1    0    0 0.04277778
730     0    0    0    1    0 0.03361111
6562    0    0    0    0    1 0.02750000

Implementation phase 2

The idea was to not store matrix xx, but rather generate the rows on the fly, because I thought xx would be too large. But is seems that is not needed and a suitable xx fits into memory. 
sum(chances)
[1] 0.9998979
Hence the calculation can be vectorised a bit more, reducing calculation time again:
ntrial <- rowSums(xx)
accu <- lfac(ntrial) + crossprod(t(as.matrix(xx)),lp1d[1:(length(lp1d)-1)])
for (i in 1:ncol(xx) ) accu <- accu-lfac(xx[,i])
accu <- accu+lp1d[length(lp1d)]
chance2 <- exp(accu)
newcalc2.res <- cbind(xx,chance2)
newcalc2.res <- newcalc2.res[order(-newcalc2.res$chance2),]
So, the values from simulation are recreated
sum(newcalc2.res$Var1*newcalc2.res$chance2)
[1] 0.5445788
sum(newcalc2.res$Var5*newcalc2.res$chance2)
[1] 0.2044005
Calculation done! In the end it was not so much a difficult or long calculation. However, a simulation is much easy to construct and clearly pointed out directions.

To leave a comment for the author, please follow the link and comment on his blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.