(This article was first published on

**Wiekvoet**, and kindly contributed to R-bloggers)Last week I wrote:*This is actually a more difficult calculation (or I forgot too much probability). Luckily a bit of brute force comes in handy. To reiterate, in general simulated data shows 0.54 redraws because of the first person etc.*

*colSums(countstop)/nrep*

*[1] 0.54276 0.40735 0.31383 0.24834 0.20415*

In this last week I figured out how to finish this more elegantly/without too much brute force.

To reiterate, the chance within a particular round of drawing names are:

*(p.onedraw <- c(colSums(redraw)/nrow(redraw),p.succes) )*

*[1] 0.200 0.150 0.117 0.0917 0.075 0.367*

This means that a ’round’ of secret santa with 5 persons can have 6 outcomes: The first person draws self, the second draws self, same for person three , four and five. Sixth outcome: a valid drawing of names. If a person self draws, a next round occurs. This means an infinite number of rounds and drawings may occur. The brute force solution was to follow a lot of solutions and do this again and again. But, we can calculate the probability of of a finish after twice a fail at person 1 and three times fail at person 3. It is the chance of this happening: 0.2

^{2}*0.117^{3}*0.367 times the number of ways this can happen: (2+3)!/(2!*3!). Obviously, this is a generalization of some of the formulas for binomial.Now, to do this for all reasonable possibilities of results. The latter may be obtained e.g. by

ncalc <- 8xx <- expand.grid(0:ncalc,0:ncalc,0:ncalc,0:ncalc,0:ncalc)

nrow(xx)

[1] 59049

#### Implementation phase 1

There are quite a number of probabilities to calculate, and as this will be wholesale, every shortcut is welcome. The main approach is to do the whole calculation on a logarithmic scale and transform back at the end.

With a bit of preparation log(factorial) is just getting a number out of a vector.

lnumb <- c(0,log(1:40))

clnumb <- cumsum(lnumb)

lfac <- function(x) clnumb[x+1]

exp(lfac(4))

[1] 24

to store logs for chances:

lp1d <- log(p.onedraw)

Combine it all into functions

lpoccur <- function(x,lpvals) {

x <- as.numeric(x)

crossprod(c(x,1),lpvals) + lfac(sum(x))-sum(sapply(x,lfac))

}

poccur <- function(x,lpvals) exp(lpoccur(x,lpvals))

And call it over the matrix xx

chances <- sapply(1:nrow(xx),function(x) poccur(xx[x,],lp1d))

newcalc.res <- as.data.frame(cbind(xx,chances))

newcalc.res <- newcalc.res[order(-newcalc.res$chances),]

head(newcalc.res)

Var1 Var2 Var3 Var4 Var5 chances

1 0 0 0 0 0 0.36666667

2 1 0 0 0 0 0.07333333

10 0 1 0 0 0 0.05500000

82 0 0 1 0 0 0.04277778

730 0 0 0 1 0 0.03361111

6562 0 0 0 0 1 0.02750000

#### Implementation phase 2

The idea was to

**not**store matrix xx, but rather generate the rows on the fly, because I thought xx would be too large. But is seems that is not needed and a suitable xx fits into memory.sum(chances)

[1] 0.9998979

Hence the calculation can be vectorised a bit more, reducing calculation time again:

ntrial <- rowSums(xx)

accu <- lfac(ntrial) + crossprod(t(as.matrix(xx)),lp1d[1:(length(lp1d)-1)])

for (i in 1:ncol(xx) ) accu <- accu-lfac(xx[,i])

accu <- accu+lp1d[length(lp1d)]

chance2 <- exp(accu)

newcalc2.res <- cbind(xx,chance2)

newcalc2.res <- newcalc2.res[order(-newcalc2.res$chance2),]

So, the values from simulation are recreated

sum(newcalc2.res$Var1*newcalc2.res$chance2)

[1] 0.5445788

sum(newcalc2.res$Var5*newcalc2.res$chance2)

[1] 0.2044005

Calculation done! In the end it was not so much a difficult or long calculation. However, a simulation is much easy to construct and clearly pointed out directions.

To

**leave a comment**for the author, please follow the link and comment on their blog:**Wiekvoet**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...