Here you will find daily news and tutorials about R, contributed by over 750 bloggers.
There are many ways to follow us - By e-mail:On Facebook: If you are an R blogger yourself you are invited to add your own R content feed to this site (Non-English R bloggers should add themselves- here)

While the last riddle on The Riddler was rather anticlimactic, namely to find the mean of the number Y of empty bins in a uniform multinomial with n bins and m draws, with solution

[which still has a link with e in that the fraction of empty bins converges to 1-e⁻¹ when n=m], this led me to some more involved investigation on the distribution of Y. While it can be shown directly that the probability that k bins are non-empty is

with an R representation by

miss<-function(n,m){
p=rep(0,n)
for (k in 1:n)
p[k]=choose(n,k)*sum((-1)^((k-1):0)*choose(k,1:k)*(1:k)^m)
return(rev(p)/n^m)}

I wanted to take advantage of the moments of Y, since it writes as a sum of n indicators, counting the number of empty cells. However, the higher moments of Y are not as straightforward as its expectation and I struggled with the representation until I came upon this formula

where S(k,i) denotes the Stirling number of the second kind… Or i!S(n,i) is the number of surjections from a set of size n to a set of size i. Which leads to the distribution of Y by inverting the moment equations, as in the following R code:

diss<-function(n,m){
A=matrix(0,n,n)
mome=rep(0,n)
A[n,]=rep(1,n)
mome[n]=1
for (k in 1:(n-1)){
A[k,]=(0:(n-1))^k
for (i in 1:k)
mome[k]=mome[k]+factorial(i)*as.integer(Stirling2(n,i))*
(1-(i+1)/n)^m*factorial(k)/factorial(k-i-1)}
return(solve(A,mome))}

that I still checked by raw simulations from the multinomial