occupancy rules

[This article was first published on R – Xi'an's Og, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

While the last riddle on The Riddler was rather anticlimactic, namely to find the mean of the number Y of empty bins in a uniform multinomial with n bins and m draws, with solution

\mathbb{E}[Y]=n(1-\frac{1}{n})^m,

[which still has a link with e in that the fraction of empty bins converges to 1-e⁻¹ when n=m], this led me to some more involved investigation on the distribution of Y. While it can be shown directly that the probability that k bins are non-empty is

{n \choose k}\sum_{i=1}^k (-1)^{k-i}{k \choose i}(i/n)^m

with an R representation by

miss<-function(n,m){
p=rep(0,n)
for (k in 1:n)
 p[k]=choose(n,k)*sum((-1)^((k-1):0)*choose(k,1:k)*(1:k)^m)
return(rev(p)/n^m)}

I wanted to take advantage of the moments of Y, since it writes as a sum of n indicators, counting the number of empty cells. However, the higher moments of Y are not as straightforward as its expectation and I struggled with the representation until I came upon this formula

\mathbb{E}[Y^k]=\sum_{i=1}^k {k \choose i} i! S(k,i) \left( 1-\frac{i}{n}\right)^m

where S(k,i) denotes the Stirling number of the second kind… Or i!S(n,i) is the number of surjections from a set of size n to a set of size i. Which leads to the distribution of Y by inverting the moment equations, as in the following R code:

diss<-function(n,m){
  A=matrix(0,n,n)
  mome=rep(0,n)
  A[n,]=rep(1,n)
  mome[n]=1
  for (k in 1:(n-1)){
   A[k,]=(0:(n-1))^k
   for (i in 1:k)
     mome[k]=mome[k]+factorial(i)*as.integer(Stirling2(n,i))*
     (1-(i+1)/n)^m*factorial(k)/factorial(k-i-1)}
  return(solve(A,mome))}

that I still checked by raw simulations from the multinomial

zample<-function(n,m,T=1e4){
  x=matrix(sample(1:n,m*T,rep=TRUE),nrow=T)
  x=sapply(apply(x,1,unique),length)
  return(n-x)}

Filed under: Kids, R, Statistics Tagged: moment derivation, moments, multinomial distribution, occupancy, R, Stack Exchange, Stirling number, surjection

To leave a comment for the author, please follow the link and comment on their blog: R – Xi'an's Og.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)