sampling by exhaustion

November 24, 2016
By

(This article was first published on R – Xi'an's Og, and kindly contributed to R-bloggers)

The riddle set by The Riddler of last week sums up as follows:

Within a population of size N, each individual in the population independently selects another individual. All individuals selected at least once are removed and the process iterates until one or zero individual is left. What is the probability that there is zero individual left?

While I cannot see a clean analytical solution to this problem, it reminds me of an enveloppe-versus-letter (matches) problem I saw in graduate school. Indeed, the expected number of removed (or selected) individuals is given by

N\left\{1-\frac{N-2}{N-1}\right\}^{N-1}

which is equivalent to (1-e⁻¹)N for N large, meaning that the population decreases by an average factor of e⁻¹ at each round. And that it takes on average approximately log(N) iterations to reach a single individual. A simulation of the first probabilities of ending with one individual led me to the above curve, which wiggles in an almost periodic way around the probability ½, equal to the average of those probabilities. Using the R code

rad=function(N){#next population size
  ut=sample(rep(2:N,2),1)
  for (i in 2:N)#sampling
   ut=c(ut,sample(rep((1:N)[-i],2),1))
  return(N-length(unique(ut))}
sal=rep(0,m);sal[1]=1
for (N in 3:M){
 prop=0;
 for (t in 1:T){#one single step
   i=rad(N)
   if (i>0) prop=prop+sal[i]}
 sal[N]=prop/T}

which exploits the previously computed probabilities. The variability is most interesting if unexpected, but looking back at Feller‘s sections and exercises on the classical occupancy problem, I could not find a connection with this problem. If it exists. Still, if N is large enough, the exclusion of one index from the selection becomes negligible and the probability of moving from n to m individuals should be approximately [Feller, eqn (2.4), p.102]

p_n(m)={n\choose m}\sum_{v=}^{n-m} (-1)^v {n-m\choose v} \left(1-\frac{m+v}{n}\right)^n

This formula approximates quite well the exact probability, but as in a previous post about the birthday problem, it proves quite delicate to compute. As already noticed by Feller.

Filed under: Books, Kids, R, Statistics Tagged: birthday problem, occupancy, sampling, The Riddler, William Feller

To leave a comment for the author, please follow the link and comment on their blog: R – Xi'an's Og.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)