# A Little Sampling Puzzle

June 18, 2011
(This article was first published on mickeymousemodels, and kindly contributed to R-bloggers)

Suppose you have 10 objects from which you take a sample of size 20 (with replacement, or you’re in trouble). What’s the probability that each object was chosen at least once? Getting an answer via simulation is pleasantly easy:

```r
f <- function(n=10, k=20) {
  x <- 1:n
  x.sample <- sample(x, size=k, replace=TRUE)
  return(length(unique(x.sample)) == n)
}
num.simulations <- 100000
table(replicate(num.simulations, f())) / num.simulations
```

The `TRUE` entry of the resulting table should be close to 0.215, which is confirmed by the analytic solution:

```r
g <- function(i) {
  ((-1) ^ (i + 1)) * choose(10, i) * ((10 - i) / 10) ^ 20
}
1 - sum(sapply(1:9, g))
```
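
Written as a formula, the calculation above looks like an inclusion-exclusion sum. The notation below is a transcription of the R expression, not something stated in the original post:

$$
P(\text{all 10 objects sampled}) = 1 - \sum_{i=1}^{9} (-1)^{i+1} \binom{10}{i} \left(\frac{10-i}{10}\right)^{20}
$$

Here $\binom{10}{i}$ counts the ways to choose $i$ objects to leave out, and $\left(\frac{10-i}{10}\right)^{20}$ is the probability that all 20 draws avoid those $i$ objects. The $i = 10$ term is omitted because it is zero.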

The sum being subtracted is, by inclusion-exclusion, the probability that at least one object was never sampled. Enjoy!
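
The same argument should carry over to any number of objects and any sample size. Here is a minimal sketch, assuming the inclusion-exclusion reading above; the name `p.all.sampled` is hypothetical and does not appear in the original post:

```r
# Hypothetical helper (not from the original post): probability that a sample
# of size k, drawn with replacement from n objects, contains every object.
p.all.sampled <- function(n = 10, k = 20) {
  i <- 0:n  # i = number of objects assumed to be missing from the sample
  # Inclusion-exclusion: alternating sum over the number of excluded objects
  sum((-1) ^ i * choose(n, i) * ((n - i) / n) ^ k)
}

p.all.sampled()        # defaults match the puzzle: n = 10, k = 20
p.all.sampled(10, 30)  # a larger sample makes full coverage more likely
```

Starting the sum at `i = 0` folds the leading `1 -` into the series, so the function returns the coverage probability directly. For large `n` the alternating terms can cancel badly in floating point, so treat this as an illustration rather than a numerically robust routine.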
