Variable probability Bernoulli outcomes – Fast and Slow

November 1, 2012
By

(This article was first published on bayesianbiologist » Rstats, and kindly contributed to R-bloggers)

I am working on a project that requires the generation of Bernoulli outcomes. Typically, I would go about this using the built in sample() function like so:

sample(1:0,n,prob=c(p,1-p),replace=TRUE)

This works great and is fast, even for large n. Problem is, I want to generate each sample with its own unique probability. Seems straight forward enough, I just wrapped the function and vectorized to allow the passing of a vector of p.

binomial_sampler<-function(p){
  return(sample(1:0,1,prob=c(p,1-p)))
}
bs<-Vectorize(binomial_sampler)

Naming this function bs() turned out to be rather prophetic. Nevertheless, I can call this function by passing my unique vector of outcome probabilities. And indeed I get the result I’m looking for.

bs(my_p_vec)

Problem is, this turns out to be very slow. It would seem that there is quite a bit of overhead to calling sample() for one sample at a time. R’s RNGs are very fast for generating many iid samples, so I started thinking like my old c++ programming self and tried a different approach.

Nbs<-function(p)
{
  U<-runif(length(p),0,1)
  outcomes<-U<p
  return(U)
}

I call the new version Nbs for “New Bernoulli Sampler”, or “Not Bull Shit”. And what a difference it made indeed!

library(rbenchmark)
p<-runif(1000)
res <- benchmark(bs(p), Nbs(p))
print(res)
test replications elapsed relative user.self sys.self user.child sys.child
2 Nbs(p)          100   0.007        1     0.008    0.000          0         0
1  bs(p)          100   1.099      157     1.080    0.016          0         0

157x faster! Now that’s a speedup to write home about.

Dan “The Man” Bernoulli


To leave a comment for the author, please follow the link and comment on his blog: bayesianbiologist » Rstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , , , , , , , , , , , ,

Comments are closed.