I am working on a project that requires the generation of Bernoulli outcomes. Typically, I would go about this using the built in sample() function like so:
sample(1:0,n,prob=c(p,1-p),replace=TRUE)
This works great and is fast, even for large n. Problem is, I want to generate each sample with its own unique probability. Seems straight forward enough, I just wrapped the function and vectorized to allow the passing of a vector of p.
binomial_sampler<-function(p){
return(sample(1:0,1,prob=c(p,1-p)))
}
bs<-Vectorize(binomial_sampler)
Naming this function bs() turned out to be rather prophetic. Nevertheless, I can call this function by passing my unique vector of outcome probabilities. And indeed I get the result I’m looking for.
bs(my_p_vec)
Problem is, this turns out to be very slow. It would seem that there is quite a bit of overhead to calling sample() for one sample at a time. R’s RNGs are very fast for generating many iid samples, so I started thinking like my old c++ programming self and tried a different approach.
Nbs<-function(p)
{
U<-runif(length(p),0,1)
outcomes<-U<p
return(U)
}
I call the new version Nbs for “New Bernoulli Sampler”, or “Not Bull Shit”. And what a difference it made indeed!
library(rbenchmark) p<-runif(1000) res <- benchmark(bs(p), Nbs(p)) print(res) test replications elapsed relative user.self sys.self user.child sys.child 2 Nbs(p) 100 0.007 1 0.008 0.000 0 0 1 bs(p) 100 1.099 157 1.080 0.016 0 0
157x faster! Now that’s a speedup to write home about.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...


Zero Inflated Models and Generalized Linear Mixed Models with R.
Zuur, Saveliev, Ieno (2012).