I am working on a project that requires the generation of Bernoulli outcomes. Typically, I would go about this using the built in sample() function like so:

sample(1:0,n,prob=c(p,1-p),replace=TRUE)

This works great and is fast, even for large n. Problem is, I want to generate each sample with its own unique probability. Seems straight forward enough, I just wrapped the function and vectorized to allow the passing of a vector of *p*.

binomial_sampler<-function(p){
return(sample(1:0,1,prob=c(p,1-p)))
}
bs<-Vectorize(binomial_sampler)

Naming this function *bs()* turned out to be rather prophetic. Nevertheless, I can call this function by passing my unique vector of outcome probabilities. And indeed I get the result I’m looking for.

bs(my_p_vec)

Problem is, this turns out to be very slow. It would seem that there is quite a bit of overhead to calling sample() for one sample at a time. R’s RNGs are very fast for generating many iid samples, so I started thinking like my old c++ programming self and tried a different approach.

Nbs<-function(p)
{
U<-runif(length(p),0,1)
outcomes<-U<p
return(U)
}

I call the new version Nbs for “New Bernoulli Sampler”, or “Not Bull Shit”. And what a difference it made indeed!

library(rbenchmark)
p<-runif(1000)
res <- benchmark(bs(p), Nbs(p))
print(res)
test replications elapsed relative user.self sys.self user.child sys.child
2 Nbs(p) 100 0.007 1 0.008 0.000 0 0
1 bs(p) 100 1.099 157 1.080 0.016 0 0

157x faster! Now that’s a speedup to write home about.

Dan “The Man” Bernoulli

*Related*

R-bloggers.com offers

**daily e-mail updates** about

R news and

tutorials on topics such as: visualization (

ggplot2,

Boxplots,

maps,

animation), programming (

RStudio,

Sweave,

LaTeX,

SQL,

Eclipse,

git,

hadoop,

Web Scraping) statistics (

regression,

PCA,

time series,

trading) and more...

If you got this far, why not

__subscribe for updates__ from the site? Choose your flavor:

e-mail,

twitter,

RSS, or

facebook...

**Tags:** bernoulli trials, C, code optimization, computer, iid, Probability, rbenchmark, RNG, rstats, sampling, Simulation, Software, Uncategorized, uniform random number