Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In native R, the user sets the seed for random number generation (RNG) with set.seed(). Random number generators exist in C and C++ too; these need their own seeds, which are not obviously settable by set.seed(). Good news! It can be done.
pacman::p_load(inline, purrr)
rbernoulli
Base R (or technically the stats package) provides no rbernoulli(). It’s a pretty gaping hole in the pantheon of rbeta(), rbinom(), rcauchy, rchisq(), rexp(), rf(), rgamma(), etc. Thankfully, Hadley Wickham noticed this and gave us purrr::rbernoulli().
set.seed(1) rbernoulli(5, 0.7) #> [1] FALSE TRUE TRUE TRUE FALSE set.seed(1) rbernoulli(5, 0.7) #> [1] FALSE TRUE TRUE TRUE FALSE
So it seems like Hadley managed to get set.seed() to work with rbernoulli(). How did he do this? Let’s take a closer look at purrr::rbernoulli().
purrr::rbernoulli
#> function (n, p = 0.5) 
#> {
#>     stats::runif(n) > (1 - p)
#> }
#> <bytecode: 0x7fb23720c540>
#> <environment: namespace:purrr>
Ah, it seems Hadley just wrapped runif(); hence, because set.seed() works with runif(), it works with his implementation of purrr::rbernoulli().
C++ RNG
The C++ standard library provides the <random> header file, which includes Bernoulli RNG. Let’s give that a whirl.
cpp_rbernoulli <- rcpp(c(n = "integer", p = "numeric", seed = "integer"), '
                       int n_ = as<int>(n), seed_ = as<int>(seed);
                       double p_ = as<double>(p);
                       std::default_random_engine generator(seed_);
                       std::bernoulli_distribution distribution(p_);
                       IntegerVector out(n_);
                       for (std::size_t i = 0; i != n_; ++i) {
                         out[i] = distribution(generator);
                       }
                       return out;
                       ', includes = "#include <random>")
cpp_rbernoulli(6, 0.7, seed = 1)
#> [1] 1 0 1 1 0 1
cpp_rbernoulli(6, 0.7, seed = 1)
#> [1] 1 0 1 1 0 1
cpp_rbernoulli(6, 0.7, seed = 2)
#> [1] 1 0 1 0 1 1
OK, so now we have cpp_rbernoulli() which is working, but the user has to pass the seed as an argument of the function, there’s no option to use set.seed().
get_seed()
If only there was a get_seed() function that we could use. Well, here it is!
get_seed <- function() {
  sample.int(.Machine$integer.max, 1)
}
This gets a positive number in the unsigned 32-bit integer range (which is always a safe bet for a seed) and it is completely determined by set.seed(). Therefore, it’s fine to use as a seed itself. Let’s take a look.
set.seed(1) replicate(6, get_seed()) #> [1] 570175513 799129990 1230193230 1950361378 433108649 1929277158 set.seed(1) replicate(6, get_seed()) #> [1] 570175513 799129990 1230193230 1950361378 433108649 1929277158 set.seed(2) replicate(6, get_seed()) #> [1] 397031630 1508336757 1231208929 360888751 2026879546 2026097046 set.seed(2) replicate(6, get_seed()) #> [1] 397031630 1508336757 1231208929 360888751 2026879546 2026097046
So as we can see, setting a seed via set.seed() determines the seeds that subsequently come out of get_seed(), so all is well with the world. get_seed() can now be used to create a version of cpp_rbernoulli() which uses set.seed(). For the sake of inflating my own ego, I’ll name this version after myself.
rorybernoulli <- function(n, p) {
  cpp_rbernoulli(n, p, get_seed())
}
Let’s check that it’s in working order.
set.seed(1) rorybernoulli(6, 0.7) #> [1] 1 1 0 0 1 0 set.seed(1) rorybernoulli(6, 0.7) #> [1] 1 1 0 0 1 0 set.seed(2) rorybernoulli(6, 0.7) #> [1] 0 1 1 1 1 1 set.seed(2) rorybernoulli(6, 0.7) #> [1] 0 1 1 1 1 1
Everything is awesome.
Benchmarking
Lastly, let’s compare the two Bernoulli RNGs that we have now by asking them both to give us a million Bernoulli random numbers with p = 0.5.
bench::mark(purrr::rbernoulli(1e6, p = 0.5),
            rorybernoulli(1e6, p = 0.5),
            check = FALSE)
#> # A tibble: 2 x 10
#>   expression     min    mean  median    max `itr/sec` mem_alloc  n_gc n_itr
#>   <chr>      <bch:t> <bch:t> <bch:t> <bch:>     <dbl> <bch:byt> <dbl> <int>
#> 1 purrr::rb…  26.9ms 29.45ms 28.57ms 32.7ms      34.0   11.45MB     4    13
#> 2 roryberno…  8.55ms  9.61ms  9.38ms 12.7ms     104.     3.82MB     4    46
#> # ... with 1 more variable: total_time <bch:tm>
Wow, rorybernoulli() is three times faster! I wasn’t expecting that. Perhaps it’s because there’s a quicker way of generating a Bernoulli random number than by going through a uniform random number (as purrr::rbernoulli() does). It’s also three times as efficient with memory, probably related to the time speedup. The point of me writing this post was to share this get_seed() thing with people so that the can use set.seed() with Rcpp and the like; purrr::rbernoulli() was just a cool example of a non-base RNG that popped into my head. Maybe I should submit a pull request to purrr!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
