Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In native R, the user sets the seed for random number generation (RNG) with `set.seed()`. Random number generators exist in C and C++ too; these need their own seeds, which are not obviously settable by `set.seed()`. Good news! It can be done.

``pacman::p_load(inline, purrr)``

# `rbernoulli`

Base R (or technically the `stats` package) provides no `rbernoulli()`. It’s a pretty gaping hole in the pantheon of `rbeta()`, `rbinom()`, `rcauchy`, `rchisq()`, `rexp()`, `rf()`, `rgamma()`, etc. Thankfully, Hadley Wickham noticed this and gave us `purrr::rbernoulli()`.

``````set.seed(1)
rbernoulli(5, 0.7)
#>  FALSE  TRUE  TRUE  TRUE FALSE
set.seed(1)
rbernoulli(5, 0.7)
#>  FALSE  TRUE  TRUE  TRUE FALSE``````

So it seems like Hadley managed to get `set.seed()` to work with `rbernoulli()`. How did he do this? Let’s take a closer look at `purrr::rbernoulli()`.

``````purrr::rbernoulli
#> function (n, p = 0.5)
#> {
#>     stats::runif(n) > (1 - p)
#> }
#>
#> ``````

Ah, it seems Hadley just wrapped `runif()`; hence, because `set.seed()` works with `runif()`, it works with his implementation of `purrr::rbernoulli()`.

# C++ RNG

The C++ standard library provides the header file, which includes Bernoulli RNG. Let’s give that a whirl.

``````cpp_rbernoulli <- rcpp(c(n = "integer", p = "numeric", seed = "integer"), '
int n_ = as(n), seed_ = as(seed);
double p_ = as(p);
std::default_random_engine generator(seed_);
std::bernoulli_distribution distribution(p_);
IntegerVector out(n_);
for (std::size_t i = 0; i != n_; ++i) {
out[i] = distribution(generator);
}
return out;
', includes = "#include ")``````
``````cpp_rbernoulli(6, 0.7, seed = 1)
#>  1 0 1 1 0 1
cpp_rbernoulli(6, 0.7, seed = 1)
#>  1 0 1 1 0 1
cpp_rbernoulli(6, 0.7, seed = 2)
#>  1 0 1 0 1 1``````

OK, so now we have `cpp_rbernoulli()` which is working, but the user has to pass the seed as an argument of the function, there’s no option to use `set.seed()`.

# `get_seed()`

If only there was a `get_seed()` function that we could use. Well, here it is!

``````get_seed <- function() {
sample.int(.Machine\$integer.max, 1)
}``````

This gets a positive number in the unsigned 32-bit integer range (which is always a safe bet for a seed) and it is completely determined by `set.seed()`. Therefore, it’s fine to use as a seed itself. Let’s take a look.

``````set.seed(1)
replicate(6, get_seed())
#>   570175513  799129990 1230193230 1950361378  433108649 1929277158
set.seed(1)
replicate(6, get_seed())
#>   570175513  799129990 1230193230 1950361378  433108649 1929277158
set.seed(2)
replicate(6, get_seed())
#>   397031630 1508336757 1231208929  360888751 2026879546 2026097046
set.seed(2)
replicate(6, get_seed())
#>   397031630 1508336757 1231208929  360888751 2026879546 2026097046``````

So as we can see, setting a seed via `set.seed()` determines the seeds that subsequently come out of `get_seed()`, so all is well with the world. `get_seed()` can now be used to create a version of `cpp_rbernoulli()` which uses `set.seed()`. For the sake of inflating my own ego, I’ll name this version after myself.

``````rorybernoulli <- function(n, p) {
cpp_rbernoulli(n, p, get_seed())
}``````

Let’s check that it’s in working order.

``````set.seed(1)
rorybernoulli(6, 0.7)
#>  1 1 0 0 1 0
set.seed(1)
rorybernoulli(6, 0.7)
#>  1 1 0 0 1 0
set.seed(2)
rorybernoulli(6, 0.7)
#>  0 1 1 1 1 1
set.seed(2)
rorybernoulli(6, 0.7)
#>  0 1 1 1 1 1``````

Everything is awesome.

# Benchmarking

Lastly, let’s compare the two Bernoulli RNGs that we have now by asking them both to give us a million Bernoulli random numbers with `p = 0.5`.

``````bench::mark(purrr::rbernoulli(1e6, p = 0.5),
rorybernoulli(1e6, p = 0.5),
check = FALSE)
#> # A tibble: 2 x 10
#>   expression     min    mean  median    max `itr/sec` mem_alloc  n_gc n_itr
#>
#> 1 purrr::rb…  26.9ms 29.45ms 28.57ms 32.7ms      34.0   11.45MB     4    13
#> 2 roryberno…  8.55ms  9.61ms  9.38ms 12.7ms     104.     3.82MB     4    46
#> # ... with 1 more variable: total_time ``````

Wow, `rorybernoulli()` is three times faster! I wasn’t expecting that. Perhaps it’s because there’s a quicker way of generating a Bernoulli random number than by going through a uniform random number (as `purrr::rbernoulli()` does). It’s also three times as efficient with memory, probably related to the time speedup. The point of me writing this post was to share this `get_seed()` thing with people so that the can use `set.seed()` with `Rcpp` and the like; `purrr::rbernoulli()` was just a cool example of a non-base RNG that popped into my head. Maybe I should submit a pull request to `purrr`!