# Using R’s set.seed() to set seeds for use in C/C++ (including Rcpp)

**R on roryverse**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In native R, the user sets the seed for random number generation (RNG) with `set.seed()`

. Random number generators exist in C and C++ too; these need their own seeds, which are not obviously settable by `set.seed()`

. Good news! It can be done.

pacman::p_load(inline, purrr)

`rbernoulli`

Base R (or technically the `stats`

package) provides no `rbernoulli()`

. It’s a pretty gaping hole in the pantheon of `rbeta()`

, `rbinom()`

, `rcauchy`

, `rchisq()`

, `rexp()`

, `rf()`

, `rgamma()`

, etc. Thankfully, Hadley Wickham noticed this and gave us `purrr::rbernoulli()`

.

set.seed(1) rbernoulli(5, 0.7) #> [1] FALSE TRUE TRUE TRUE FALSE set.seed(1) rbernoulli(5, 0.7) #> [1] FALSE TRUE TRUE TRUE FALSE

So it seems like Hadley managed to get `set.seed()`

to work with `rbernoulli()`

. How did he do this? Let’s take a closer look at `purrr::rbernoulli()`

.

purrr::rbernoulli #> function (n, p = 0.5) #> { #> stats::runif(n) > (1 - p) #> } #> <bytecode: 0x7fb23720c540> #> <environment: namespace:purrr>

Ah, it seems Hadley just wrapped `runif()`

; hence, because `set.seed()`

works with `runif()`

, it works with his implementation of `purrr::rbernoulli()`

.

# C++ RNG

The C++ standard library provides the

header file, which includes Bernoulli RNG. Let’s give that a whirl.

cpp_rbernoulli <- rcpp(c(n = "integer", p = "numeric", seed = "integer"), ' int n_ = as<int>(n), seed_ = as<int>(seed); double p_ = as<double>(p); std::default_random_engine generator(seed_); std::bernoulli_distribution distribution(p_); IntegerVector out(n_); for (std::size_t i = 0; i != n_; ++i) { out[i] = distribution(generator); } return out; ', includes = "#include <random>")

cpp_rbernoulli(6, 0.7, seed = 1) #> [1] 1 0 1 1 0 1 cpp_rbernoulli(6, 0.7, seed = 1) #> [1] 1 0 1 1 0 1 cpp_rbernoulli(6, 0.7, seed = 2) #> [1] 1 0 1 0 1 1

OK, so now we have `cpp_rbernoulli()`

which is working, but the user has to pass the seed as an argument of the function, there’s no option to use `set.seed()`

.

`get_seed()`

If only there was a `get_seed()`

function that we could use. Well, here it is!

get_seed <- function() { sample.int(.Machine$integer.max, 1) }

This gets a positive number in the unsigned 32-bit integer range (which is always a safe bet for a seed) and it is completely determined by `set.seed()`

. Therefore, it’s fine to use as a seed itself. Let’s take a look.

set.seed(1) replicate(6, get_seed()) #> [1] 570175513 799129990 1230193230 1950361378 433108649 1929277158 set.seed(1) replicate(6, get_seed()) #> [1] 570175513 799129990 1230193230 1950361378 433108649 1929277158 set.seed(2) replicate(6, get_seed()) #> [1] 397031630 1508336757 1231208929 360888751 2026879546 2026097046 set.seed(2) replicate(6, get_seed()) #> [1] 397031630 1508336757 1231208929 360888751 2026879546 2026097046

So as we can see, setting a seed via `set.seed()`

determines the seeds that subsequently come out of `get_seed()`

, so all is well with the world. `get_seed()`

can now be used to create a version of `cpp_rbernoulli()`

which uses `set.seed()`

. For the sake of inflating my own ego, I’ll name this version after myself.

rorybernoulli <- function(n, p) { cpp_rbernoulli(n, p, get_seed()) }

Let’s check that it’s in working order.

set.seed(1) rorybernoulli(6, 0.7) #> [1] 1 1 0 0 1 0 set.seed(1) rorybernoulli(6, 0.7) #> [1] 1 1 0 0 1 0 set.seed(2) rorybernoulli(6, 0.7) #> [1] 0 1 1 1 1 1 set.seed(2) rorybernoulli(6, 0.7) #> [1] 0 1 1 1 1 1

Everything is awesome.

# Benchmarking

Lastly, let’s compare the two Bernoulli RNGs that we have now by asking them both to give us a million Bernoulli random numbers with `p = 0.5`

.

bench::mark(purrr::rbernoulli(1e6, p = 0.5), rorybernoulli(1e6, p = 0.5), check = FALSE) #> # A tibble: 2 x 10 #> expression min mean median max `itr/sec` mem_alloc n_gc n_itr #> <chr> <bch:t> <bch:t> <bch:t> <bch:> <dbl> <bch:byt> <dbl> <int> #> 1 purrr::rb… 26.9ms 29.45ms 28.57ms 32.7ms 34.0 11.45MB 4 13 #> 2 roryberno… 8.55ms 9.61ms 9.38ms 12.7ms 104. 3.82MB 4 46 #> # ... with 1 more variable: total_time <bch:tm>

Wow, `rorybernoulli()`

is three times faster! I wasn’t expecting that. Perhaps it’s because there’s a quicker way of generating a Bernoulli random number than by going through a uniform random number (as `purrr::rbernoulli()`

does). It’s also three times as efficient with memory, probably related to the time speedup. The point of me writing this post was to share this `get_seed()`

thing with people so that the can use `set.seed()`

with `Rcpp`

and the like; `purrr::rbernoulli()`

was just a cool example of a non-base RNG that popped into my head. Maybe I should submit a pull request to `purrr`

!

**leave a comment**for the author, please follow the link and comment on their blog:

**R on roryverse**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.