# An accept-reject sampler using RcppArmadillo::sample()

**Rcpp Gallery**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The recently added `RcppArmadillo::sample()`

functionality provides the same algorithm used in R’s `sample()`

to Rcpp-level code. Because R’s own `sample()`

is written in C with minimal work done in R, writing a wrapper around `RcppArmadillo::sample()`

to then call in R won’t get you much of a performance boost. However, if you need to repeatedly call `sample()`

, then calling a single function which performs everything in Rcpp-land (including multiple calls to `sample()`

) before returning to R can produce a noticeable speedup over a purely R-based solution.

### Accept-Reject Sampler Example

One place where this situation arises is in an accept-reject sampler where the candidate “draw” is the output of a call to `sample()`

. Concretely, let’s suppose we want to sample 20 integers (without replacement) from 1 to 50 such that the sum of the 20 integers is less than 400. Far fewer than 10% of randomly drawn samples will meet this constraint.

require(RcppArmadillo) Loading required package: RcppArmadillo Loading required package: Rcpp require(rbenchmark) Loading required package: rbenchmark

The R code is straightforward enough. It has been written to mirror the logic of the C++ code, although that doesn’t come at the cost of much performance.

r_getInts <- function(samples) { thresh <- 400 results <- matrix(0, 20, samples) ; cnt <- 0 while(cnt < samples) { candidate = sample(1:50, 20) if (sum(candidate) < thresh) { results[, cnt + 1] <- candidate cnt <- cnt + 1 } } return(results) }

Although it is a bit longer, the logic of the C++ code is similar.

#include// [[Rcpp::depends(RcppArmadillo)]] using namespace Rcpp ; // [[Rcpp::export]] IntegerMatrix cpp_getInts(int samples ) { RNGScope scope; int cnt = 0 ; IntegerMatrix results(20, samples) ; IntegerVector frame = seq_len(50) ; IntegerVector candidate(20) ; int thresh = 400 ; while (cnt < samples) { candidate = RcppArmadillo::sample(frame, 20, FALSE, NumericVector::create() ) ; double sum = std::accumulate(candidate.begin(), candidate.end(), 0.0) ; if (sum < thresh) { results(_, cnt) = candidate ; cnt++ ; } } return results ; }

### Performance

The Rcpp code tends to be about 7-9 times faster and this boost increases as the constraint becomes more complicated (and necessarily more costly in R).

benchmark(r = {set.seed(1); r_getInts(50)}, cpp = {set.seed(1); cpp_getInts(50)}, replications = 10, order = 'relative', columns = c("test", "replications", "relative", "elapsed") ) test replications relative elapsed 2 cpp 10 1.00 0.036 1 r 10 11.97 0.431

### In the Real World …

Where might the structure in this problem arise in practice? One set of instances are those where “space” matters:

- sampling US cities such that no more than two are in any one state
- sampling cellphone towers such that no two are closer than
*X*miles apart - sampling nodes in a graph/network such that no one has more than
*K*edges

In these situations, R code to check the acceptance condition will likely be less efficient relative to the corresponding C++ code and so even larger speed-ups are realized.

**leave a comment**for the author, please follow the link and comment on their blog:

**Rcpp Gallery**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.