Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Sometimes one needs to mimic the exact behavior of R’s Distributions within C++
code. The incredible Rcpp team has provided access to these distributions through
Rmath.h (in the R:: namespace), as well as through the Rcpp:: namespace where there
can be two forms: scalar as in R, and vectorized via Rcpp sugar. The behavior of these
functions may not always exactly match what the user expects from the standard R behavior,
particularly if attempting to use the functions in Rmath.h. In particular, the functions
in Rmath.h are not vectorized. In what follows, I will use Rcpp to mimic the
behavior of both the rmultinom and rpois functions available in base R so that this
functionality and behavior is provided in native C++.

The multinomial distribution
generalizes the binomial distribution to k discrete outcomes instead of 2; consequently,
it is parameterized in terms of k probabilities that must sum to 1. The base R function
rmultinom used for generating multinomial data takes three arguments: n the number of
simulated data sets to produce, size, the number of multinomial outcomes to sample for
each data set, and prob a numeric vector of probabilities. The function returns a k
$\times$ n integer matrix.

The following C++ code uses the R::rmultinom function available in Rmath.h to generate
size multinomial outcomes. The R::rmultinom function relies on referencing a pointer
to an IntegerVector to store the results. We create a helper function, rmultinom_1,
that draws size multinomial outcomes from the multinomial distribution based on the
probabilities in prob. We then do this n independent times in the function
rmultinom_rcpp. To match the base R functionality, rmultinom_rcpp returns a k
$\times$ n IntegerMatrix.

We now check if the rmultinom and rmultinom_rcpp functions produce the same results.
We generate a vector of 200 probabilities that sum to 1. We will sample 500 multinomial
outcomes and do this independently 20 times.

 TRUE


A benchmark of the functions suggests that the rmultinom_rcpp function is very slightly
slower than the rmultinom function, but that is not really a concern for our purposes.

Unit: milliseconds
expr     min      lq    mean  median      uq     max neval cld
rmultinom(1000, size, prob) 10.8676 10.9925 11.1737 11.0826 11.1898 13.9595   100  a
rmultinom_rcpp(1000, size, prob) 11.0879 11.2072 11.4897 11.3341 11.6203 13.9920   100   b


The poisson distribution is a
non-negative discrete distribution characterized by having identical mean and
variance. The base R function rpois used for generating Poisson data takes two
arguments: n the number of simulated values to produce, and lambda, a positive numeric
vector. The rpois function cycles (and recycles) through the values in lambda for each
successive value simulated. The function produces an integer vector of length n. We
provide similar functionality using the R::rpois function available in Rmath.h. Note
that we cycle through the values of lambda so that if the end of the lambda vector is
reached before we have generated n values, then we restart at the beginning of the
lambda vector.

We now evaluate whether the rpois and rpois functions produce the same results. We
generate a positive vector with 200 values for lambda and draw length(lambda) + 5
independent Poisson values.

 TRUE


A benchmark of the two functions suggests the rpois_rcpp function may be slightly faster,
but once again, that is not our primary concern here.

Unit: microseconds
expr   min     lq    mean median     uq    max neval cld
rpois(length(lambda) + 5, lambda) 7.412 7.7780 8.50529 7.9495 8.2165 60.014   100   b
rpois_rcpp(length(lambda) + 5, lambda) 6.721 6.9965 7.37430 7.1645 7.4950 21.263   100  a 