Generating a multivariate gaussian distribution using RcppArmadillo

[This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

There are many ways to simulate a multivariate gaussian distribution assuming that you can simulate from independent univariate normal distributions. One of the most popular method is based on the Cholesky decomposition. Let’s see how Rcpp and Armadillo perform on this task.

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;

// [[Rcpp::export]]
arma::mat mvrnormArma(int n, arma::vec mu, arma::mat sigma) {
   int ncols = sigma.n_cols;
   arma::mat Y = arma::randn(n, ncols);
   return arma::repmat(mu, 1, n).t() + Y * arma::chol(sigma);
}

The easiest way to perform a Cholesky distribution in R is to use the chol function in R which interface some fast LAPACK routines.

### naive implementation in R
mvrnormR <- function(n, mu, sigma) {
    ncols <- ncol(sigma)
    mu <- rep(mu, each = n) ## not obliged to use a matrix (recycling)
    mu + matrix(rnorm(n * ncols), ncol = ncols) %*% chol(sigma)
}

We will also use MASS:mvrnorm which implements it differently:

require(MASS)


Loading required package: MASS

### Covariance matrix and mean vector
sigma <- matrix(c(1, 0.9, -0.3, 0.9, 1, -0.4, -0.3, -0.4, 1), ncol = 3)
mu <- c(10, 5, -3)

require(MASS)
### checking variance
set.seed(123)
cor(mvrnormR(100, mu,  sigma))


        [,1]    [,2]    [,3]
[1,]  1.0000  0.8851 -0.3830
[2,]  0.8851  1.0000 -0.4675
[3,] -0.3830 -0.4675  1.0000

cor(MASS::mvrnorm(100, mu, sigma))


        [,1]    [,2]    [,3]
[1,]  1.0000  0.9106 -0.3016
[2,]  0.9106  1.0000 -0.4599
[3,] -0.3016 -0.4599  1.0000

cor(mvrnormArma(100, mu, sigma))


       [,1]    [,2]    [,3]
[1,]  1.000  0.9020 -0.3530
[2,]  0.902  1.0000 -0.4889
[3,] -0.353 -0.4889  1.0000

## checking means
colMeans(mvrnormR(100, mu, sigma))


[1]  9.850  4.911 -2.902

colMeans(MASS::mvrnorm(100, mu, sigma))


[1] 10.051  5.046 -2.914

colMeans(mvrnormArma(100, mu, sigma))


[1]  9.825  4.854 -2.873

Now, let’s benchmark the different versions:

require(rbenchmark)


Loading required package: rbenchmark

benchmark(mvrnormR(1e4, mu, sigma),
          MASS::mvrnorm(1e4, mu, sigma),
          mvrnormArma(1e4, mu, sigma),
          columns = c('test', 'replications', 'relative', 'elapsed'),
          order = 'elapsed')


                             test replications relative elapsed
3   mvrnormArma(10000, mu, sigma)          100    1.000   0.219
1      mvrnormR(10000, mu, sigma)          100    1.913   0.419
2 MASS::mvrnorm(10000, mu, sigma)          100    2.046   0.448

The RcppArmadillo function outperforms the MASS implementation and the naive R code, but more surprisinugly mvrnormR is slightly faster than mvrnorm in this benchmark.

To be fair, while digging into the MASS::mvrnorm code it appears that there are few code sanity checks ( such as the positive definiteness of Sigma ).

To leave a comment for the author, please follow the link and comment on their blog: Rcpp Gallery.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)