# Generating a multivariate gaussian distribution using RcppArmadillo

**Rcpp Gallery**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

There are many ways to simulate a multivariate gaussian distribution assuming that you can simulate from independent univariate normal distributions. One of the most popular method is based on the Cholesky decomposition. Let’s see how `Rcpp`

and `Armadillo`

perform on this task.

#include// [[Rcpp::depends(RcppArmadillo)]] using namespace Rcpp; // [[Rcpp::export]] arma::mat mvrnormArma(int n, arma::vec mu, arma::mat sigma) { int ncols = sigma.n_cols; arma::mat Y = arma::randn(n, ncols); return arma::repmat(mu, 1, n).t() + Y * arma::chol(sigma); }

The easiest way to perform a Cholesky distribution in R is to use the `chol`

function in `R`

which interface some fast `LAPACK`

routines.

### naive implementation in R mvrnormR <- function(n, mu, sigma) { ncols <- ncol(sigma) mu <- rep(mu, each = n) ## not obliged to use a matrix (recycling) mu + matrix(rnorm(n * ncols), ncol = ncols) %*% chol(sigma) }

We will also use `MASS:mvrnorm`

which implements it differently:

require(MASS) Loading required package: MASS ### Covariance matrix and mean vector sigma <- matrix(c(1, 0.9, -0.3, 0.9, 1, -0.4, -0.3, -0.4, 1), ncol = 3) mu <- c(10, 5, -3) require(MASS) ### checking variance set.seed(123) cor(mvrnormR(100, mu, sigma)) [,1] [,2] [,3] [1,] 1.0000 0.8851 -0.3830 [2,] 0.8851 1.0000 -0.4675 [3,] -0.3830 -0.4675 1.0000 cor(MASS::mvrnorm(100, mu, sigma)) [,1] [,2] [,3] [1,] 1.0000 0.9106 -0.3016 [2,] 0.9106 1.0000 -0.4599 [3,] -0.3016 -0.4599 1.0000 cor(mvrnormArma(100, mu, sigma)) [,1] [,2] [,3] [1,] 1.000 0.9020 -0.3530 [2,] 0.902 1.0000 -0.4889 [3,] -0.353 -0.4889 1.0000 ## checking means colMeans(mvrnormR(100, mu, sigma)) [1] 9.850 4.911 -2.902 colMeans(MASS::mvrnorm(100, mu, sigma)) [1] 10.051 5.046 -2.914 colMeans(mvrnormArma(100, mu, sigma)) [1] 9.825 4.854 -2.873

Now, let’s benchmark the different versions:

require(rbenchmark) Loading required package: rbenchmark benchmark(mvrnormR(1e4, mu, sigma), MASS::mvrnorm(1e4, mu, sigma), mvrnormArma(1e4, mu, sigma), columns = c('test', 'replications', 'relative', 'elapsed'), order = 'elapsed') test replications relative elapsed 3 mvrnormArma(10000, mu, sigma) 100 1.000 0.219 1 mvrnormR(10000, mu, sigma) 100 1.913 0.419 2 MASS::mvrnorm(10000, mu, sigma) 100 2.046 0.448

The `RcppArmadillo`

function outperforms the MASS implementation and the naive R code, but more surprisinugly `mvrnormR`

is slightly faster than `mvrnorm`

in this benchmark.

To be fair, while digging into the `MASS::mvrnorm`

code it appears that there are few code sanity checks ( such as the positive definiteness of `Sigma`

).

**leave a comment**for the author, please follow the link and comment on their blog:

**Rcpp Gallery**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.