# Generating a multivariate gaussian distribution using RcppArmadillo

March 12, 2013
By

[This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

There are many ways to simulate a multivariate gaussian distribution assuming that you can simulate from independent univariate normal distributions. One of the most popular method is based on the Cholesky decomposition. Let’s see how `Rcpp` and `Armadillo` perform on this task.

``````#include

using namespace Rcpp;

// [[Rcpp::export]]
arma::mat mvrnormArma(int n, arma::vec mu, arma::mat sigma) {
int ncols = sigma.n_cols;
arma::mat Y = arma::randn(n, ncols);
return arma::repmat(mu, 1, n).t() + Y * arma::chol(sigma);
}
``````

The easiest way to perform a Cholesky distribution in R is to use the `chol` function in `R` which interface some fast `LAPACK` routines.

``````### naive implementation in R
mvrnormR <- function(n, mu, sigma) {
ncols <- ncol(sigma)
mu <- rep(mu, each = n) ## not obliged to use a matrix (recycling)
mu + matrix(rnorm(n * ncols), ncol = ncols) %*% chol(sigma)
}
``````

We will also use `MASS:mvrnorm` which implements it differently:

``````require(MASS)
``````
```Loading required package: MASS
```
``````### Covariance matrix and mean vector
sigma <- matrix(c(1, 0.9, -0.3, 0.9, 1, -0.4, -0.3, -0.4, 1), ncol = 3)
mu <- c(10, 5, -3)

require(MASS)
### checking variance
set.seed(123)
cor(mvrnormR(100, mu,  sigma))
``````
```        [,1]    [,2]    [,3]
[1,]  1.0000  0.8851 -0.3830
[2,]  0.8851  1.0000 -0.4675
[3,] -0.3830 -0.4675  1.0000
```
``````cor(MASS::mvrnorm(100, mu, sigma))
``````
```        [,1]    [,2]    [,3]
[1,]  1.0000  0.9106 -0.3016
[2,]  0.9106  1.0000 -0.4599
[3,] -0.3016 -0.4599  1.0000
```
``````cor(mvrnormArma(100, mu, sigma))
``````
```       [,1]    [,2]    [,3]
[1,]  1.000  0.9020 -0.3530
[2,]  0.902  1.0000 -0.4889
[3,] -0.353 -0.4889  1.0000
```
``````## checking means
colMeans(mvrnormR(100, mu, sigma))
``````
```[1]  9.850  4.911 -2.902
```
``````colMeans(MASS::mvrnorm(100, mu, sigma))
``````
```[1] 10.051  5.046 -2.914
```
``````colMeans(mvrnormArma(100, mu, sigma))
``````
```[1]  9.825  4.854 -2.873
```

Now, let’s benchmark the different versions:

``````require(rbenchmark)
``````
```Loading required package: rbenchmark
```
``````benchmark(mvrnormR(1e4, mu, sigma),
MASS::mvrnorm(1e4, mu, sigma),
mvrnormArma(1e4, mu, sigma),
columns = c('test', 'replications', 'relative', 'elapsed'),
order = 'elapsed')
``````
```                             test replications relative elapsed
3   mvrnormArma(10000, mu, sigma)          100    1.000   0.219
1      mvrnormR(10000, mu, sigma)          100    1.913   0.419
2 MASS::mvrnorm(10000, mu, sigma)          100    2.046   0.448
```

The `RcppArmadillo` function outperforms the MASS implementation and the naive R code, but more surprisinugly `mvrnormR` is slightly faster than `mvrnorm` in this benchmark.

To be fair, while digging into the `MASS::mvrnorm` code it appears that there are few code sanity checks ( such as the positive definiteness of `Sigma` ).

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...