# Generating a multivariate gaussian distribution using RcppArmadillo

March 12, 2013
By

(This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers)

There are many ways to simulate a multivariate gaussian distribution assuming that you can simulate from independent univariate normal distributions. One of the most popular method is based on the Cholesky decomposition. Let’s see how `Rcpp` and `Armadillo` perform on this task.

``````#include
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;

// [[Rcpp::export]]
arma::mat mvrnormArma(int n, arma::vec mu, arma::mat sigma) {
int ncols = sigma.n_cols;
arma::mat Y = arma::randn(n, ncols);
return arma::repmat(mu, 1, n).t() + Y * arma::chol(sigma);
}
``````

The easiest way to perform a Cholesky distribution in R is to use the `chol` function in `R` which interface some fast `LAPACK` routines.

``````### naive implementation in R
mvrnormR <- function(n, mu, sigma) {
ncols <- ncol(sigma)
mu <- rep(mu, each = n) ## not obliged to use a matrix (recycling)
mu + matrix(rnorm(n * ncols), ncol = ncols) %*% chol(sigma)
}
``````

We will also use `MASS:mvrnorm` which implements it differently:

``````require(MASS)
``````
```Loading required package: MASS
```
``````### Covariance matrix and mean vector
sigma <- matrix(c(1, 0.9, -0.3, 0.9, 1, -0.4, -0.3, -0.4, 1), ncol = 3)
mu <- c(10, 5, -3)

require(MASS)
### checking variance
set.seed(123)
cor(mvrnormR(100, mu,  sigma))
``````
```        [,1]    [,2]    [,3]
[1,]  1.0000  0.8851 -0.3830
[2,]  0.8851  1.0000 -0.4675
[3,] -0.3830 -0.4675  1.0000
```
``````cor(MASS::mvrnorm(100, mu, sigma))
``````
```        [,1]    [,2]    [,3]
[1,]  1.0000  0.9106 -0.3016
[2,]  0.9106  1.0000 -0.4599
[3,] -0.3016 -0.4599  1.0000
```
``````cor(mvrnormArma(100, mu, sigma))
``````
```       [,1]    [,2]    [,3]
[1,]  1.000  0.9020 -0.3530
[2,]  0.902  1.0000 -0.4889
[3,] -0.353 -0.4889  1.0000
```
``````## checking means
colMeans(mvrnormR(100, mu, sigma))
``````
```  9.850  4.911 -2.902
```
``````colMeans(MASS::mvrnorm(100, mu, sigma))
``````
``` 10.051  5.046 -2.914
```
``````colMeans(mvrnormArma(100, mu, sigma))
``````
```  9.825  4.854 -2.873
```

Now, let’s benchmark the different versions:

``````require(rbenchmark)
``````
```Loading required package: rbenchmark
```
``````benchmark(mvrnormR(1e4, mu, sigma),
MASS::mvrnorm(1e4, mu, sigma),
mvrnormArma(1e4, mu, sigma),
columns = c('test', 'replications', 'relative', 'elapsed'),
order = 'elapsed')
``````
```                             test replications relative elapsed
3   mvrnormArma(10000, mu, sigma)          100    1.000   0.219
1      mvrnormR(10000, mu, sigma)          100    1.913   0.419
2 MASS::mvrnorm(10000, mu, sigma)          100    2.046   0.448
```

The `RcppArmadillo` function outperforms the MASS implementation and the naive R code, but more surprisinugly `mvrnormR` is slightly faster than `mvrnorm` in this benchmark.

To be fair, while digging into the `MASS::mvrnorm` code it appears that there are few code sanity checks ( such as the positive definiteness of `Sigma` ). To leave a comment for the author, please follow the link and comment on their blog: Rcpp Gallery.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

# Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts.(You will not see this message again.)

Click here to close (This popup will not appear again)