Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

There is a great RcppArmadillo implementation of multivariate normal densities. But I was looking for the first derivative of the multivariate normal densities. Good implementations are surprisingly hard to come by. I wasn’t able to find any online and my first solutions in R were pretty slow. RcppArmadillo might be a great alternative particularly because I am not aware of any c or Fortran versions that can be called from R. In such situations, we can expect the largest performance gains. Indeed, the RcppArmadillo version presented below is over 300-times faster than the R implementation!

Let us start with some R code. First, `dmvnorm_deriv1` is a simple R implementation of the formula shown in the Matrix Cookbook (formula 346 and 347, Nov 15, 2012 version). The second version `dmvnorm_deriv2` extends Peter Rossi’s implementation of the multivariate normal densities in his package `bayesm`, which uses a much faster algorithm and can be used to improve our implementation of the first derivative.

```library('RcppArmadillo',quietly=TRUE)
library('rbenchmark',quietly=TRUE)
library('mvtnorm',quietly=TRUE)

dmvnorm_deriv1 <- function(X, mu=rep(0,ncol(X)), sigma=diag(ncol(X))) {
fn <- function(x) -1 * c((1/sqrt(det(2*pi*sigma))) * exp(-0.5*t(x-mu)%*%solve(sigma)%*%(x-mu))) * solve(sigma,(x-mu))
out <- t(apply(X,1,fn))
return(out)
}

# mv normal density based on Peter Rossi's implementation in `bayesm`
dMvn <- function(X,mu,Sigma) {
k <- ncol(X)
rooti <- backsolve(chol(Sigma),diag(k))
}

dmvnorm_deriv2 <- function(X, mean, sigma) {
if (is.vector(X)) X <- matrix(X, ncol = length(X))
if (missing(mean)) mean <- rep(0, length = ncol(X))
if (missing(sigma)) sigma <- diag(ncol(X))
n <- nrow(X)
mvnorm <- dMvn(X, mu = mean, Sigma = sigma)
deriv <- array(NA,c(n,ncol(X)))
for (i in 1:n)
deriv[i,] <- -mvnorm[i] * solve(sigma,(X[i,]-mean))
return(deriv)
}
```

These implementations work but they are not very fast. `dmvnorm_deriv1` is a one-to-one translation of the formula in pure R and more efficient algorithms exist. `dmvnorm_deriv2` uses such an algorithm from the `bayesm` package and is significantly faster. As shown in this gallery post, a translation of this algorithm to RcppArmadillo leads to further performance improvements. But I assume that the real bottleneck is the loop for the calculation of the derivative. So let’s adopt the calculation of the multivariate normal from the gallery post and translate the loop to RcppArmadillo. After some cleaning, we get a nice RcppArmadillo implementation called `dmvnorm_deriv_arma`.

```#include <RcppArmadillo.h>

const double log2pi = std::log(2.0 * M_PI);

// [[Rcpp::export]]
arma::mat dmvnorm_deriv_arma(arma::mat x,
arma::rowvec mean,
arma::mat sigma) {

int xdim = x.n_cols;
arma::mat deriv;
deriv.copy_size(x);
arma::mat rooti = arma::trans(arma::inv(trimatu(arma::chol(sigma))));
double rootisum = arma::sum(log(rooti.diag()));
double constants = -(xdim/2) * log2pi;

int n = x.n_rows;
for (int i=0; i < n; i++) {
arma::vec x_centered = arma::trans(x.row(i) - mean);
arma::vec z = rooti * x_centered;
// get derivative of multivariate normal
deriv.row(i) = -1 * exp(constants - 0.5 * arma::sum(z%z) + rootisum) * trans(solve(sigma, x_centered));
// The part `exp(constants - 0.5 * arma::sum(z%z) + rootisum)` returns the multivarate normal and the other terms translate it to the first derivate
}

return(deriv);
}
```

Finally, we can compare the different versions using simulated data.

```set.seed(123456789)
s <- rWishart(1, 2, diag(2))[,,1]
m <- rnorm(2)
X <- rmvnorm(10000, m, s)

benchmark(dmvnorm_deriv_arma(X,m,s),
dmvnorm_deriv1(X,mu=m,sigma=s),
dmvnorm_deriv2(X,mean=m,sigma=s),
order="relative", replications=50)[,1:4]

test replications elapsed relative
1            dmvnorm_deriv_arma(X, m, s)           50   0.108      1.0
3 dmvnorm_deriv2(X, mean = m, sigma = s)           50  23.085    213.8
2   dmvnorm_deriv1(X, mu = m, sigma = s)           50  75.973    703.5
```

The RcppArmadillo implementation is several hundred times faster! Such stunning performance increases are possible when existing implementation rely on pure R. Of course, the R implementation can be improved as well. 