# Another Rcpp/RcppArmadillo success story

[This article was first published on **Shifting sands**, and kindly contributed to R-bloggers.]

I finally had an excuse to try out RcppArmadillo and was amazed at just how well it worked out.

The exact details are a bit beyond a blog post, but basically I had a matrix of roughly 80,000 × 2 double-valued samples (call it A) and needed to find, for each sample, which row of another, much smaller matrix (call it B) it was closest to.

There was some structure to the B matrix, which is 500 × 2, so it wasn't necessary to iterate through all 500 rows; each lookup jumped around a bit and took 8 iterations.

The pseudocode in R looked a bit like this:

```r
find_closest <- function(A, B) {
  for (i in 1:8) {
    # ...voodoo...
  }
  # return the index into B that this row of A is closest to
}

calc_distance <- function(A, B) apply(A, 1, find_closest, B = B)
```

All up it took about 9.5 seconds on my system, and 15 seconds on another.

For the first pass I implemented find_closest using RcppArmadillo, and saw a healthy speed-up to around 400 ms, which was a big improvement.
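To make the idea concrete, here is a minimal plain-C++ sketch of a nearest-row kernel. The real find_closest exploits B's structure to converge in 8 iterations (the "voodoo" above, which isn't spelled out in the post); this hypothetical stand-in simply scans every row of B:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Returns the index of the row of B closest to the 2-D point (ax, ay).
// B holds n_b rows of 2 doubles, stored row-major here for simplicity.
// Brute-force placeholder for the structured 8-iteration search.
std::size_t find_closest(double ax, double ay,
                         const std::vector<double>& B, std::size_t n_b) {
    std::size_t best = 0;
    double best_d2 = std::numeric_limits<double>::infinity();
    for (std::size_t j = 0; j < n_b; ++j) {
        double dx = ax - B[2 * j];
        double dy = ay - B[2 * j + 1];
        double d2 = dx * dx + dy * dy;  // squared distance; sqrt not needed for argmin
        if (d2 < best_d2) { best_d2 = d2; best = j; }
    }
    return best;
}
```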

Then I realised I might as well do calc_distance/the apply in C++ too, as it was just a simple vector of integers being returned.

This gave an amazing performance leap: the function now takes around 12 milliseconds all up, down from 9.5 seconds. On the other machine it went from 15 seconds to 10 milliseconds.

I was very surprised at this. I haven't had a chance to dig into the details, but I am assuming there is a reasonable amount of overhead in passing data from R to RcppArmadillo. With apply, that overhead would be incurred once for every row that find_closest was run on. By moving the apply into C++, all the data crosses from R to C++ only once, giving the large speedup. Or so I guess.
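The "cross the boundary once" idea amounts to exposing a single entry point that does the whole loop internally. A hedged, stdlib-only sketch (names hypothetical; the real version would take Rcpp/Armadillo matrix types and return an Rcpp::IntegerVector):

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// One entry point that loops over all n_a samples internally, so the
// R -> C++ boundary would be crossed once for the whole dataset rather
// than once per row, as happens with apply() calling into C++.
// A and B hold rows of 2 doubles, stored row-major for simplicity.
std::vector<std::size_t> calc_distance(const std::vector<double>& A,
                                       std::size_t n_a,
                                       const std::vector<double>& B,
                                       std::size_t n_b) {
    std::vector<std::size_t> idx(n_a);
    for (std::size_t i = 0; i < n_a; ++i) {
        double ax = A[2 * i], ay = A[2 * i + 1];
        std::size_t best = 0;
        double best_d2 = std::numeric_limits<double>::infinity();
        for (std::size_t j = 0; j < n_b; ++j) {
            double dx = ax - B[2 * j], dy = ay - B[2 * j + 1];
            double d2 = dx * dx + dy * dy;
            if (d2 < best_d2) { best_d2 = d2; best = j; }
        }
        idx[i] = best;  // row of B closest to sample i
    }
    return idx;
}
```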

I tried two versions, one that traversed A and B row-wise and the other column-wise. The column version generally seemed faster: 12 ms versus 19 ms for rows. According to the Armadillo docs it stores matrices in column-major order, which might explain that difference.
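Column-major means the elements of one column sit next to each other in memory, so walking down a column is a sequential, cache-friendly scan, while walking across a row jumps by the full column height on every step. A tiny sketch of the index arithmetic:

```cpp
#include <cstddef>

// In column-major storage (Armadillo's layout), element (i, j) of an
// n_rows x n_cols matrix lives at flat offset i + j * n_rows. Moving to
// the next row in the same column advances by 1; moving to the next
// column in the same row advances by n_rows.
std::size_t col_major_offset(std::size_t i, std::size_t j, std::size_t n_rows) {
    return i + j * n_rows;
}
```

For an 80,000 × 2 matrix like A, the two entries of a single *row* are 80,000 doubles apart, so a row-wise traversal of a column-major matrix strides across memory while a column-wise one stays sequential.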

Would appreciate any comments or pointers to documentation so I can better understand what is going on under the hood there.
