Squeezing more speed from R for nothing, Rcpp style

(This article was first published on On the lambda » ROn the lambda, and kindly contributed to R-bloggers)

In a previous post we explored how you can greatly speed up certain types of long-running computations in R by parallelizing your code using multicore package*. I also mentioned that there were a few other ways to speed up R code; the one I will be exploring in this post is using Rcpp to replace time-critical inner-loops with C++.

In general, good C++ code almost always runs faster than equivalent R code. Higher level language affordances like garbage collection, dynamic typing, and bounds checking can add a lot of computational overhead. Further, C/C++ compiles down to machine code, whereas R byte-code has to be interpreted.

On the other hand, I would hate to do all my statistics programming in a language like C++, precisely because of those higher-level language affordances I mentioned above. When development time (as opposed to execution time) is taken into account, programming in R is much faster for me (and makes me a very happy programmer).

On occasion, though, there are certain sections of R code that I wish I could rewrite in C/C++. They may be simple computations that get called thousands or millions of times in a loop. If I could just write these time-critical snippets with C/C++ and not have to throw the proverbial baby out with the bath water (and rewrite everything in C), my code would run much faster.

Though there have been packages to make this sort of thing since the early 2000s, Rcpp (and the Rcpp family**) has made this even easier; now interfacing with R objects is seamless.

To show an example of how you might use Rcpp, I’ve used the same example from my post "Parallel R (and air travel)". In this example, we use longitude and latitude info from all US airports to derive the average (mean) distance between every two US airports. The function I will be replacing with C++ is the function to compute the distance between two longitude latitude pairs (Haversine's formula) on a sphere (which is just an approximation).

The R functions to do this look like this:

to.radians<-function(degrees){
degrees * pi / 180
}

haversine <- function(lat1, long1, lat2, long2, unit="km"){
term1 <- sin(delta.phi/2) ^ 2
term2 <- cos(phi1) * cos(phi2) * sin(delta.lambda/2) ^ 2
the.terms <- term1 + term2
delta.sigma <- 2 * atan2(sqrt(the.terms), sqrt(1-the.terms))
if(unit=="km") return(distance)
if(unit=="miles") return(0.621371*distance)
}


While the C++ functions look like this:

#include <iostream>
#include <math.h>
#include <Rcpp.h>

// [[Rcpp::export]]
return(degrees * 3.141593 / 180);
}

// [[Rcpp::export]]
double haversine_cpp(double lat1, double long1,
double lat2, double long2,
std::string unit="km"){
double delta_phi = to_radians_cpp(lat2 - lat1);
double delta_lambda = to_radians_cpp(long2 - long1);
double term1 = pow(sin(delta_phi / 2), 2);
double term2 = cos(phi1) * cos(phi2) * pow(sin(delta_lambda/2), 2);
double the_terms = term1 + term2;
double delta_sigma = 2 * atan2(sqrt(the_terms), sqrt(1-the_terms));
double distance = radius * delta_sigma;

/* if it is anything *but* km it is miles */
if(unit != "km"){
return(distance*0.621371);
}

return(distance);
}


Besides for the semicolons, other assignment operator and the type declarations, these codes are almost identical.

Next, we put the C++ code above in a C++ source file. We will call it, and automatically compile and link to it from our driver R code thusly***:

calc.distance.two.rows <- function(ind1, ind2,
version=haversine){
return(version(air.locs[ind1, 2],
air.locs[ind1, 3],
air.locs[ind2, 2],
air.locs[ind2, 3]))
}

combos <- combn(1:nrow(air.locs), 2, simplify=FALSE)
num.of.comps <- length(combos)

mult.core <- function(version=haversine_cpp){
the.sum <- sum(unlist(mclapply(combos,
function(x){
calc.distance.two.rows(x[1], x[2],
version)
},
mc.cores=4)))
result <- the.sum / num.of.comps
return(result)
}

mult.core(version=haversine_cpp)


Comparing the R version against the C++ version over a range of sample sizes yielded a chart like this:

To run this to completion would have taken 4 hours but, if my math is correct, rewriting the distance function shaved of over 15 minutes from the completion time.

It is not uncommon for the Rcpp to speed up R code by *orders of magnitude*. In this link, Dirk Eddelbuettel demonstrates an 80-fold speed increase (albeit with a contrived example).

So why did we not get an 80-fold increase?

I could have (and will) rewrite more of this program in Rcpp to avoid some of the overhead with repeated calls to compiled C++. My point here was more to show that we can use Rcpp to speed up this program with very little work–almost for nothing. Again, besides for certain syntactical differences and type declarations, the R and C++ functions are virtually the same.

As you become more comfortable with it–and use it more within the same scripts–Rcpp will likely pay higher and higher dividends.

The next time we revisit this contrived airport example, we will be profiling it expanding the C++ and eventually, use distributed computing to get it as fast as we can.

* the 'multicore' package is now deprecated in favor of 'parallel'
** RCpp11 (for modern C++), RccpEigen (for use of the Eigen C++ linear algebra template library), RCppArmadillo (for use of the Eigen C++ linear algebra template library), and a few others
*** this code is a little bit different than the code in the first airport distance blog post because I switched from using the 'multicore' package to the 'parallel' package