Accelerating path-dependent loops: A quick Rcpp case study

August 23, 2011

(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

User BobH asked

on StackOverflow about accelerating path-dependent loops
. He provided a simple example in which a vector gets filled conditional on the value of
the preceding element. Simple to code, but hard to vectorise.

By the time I saw that question yesterday evening, Josh Ulrich had already posted a nice answer suggesting
to switch from ifelse to if to escape the overhead of a vectorised expression on simple scalars. I meekly added a comment
suggesting that Rcpp would likely do well on this and that someone should volunteer
such an answer. Well, it is still August and Mr. Someone is on holiday, so this morning I followed up with such an answer. And as it turns out to
work quite well indeed, we will repost it here.

Let’s start with the general setup, and the two functions supplied by Josh. We also byte-compile these using the compiler package which
is the culmination of several years of work by R core member Luke Tierney. This package was added to R
during the last major release, and we assessed it
in this earlier blog post as well as several Rcpp-related
follow-ups. We also load the packages for on-the-fly ‘inline’ compilation and easy benchmarking:


fun1 <- function(z) {
  for(i in 2:NROW(z)) {
    z[i] <- ifelse(z[i-1]==1, 1, 0)
fun1c <- cmpfun(fun1)

fun2 <- function(z) {
  for(i in 2:NROW(z)) {
    z[i] <- if(z[i-1]==1) 1 else 0
fun2c <- cmpfun(fun2)

We see that basic worker just assign to the i-th element based on the preceding element. Function two uses the aforementioned
if statement, and both are also byte-compiled.

Writing the same code in C++ using both Rcpp (for the R/C++ integration) and
inline for the on-the-fly compilation, linking and loading of C++ code into R is pretty
straightforward too:

funRcpp <- cxxfunction(signature(zs="numeric"), plugin="Rcpp", body="
  Rcpp::NumericVector z = Rcpp::NumericVector(zs);
  int n = z.size();
  for (int i=1; i<n; i++) {
    z[i] = (z[i-1]==1.0 ? 1.0 : 0.0);

This single R function call takes the code embedded in the argument to the body variable, builds a complete function in temporary file,
and then compiles, links and loads it. We can now call funRcpp() just like the other four functions and do so via the
benchmark() function of the rbenchmark package.

R> z <- rep(c(1,1,0,0,0,0), 100)
R> identical(fun1(z),fun2(z),fun1c(z),fun2c(z),funRcpp(z))
[1] TRUE
R> res <- benchmark(fun1(z), fun2(z),
+                  fun1c(z), fun2c(z),
+                  funRcpp(z),
+                  columns=c("test", "replications", "elapsed", "relative", "user.self", "sys.self"),
+                  order="relative",
+                  replications=1000)
R> print(res)
        test replications elapsed relative user.self sys.self
5 funRcpp(z)         1000   0.005      1.0      0.01        0
4   fun2c(z)         1000   0.482     96.4      0.48        0
2    fun2(z)         1000   1.989    397.8      1.98        0
3   fun1c(z)         1000  11.365   2273.0     11.37        0
1    fun1(z)         1000  13.210   2642.0     13.21        0

We can focus on the columns labelled elapsed and relative. The C++ version is faster by a factor of almost one-hundred compared to
the byte-compiled version of funtion2, and almost four-hundred times faster than the plain-R variant of function2. And function1 is even worse,
coming at well over twenty-two-hundred times the run-time of the C++ version. Byte-compilation also helps little here.

For comparison, we also ran the original example of a very short vector, called more frequently:

R> z <- c(1,1,0,0,0,0)
R> res2 <- benchmark(fun1(z), fun2(z),
+                  fun1c(z), fun2c(z),
+                  funRcpp(z),
+                  columns=c("test", "replications", "elapsed", "relative", "user.self", "sys.self"),
+                  order="relative",
+                  replications=10000)
R> print(res2)
        test replications elapsed relative user.self sys.self
5 funRcpp(z)        10000   0.047  1.00000      0.04        0
4   fun2c(z)        10000   0.134  2.85106      0.13        0
2    fun2(z)        10000   0.328  6.97872      0.32        0
3   fun1c(z)        10000   1.080 22.97872      1.08        0
1    fun1(z)        10000   1.243 26.44681      1.24        0

The qualitative ranking is unchanged: the Rcpp version dominates. Function2 using
if instead of the vectorised ifelse is second-best with the byte-compiled version being about twice as fast that the plain R
variant, but still almost three times slower than the C++ version. And the relative differences are less pronounced: relatively speaking, the
function call overhead matters less here and the actual looping matters more: C++ gets a bigger advantage on the actual loop operations in the longer
vectors. That it is an important result as it suggests that on more real-life sized data, the compiled version may reap a larger benefit.

All in all a nice demonstration of Rcpp, and a gain of almost one-hundred to the best
byte-compiled version is nothing to sneeze at—especially when it is so easy to write and load a five-line C++ function thanks to

Before closing, a quick reminder that I will giving two classes on
Rcpp in a few weeks. These will be in New York on September 24, and San Franciso on October 8, see
this blog post as well as
this page at Revolution Analytics (who are a
co-organiser of the classes) for details and registration information.

To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box . offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.