user2013: The Rcpp tutorial

July 9, 2013
By

(This article was first published on 4D Pie Charts » R, and kindly contributed to R-bloggers)

I’m at user 2013, and this morning I attended Hadley Wickham and Romain Francois’s tutorial on the Rcpp package for calling C++ code from R. I’ve spent the last eight years avoiding C++ afer having nightmares about obscure pointer bugs, so I went into the room slightly skeptical about this package.

I think the most important takeaway from the tutorial was a clear sense of when and why you might want to use C++.

The main selling point for using C++ with R is, in Hadley’s words, that R is optimised for making programmers efficient whereas C++ is made for making machine efficient, so the langauages are complimentary. That is, most of the time the slow part of doing statistics is you. Occasionally however, the slow part will be running your code, and in those instances C++ is better than R.

In order to write fast R code, it needs to be vectorised, and that often means using different functions to a scalar version. A classic example is using ifelse instead of separate if and else blocks, or using pmax instead of max.

Knowing how to vectorise R code thus requires quite a large vocabulary of functions. In C++ there is no vectorisation – you just write a for loop.

There are three things in particular that C++ does much faster than R: the above mentioned looping, resizing vectors and calling functions. (For the last point, Hadley quoted an overhead of 2ns to call a function in C++ versus 200ns in R.)

This means that C++ is useful for the following restricted use cases:

  1. When vectorisation is difficult or impossible. This is common when one element of a vector depends upon previous elements. MCMC is a classic example.
  2. When you are changing the size of a vector in a loop. Run length encoding was the example given.
  3. When you need to make millions of function calls. Recursive functions and some optimisation and simulation problems fit this category.

Typically C++ can give you an order of magnitude or two speed up over an R equivalent, but this is wildly problem-dependent and many of the built-in functions call C code which will run at the same speed (more or less) as a C++ version. It’s also important to consider how often the code will be run. Even if you have a thousand-fold speedup, if the running time of the R function is 0.01s, then you need to run it 60000 times just to get back the 10 minutes it took you to rewrite it in C++.

Anyway, using Rcpp makes it surprisingly simple to call C++ code. You need to install Rtools under windows, and of course the Rcpp package.

install.packages(c("installr", "Rcpp"))
library(installr)
install.Rtools()

Check that Rcpp is working by seeing if the following expression returns 2.

library(Rcpp)
evalCpp("1 + 1")

Then you can create a C++ function using the cppFunction function. Here’s a reimplementation of the any function. (Although it doesn’t deal with missing values.)

cppFunction('
  bool Any(LogicalVector x)
  {
    for(int i = 0; i < x.size(); ++i)
    {
      if(x[i])
      {
        return true;
      }
    }
    return false;
  }
')

Notice that in C++ you must be explicit about the types of variable that are passed into and returned from a function. How to wrie C++ is beyond the scope of the post, so I’ll say no more.

You can now call the Any function like this.

Any(runif(10) > 0.5) #returns TRUE
Any(runif(10) > 1.5) #returns FALSE

Tagged: c++, r, rcpp, user2013

To leave a comment for the author, please follow the link and comment on his blog: 4D Pie Charts » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.