Deepen your R experience with Rcpp

July 17, 2013

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by Joseph Rickert

It is very likely that even a casual observer of what has been happening in the world of R these past few months has come across some mention of Rcpp, the R package that greatly facilitates R and C++ integration. Rcpp is hot! Over 130 R packages now depend on Rcpp, and that number is likely to keep growing. The following plot, built using code Tal Galili posted to examine the log files from RStudio’s CRAN mirror, shows the number of downloads of Rcpp around the time R 3.0.0 was released.

[Figure: Rcpp downloads from RStudio’s CRAN mirror around the release of R 3.0.0]

The intense activity over the first three days and the relatively slow tapering off are noteworthy, especially for what might be considered an “advanced” package that takes some expertise to use. So it is not surprising that there have been quite a few conference presentations this year about some aspect or another of Rcpp, and Dirk Eddelbuettel, Romain Francois and other experts seem to be hard pressed to keep up with the demand for training. I had the opportunity last week at the useR 2013 conference in Spain to attend the tutorial on Rcpp given by Hadley Wickham and Romain. And, at roughly the same time on the other side of the world, Dirk gave a similar tutorial to the Sydney Users of R Forum (SURF).

Romain and Hadley's tutorial was geared to people with some R skills, but not necessarily any C++ experience. It was very well done; exceptionally well done. Hadley and Romain are two experienced trainers who are so good at what they do that they can quickly get a diverse group to a comfortable place where they can begin dealing with the material. The class was positive, challenging and very motivating.

While I was sitting there, probably hallucinating in the Albacete heat, I had the thought that the Rcpp phenomenon probably says something about the future of R. No, I don’t mean that Rcpp or C++ is the future. It occurred to me, though, that I was seeing the results of how a small but committed group of R experts cooperated to deal with a potential threat to R’s continued success. To my way of thinking, this kind of sustained, creative effort and the willingness of R developers to connect R to the rest of the computational world indicate that R is likely to be the platform of choice for statistical computing for some time to come.

So what is the threat? It is not big news that R can be slow. The following code from Hadley and Romain's tutorial shows a straightforward C++ function to compute a simple weighted mean, a naïve implementation of this same function in R, and the built-in weighted.mean() function from the base stats package.

# Script to compare C++ and R
library(Rcpp)
 
# C++ Function in Rcpp wrapper
cppFunction('
double wmean(NumericVector x, NumericVector w) {
  int n = x.size();
  double total = 0, total_w = 0;
  for (int i = 0; i < n; ++i) {
    total += x[i] * w[i];
    total_w += w[i];
  }
  return total / total_w;
}
')
 
# Naive R function
wmeanR <- function(x, w) {
  total <- 0
  total_w <- 0
  for (i in seq_along(x)) {
    total <- total + x[i] * w[i]
    total_w <- total_w + w[i]
  }
  total / total_w
}
 
x <- rnorm(100000000)
w <- rnorm(100000000)
 
system.time(wmean(x,w))
 
system.time(wmeanR(x,w))
 
# The proper way to compute a simple weighted mean in R
# using a built in function from the base stats package
system.time(weighted.mean(x,w))

On my laptop, the naïve R function took 229.47 seconds to run, the built-in R function ran in 4.52 seconds, and the C++ function took only 0.28 seconds to execute. Yes, C++ is a lot faster. But this is a somewhat contrived example, and it is not unreasonable to expect that a statistician could spend her entire career running weighted.mean() on vectors of reasonable size and never even consider that R might be slower than something else. (For vectors of length 1,000,000, weighted.mean() took 0.06 seconds to run on my PC.) Speed of execution needs to be evaluated in context. I can't imagine any statistician interrupting the flow of an R session to save a few seconds on a once-in-a-while calculation. However, it is nice to know that there is a reasonable way to proceed in R if the calculation needs to be done 100,000 times.
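
For context, the comparison above leaves out a middle ground: a vectorized one-liner in plain R. The sketch below is not from the tutorial; wmeanVec is a hypothetical name, the vector length and repetition count are arbitrary, and the weights are drawn with runif() (an assumption made here so that their sum stays well away from zero). It checks the result against weighted.mean() and times repeated calls, which is closer to the "done many times" situation than a single run.

```r
# Vectorized weighted mean in plain R.
# wmeanVec is a hypothetical name, not part of the tutorial code.
wmeanVec <- function(x, w) {
  sum(x * w) / sum(w)
}

set.seed(42)        # make the example reproducible
x <- rnorm(1e6)
w <- runif(1e6)     # positive weights, so sum(w) is not close to zero

# Should agree with the built-in function up to floating point tolerance
all.equal(wmeanVec(x, w), weighted.mean(x, w))

# Time repeated calls rather than a single one
system.time(for (i in 1:100) wmeanVec(x, w))
system.time(for (i in 1:100) weighted.mean(x, w))
```

Timings will vary by machine, so no numbers are quoted here; the point is only that the cost of repeated calls is what matters when deciding whether a C++ rewrite is worth the effort.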

My three main take-aways from the tutorial were:

  1. For garden-variety programming (no objects or classes) C++ is not only accessible, but might also be fun.
  2. Rcpp along with Rtools does an incredible amount of “heavy lifting”, hiding the details of working with a compiled language from the R user and providing a big league environment for writing high performance, R-based code.
  3. Even if you have some considerable experience with R, it may turn out that R is even richer than you thought.
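
One illustration of the “heavy lifting” in the second point, not taken from the tutorial: Rcpp's "sugar" layer lets vectorized expressions be written in C++ much as they would be in R. This is a minimal sketch, assuming Rcpp and a working compiler toolchain are installed; wmeanSugar is a hypothetical name.

```r
library(Rcpp)

# wmeanSugar is a hypothetical name. The body relies on Rcpp sugar's
# vectorized operators instead of an explicit loop.
cppFunction('
double wmeanSugar(NumericVector x, NumericVector w) {
  // x * w is an elementwise product; sum() reduces it to a double
  return sum(x * w) / sum(w);
}
')

x <- c(1, 2, 3)
w <- c(1, 1, 2)
wmeanSugar(x, w)    # compare with weighted.mean(x, w)
```

The single call to cppFunction() compiles the C++ source, links it, and makes wmeanSugar() available in the R session, with the conversion between R vectors and NumericVector handled by Rcpp.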

It was a delightful surprise to realize that gaining some experience with C++ might enhance one’s motivation to learn even more about R. Yes, it is important to know that one can attempt serious work in R that might have critical execution time constraints, and that there are tools such as Rcpp available to help one power through bottlenecks. However, the richer experience of the tutorial was to consider the rewards of learning more about the structure of R and all it has to offer.
