Deepen your R experience with Rcpp

July 17, 2013
(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by Joseph Rickert

It is very likely that even a very casual observer of what has been happening in the world of R these past few months has come across some mention of Rcpp, the R package that greatly facilitates R and C++ integration. Rcpp is hot! Over 130 R packages now depend on Rcpp, and that number is likely to keep growing. The following plot, built using code Tal Galili posted to examine the log files from RStudio’s CRAN mirror, shows the number of downloads of Rcpp around the time R 3.0.0 was released.

[Figure: Rcpp_downloads, a plot of Rcpp downloads from RStudio’s CRAN mirror around the release of R 3.0.0]

The intense activity over the first three days and the relatively slow tapering off are noteworthy, especially for what might be considered an “advanced” package that takes some expertise to use. So it is not surprising that there have been quite a few conference presentations this year about some aspect or another of Rcpp, and Dirk Eddelbuettel, Romain Francois and other experts seem to be hard pressed to keep up with the demand for training. I had the opportunity last week at the useR! 2013 conference in Spain to attend the tutorial on Rcpp given by Hadley Wickham and Romain. At roughly the same time, on the other side of the world, Dirk gave a similar tutorial to the Sydney Users of R Forum (SURF).

Romain and Hadley's tutorial was geared to people with some R skills, but not necessarily any C++ experience. It was very well done; exceptionally well done. Hadley and Romain are experienced trainers, good enough at what they do to quickly bring a diverse group to a comfortable place from which everyone can begin dealing with the material. The class was positive, challenging and very motivating.

While I was sitting there, probably hallucinating in the Albacete heat, I had the thought that the Rcpp phenomenon probably says something about the future of R. No, I don’t mean that Rcpp or C++ is the future. It occurred to me, though, that I was seeing the results of how a small but committed group of R experts cooperated to deal with a potential threat to R’s continued success. To my way of thinking, this kind of sustained, creative effort and the willingness of R developers to connect R to the rest of the computational world indicate that R is likely to be the platform of choice for statistical computing for some time to come.

So what is the threat? It is not big news that R can be slow. The following code from Hadley and Romain's tutorial shows a straightforward C++ function to compute a simple weighted mean, a naïve implementation of this same function in R, and the built-in weighted.mean() function from the base stats package.

# Script to compare C++ and R
library(Rcpp)
 
# C++ Function in Rcpp wrapper
cppFunction('
  double wmean(NumericVector x, NumericVector w) {
    int n = x.size();
    double total = 0, total_w = 0;
    for(int i = 0; i < n; ++i) {
      total += x[i] * w[i];
      total_w += w[i];
    }
    return total / total_w;
  }
')
 
# Naive R function
wmeanR <- function(x, w) {
  total <- 0
  total_w <- 0
  for (i in seq_along(x)) {
    total <- total + x[i] * w[i]
    total_w <- total_w + w[i]
  }
  total / total_w
}
 
x <- rnorm(100000000)
w <- rnorm(100000000)
 
system.time(wmean(x,w))
 
system.time(wmeanR(x,w))
 
# The proper way to compute a simple weighted mean in R
# using a built in function from the base stats package
system.time(weighted.mean(x,w))


On my laptop, the naïve R function took 229.47 seconds to run, the built-in R function ran in 4.52 seconds, and the C++ function took only 0.28 seconds to execute. Yes, C++ is a lot faster. But this is a somewhat contrived example, and it is not unreasonable to expect that a statistician could spend her entire career running weighted.mean() on vectors of reasonable size and never even consider that R might be slower than something else. (For vectors of length 1,000,000, weighted.mean() took 0.06 seconds to run on my PC.) Speed of execution needs to be evaluated in context. I can't imagine any statistician interrupting the flow of an R session to save a few seconds on a once-in-a-while calculation. However, it is nice to know that there is a reasonable way to proceed in R if the calculation needs to be done 100,000 times.
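
For readers curious about what the compiled loop amounts to once the Rcpp wrapper is stripped away, here is a minimal sketch in plain C++. It is not from the tutorial: the name weighted_mean, the use of std::vector<double> in place of NumericVector, and the length check are all assumptions added for illustration. The wmean() function above performs no such check and simply trusts that w is at least as long as x.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Plain C++ counterpart of the wmean() loop, without any Rcpp types.
// Assumption: std::vector<double> stands in for Rcpp's NumericVector.
double weighted_mean(const std::vector<double>& x,
                     const std::vector<double>& w) {
    // Hypothetical safety check that the original wmean() does not perform:
    // a shorter w would otherwise be read past its end.
    if (x.size() != w.size()) {
        throw std::invalid_argument("x and w must have the same length");
    }

    double total = 0.0;    // running sum of x[i] * w[i]
    double total_w = 0.0;  // running sum of the weights

    for (std::size_t i = 0; i < x.size(); ++i) {
        total += x[i] * w[i];
        total_w += w[i];
    }

    // As in the original, empty input or weights summing to zero give NaN or Inf.
    return total / total_w;
}
```

Called as weighted_mean({1.0, 2.0, 3.0}, {1.0, 1.0, 1.0}), this returns 2. The arithmetic is the same as in the Rcpp version; what Rcpp adds is the conversion between R vectors and C++ types, which is exactly the part this sketch leaves out.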

My three main takeaways from the tutorial were:

  1. For garden variety
    programming (no objects or classes) C++ is not only accessible, but might also
    be fun.
  2. Rcpp along with Rtools does
    an incredible amount of “heavy lifting”, hiding the details of working with a
    compiled language from the R user and providing a big league environment for
    writing high performance, R-based code.
  3. Even if you have some
    considerable experience with R, it may turn out that R is even richer than you
    thought.

It was a delightful surprise to realize that gaining some experience with C++ might enhance one’s motivation to learn even more about R. Yes, it is important to know that one can attempt serious work in R that might have critical execution time constraints, and that there are tools such as Rcpp available to help one power through bottlenecks. However, the richer experience of the tutorial was to consider the rewards of learning more about the structure of R and all that R has to offer.
