Another nice Rcpp example

April 23, 2011

(This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

While preparing my slides for the Rcpp workshop this Thursday,
I had wondered about more nice examples motivating Rcpp.
So I posed a quick question on the rcpp-devel list.

And I received a few friendly answers. My favourite, so far, was a suggestion by Lance
who sent me a short script which used both R and C++ (via Rcpp) to
simulate a first-order vector autoregressive process (and he ensured me that it worked well enough on his graduate students). It is indeed
a great example as it involves (simple) matrix multiplication in an iterative
fashion. Which makes it a great example not only for Rcpp but also for our
RcppArmadillo package (which wraps Conrad Sanderson’s wonderful
Armadillo C++ templated library for linear algebra and more). And at the same time, we can also add another look
at the new and shiny R compiler I also blogged about recently.

So Lance and I iterated over this a little more over email, and I now added this as a new (and initial) example file in the
RcppArmadillo SVN repo. (As an aside: The newest version 0.2.19 of
RcppArmadillo has been sitting in incoming at
CRAN since earlier in the week while the archive maintainer takes a well-deserved vacation. It
should hit the public archive within a few days, and is otherwise available too from
my site.)

So let’s walk through the example:

R> ## parameter and error terms used throughout
R> a <- matrix(c(0.5,0.1,0.1,0.5),nrow=2)
R> e <- matrix(rnorm(10000),ncol=2)
R> ## Let's start with the R version
R> rSim <- function(coeff, errors) {
+   simdata <- matrix(0, nrow(errors), ncol(errors))
+   for (row in 2:nrow(errors)) {
+     simdata[row,] = coeff %*% simdata[(row-1),] + errors[row,]
+   }
+   return(simdata)
+ }
R> rData <- rSim(a, e)                     # generated by R

This starts with a simple enough loop. After skipping the first row, each iteration multiplies the previous row with the parameters and adds
error terms.

We can then turn to the R compiler:

R> ## Now let's load the R compiler (requires R 2.13 or later)
R> suppressMessages(require(compiler))
R> compRsim <- cmpfun(rSim)
R> compRData <- compRsim(a,e)              # generated by R 'compiled'
R> stopifnot(all.equal(rData, compRData))  # checking results

Nice and easy: We load the compiler package, create a compiled function and use it. We check the results and surely
enough find them to be identical.

With that, time to turn to C++ using Armadillo via

R> ## Now load 'inline' to compile C++ code on the fly
R> suppressMessages(require(inline))
R> code <- '
+   arma::mat coeff = Rcpp::as(a);
+   arma::mat errors = Rcpp::as(e);
+   int m = errors.n_rows; int n = errors.n_cols;
+   arma::mat simdata(m,n);
+   simdata.row(0) = arma::zeros(1,n);
+   for (int row=1; row
+     simdata.row(row) = simdata.row(row-1)*trans(coeff)+errors.row(row);
+   }
+   return Rcpp::wrap(simdata);
+ '
R> ## create the compiled function
R> rcppSim <- cxxfunction(signature(a="numeric",e="numeric"),
+                        code,plugin="RcppArmadillo")
R> rcppData <- rcppSim(a,e)                # generated by C++ code
R> stopifnot(all.equal(rData, rcppData))   # checking results

Here we load the inline package to compile, link and load C++ snippets. We define a
short C++ function in the code variable, declare a signature taking a and e as before and ask
cxxfunction() to deploy the plugin for RcppArmadillo so
that it and Rcpp are found during build. With that, we have a compiled function
to generate data, and we once again check the result. The C++ code is pretty straightforward as well. We can instatiate Armadillo matrices
directly from the R objects we pass down; we then run a similar loop building the result row by row.

Now, with all the build-up, here is the final timing comparison, using the
rbenchmark package:

R> ## now load the rbenchmark package and compare all three
R> suppressMessages(library(rbenchmark))
R> res <- benchmark(rcppSim(a,e),
+                  rSim(a,e),
+                  compRsim(a,e),
+                  columns=c("test", "replications", "elapsed",
+                            "relative", "user.self", "sys.self"),
+                  order="relative")
R> print(res)
            test replications elapsed relative user.self sys.self
1  rcppSim(a, e)          100   0.038   1.0000      0.04        0
3 compRsim(a, e)          100   2.011  52.9211      2.01        0
2     rSim(a, e)          100   4.148 109.1579      4.14        0

So in a real-world example involving looping and some algebra (which is of course already done by BLAS and LAPACK libraries), the new R
compiler improves by more than a factor of two, cutting time from 4.14 seconds down to about 2 seconds. Yet, this still leaves the C++
solution, clocking in at a mere 38 milliseconds, ahead by a factor of over fifty relative to the new R compiler

And compared to just R itself, the simple solution involving
Rcpp and
RcppArmadillo is almost 110 times faster. As I mentioned, I quite like
this example ;-).

To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box . offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)