**Thinking inside the box**, and kindly contributed to R-bloggers)

While preparing my slides for the Rcpp workshop this Thursday,

I had wondered about more nice examples motivating Rcpp.

So I posed a quick question on the rcpp-devel list.

And I received a few friendly answers. My favourite, so far, was a suggestion by Lance

Bachmeier who sent me a short script which used both R and C++ (via Rcpp) to

simulate a first-order vector autoregressive process (and he ensured me that it worked well enough on his graduate students). It is indeed

a great example as it involves (simple) matrix multiplication in an iterative

fashion. Which makes it a great example not only for Rcpp but also for our

RcppArmadillo package (which wraps Conrad Sanderson’s wonderful

Armadillo C++ templated library for linear algebra and more). And at the same time, we can also add another look

at the new and shiny R compiler I also blogged about recently.

So Lance and I iterated over this a little more over email, and I now added this as a new (and initial) example file in the

RcppArmadillo SVN repo. (As an aside: The newest version 0.2.19 of

RcppArmadillo has been sitting in incoming at

CRAN since earlier in the week while the archive maintainer takes a well-deserved vacation. It

should hit the public archive within a few days, and is otherwise available too from

my site.)

So let’s walk through the example:

R> ## parameter and error terms used throughout R> a <- matrix(c(0.5,0.1,0.1,0.5),nrow=2) R> e <- matrix(rnorm(10000),ncol=2) R> ## Let's start with the R version R> rSim <- function(coeff, errors) { + simdata <- matrix(0, nrow(errors), ncol(errors)) + for (row in 2:nrow(errors)) { + simdata[row,] = coeff %*% simdata[(row-1),] + errors[row,] + } + return(simdata) + } R> rData <- rSim(a, e) # generated by R

This starts with a simple enough loop. After skipping the first row, each iteration multiplies the previous row with the parameters and adds

error terms.

We can then turn to the R compiler:

R> ## Now let's load the R compiler (requires R 2.13 or later) R> suppressMessages(require(compiler)) R> compRsim <- cmpfun(rSim) R> compRData <- compRsim(a,e) # generated by R 'compiled' R> stopifnot(all.equal(rData, compRData)) # checking results

Nice and easy: We load the **compiler** package, create a compiled function and use it. We check the results and surely

enough find them to be identical.

With that, time to turn to C++ using Armadillo via

RcppArmadillo:

R> ## Now load 'inline' to compile C++ code on the fly R> suppressMessages(require(inline)) R> code <- ' + arma::mat coeff = Rcpp::as<arma::mat>(a); + arma::mat errors = Rcpp::as<arma::mat>(e); + int m = errors.n_rows; int n = errors.n_cols; + arma::mat simdata(m,n); + simdata.row(0) = arma::zeros<arma::mat>(1,n); + for (int row=1; row<m; row++) { + simdata.row(row) = simdata.row(row-1)*trans(coeff)+errors.row(row); + } + return Rcpp::wrap(simdata); + ' R> ## create the compiled function R> rcppSim <- cxxfunction(signature(a="numeric",e="numeric"), + code,plugin="RcppArmadillo") R> rcppData <- rcppSim(a,e) # generated by C++ code R> stopifnot(all.equal(rData, rcppData)) # checking results

Here we load the inline package to compile, link and load C++ snippets. We define a

short C++ function in the `code`

variable, declare a signature taking `a`

and `e`

as before and ask

`cxxfunction()`

to deploy the plugin for RcppArmadillo so

that it and Rcpp are found during build. With that, we have a compiled function

to generate data, and we once again check the result. The C++ code is pretty straightforward as well. We can instatiate Armadillo matrices

directly from the R objects we pass down; we then run a similar loop building the result row by row.

Now, with all the build-up, here is the final timing comparison, using the

rbenchmark package:

R> ## now load the rbenchmark package and compare all three R> suppressMessages(library(rbenchmark)) R> res <- benchmark(rcppSim(a,e), + rSim(a,e), + compRsim(a,e), + columns=c("test", "replications", "elapsed", + "relative", "user.self", "sys.self"), + order="relative") R> print(res) test replications elapsed relative user.self sys.self 1 rcppSim(a, e) 100 0.038 1.0000 0.04 0 3 compRsim(a, e) 100 2.011 52.9211 2.01 0 2 rSim(a, e) 100 4.148 109.1579 4.14 0

So in a real-world example involving looping and some algebra (which is of course already done by BLAS and LAPACK libraries), the new R

compiler improves by more than a factor of two, cutting time from 4.14 seconds down to about 2 seconds. Yet, this still leaves the C++

solution, clocking in at a mere 38 milliseconds, ahead by a factor of over fifty relative to the new R compiler

And compared to just R itself, the simple solution involving

Rcpp and

RcppArmadillo is almost 110 times faster. As I mentioned, I quite like

this example ;-).

**leave a comment**for the author, please follow the link and comment on their blog:

**Thinking inside the box**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...