Efficient Ragged Arrays in R and Rcpp

[This article was first published on Life in Code, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When is R Slow, and Why?

Computational speed is a common complaint lodged against R.
Some recent posts on r-bloggers.com have compared
the speed of R with some other programming languages [1], and showed the
favorable impact of the new compiler package on run-times [2]. I and
others have written about using Rcpp to easily write C++ functions to speed-up bottlenecks
in R [3,4]. With the new Rcpp attributes framework,
writing fully vectorized C++ functions and incorporating them in R code is now very easy [5].

On a day-to-day basis, though, R’s performance is largely a function of coding style.
R allows novices users to write horribly inefficient code [6]
that produces the correct answer (eventually). Yet by failing to utilize vectorization
and pre-allocation of data structures, naive R code can be many orders of magnitude slower than
need be. R-help is littered with the
tears of novices, and there’s even a (fantastic)
parody of Dante’s Inferno outlining the common “Deadly Sins of R” [7].

Problem Statement: Appending to Ragged Arrays

I recently stumbled onto an interesting code optimization problem that I *didn’t* have a quick solution for,
and that I’m sure others
have encountered. What is the “R way” to vectorize computations on ragged array?
One example of a ragged array is
a list of vectors that have varying and different lengths. Say you need to dynamically grow
many vectors by varying lengths
over the course of a stochastic simulation. Using a simple tool like lapply, the entire data
structure will be allocated anew with every assignment. This problem is briefly touched
on in the official
to R
documentation, which simply notes that “when the subclass sizes [e.g. vector sizes] are all the same the indexing
may be done implicitly and much more efficiently”. But what if you’re data *isn’t* rectangular? How
might one intelligently vectorize a ragged array to prevent (sloooow) memory allocations at every step?

The obvious answer is to pre-allocate a rectangular matrix (or array) that is larger than the maximum
possible vector length, and store each vector as a row (or column?) in the matrix. Now we can use matrix
assignment, and for each vector track the index of the start of free space.
If we try to write past the end of the matrix, R emits the appropriate error.
This method requires some book-keeping on our part. One nice addition would be
an S4 class with slots for the data matrix and the vector of free-space
indices, as well as methods to dynamically expand the data matrix and validate the object.
As an aside, this solution is essentially the inverse of a sparse matrix. Sparse matrices use
much less memory at the expense of slower access times [8].
Here, we’re using more memory than is strictly needed to achieve much faster access times.

Is pre-allocation and book-keeping worth the trouble? object.size(matrix(1.5, nrow=1e3,
shows that a data structure of 1,000 vectors, each of length approximately 1,000,
occupies about 8Mb of memory. Let’s say I resize this structure 1,000 times. Now I’m looking
at almost a gigabyte of memory allocations. Perhaps you’re getting a sense of what a terrible idea
it is to *not* pre-allocate a frequently-resized ragged list?

Three Solutions and Some Tests

Using the above logic, I prototyped a solution as an R function, and then transcribed the result into
a C++ function (boundary checks are important in C++).
The result is three methods: a “naive” list-append method, an R method that uses matrix assignment,
and a final C++ method that modifies the pre-allocated matrix in-place.
In C++/Rcpp, functions can use pass-by-reference semantics [9],
which can have major speed advantages by
allowing functions to modify their arguments in-place. Full disclosure: pass-by-reference semantics requires
some caution on the user’s part. Pass-by-reference is very different from R’s “function
programming” semantics (pass-by-value, copy-on-modify), where side-effects are minimized and an explicit
assignment call is required to modify an object [10].

I added a unit test to ensure identical results between all three methods, and then used the fantastic rbenchmark package to time each solution.
As expected, the naive method is laughably slow. By comparison, and perhaps counter-intuitively, the R and C++
pre-allocation methods are close in performance. Only with more iterations and larger
data structures does the C++ method really start to pull ahead. And by that time, the naive R
method takes *forever*.

Refactoring existing code to use the pre-allocated compound data structure (matrix plus indices)
is a more challenging exercise that’s “left to the reader”, as mathematics textbooks oft say.
lapply() is
conceptually simple, and is often fast *enough*. Some work is required to transcribe code from this
simpler style to use the “anti-sparse” matrix (and indices). There’s a temptation to prototype a solution
using lapply()
and then “fix” things later. But if you’re using ragged arrays and doing any heavy lifting
(large data structures, many iterations), the timings show that pre-allocation is more than worth the effort.


Note: you can find the full code here and

Setup: two helper functions are used to generate ragged arrays via random
draws. First, draws from the negative binomial distribution determine the length of the each new vector (with a
minimum length of 1, gen.lengths()), and draws from the normal distribution fill each
vector with data (gen.dat()).

## helper functions

Three solutions: a naive lapply() method, followed by pre-allocation
in R.

## naive method

Next, the solution as a C++ function. This goes in a separate file that I’ll call helpers.cpp (compiled below).

using namespace Rcpp ;

// [[Rcpp::export]]
void appendRcpp(  List fillVecs, NumericVector newLengths, NumericMatrix retmat, NumericVector retmatLengths) {
    // "append" fill oldmat w/  
    // we will loop through rows, filling retmat in with the vectors in list
    // then update retmat_size to index the next free
    // newLenths isn't used, added for compatibility
    NumericVector fillTmp;
    int sizeOld, sizeAdd, sizeNew;
    // pull out dimensions of matrix to fill
    int nrow = retmat.nrow();
    int ncol = retmat.ncol();
    // check that dimensions match
    if ( nrow != retmatLengths.size() || nrow != fillVecs.size()) { 
        throw std::range_error("In appendC(): dimension mismatch");
    for (int ii = 0; ii= ncol) {
            throw std::range_error("In appendC(): exceeded max cols");
        // iterator for row to fill
        NumericMatrix::Row retRow = retmat(ii, _);
        // fill row of return matrix, starting at first non-zero elem
        std::copy( fillTmp.begin(), fillTmp.end(), retRow.begin() + sizeOld);
        // update size of retmat
        retmatLengths[ii] = sizeNew;

Putting the pieces together: a unit test ensures the results of all three methods are identical, and a function that runs each
solution with identical data will be used for timing.

## unit test

Finally, we run it all:

## Obviously, Rcpp requires a C++ compiler
## compilation, linking, and loading of the C++ function into R is done behind the scenes

## run unit test, verify both functions return identical results

Benchmark results show that the list-append method is 500 times slower than the improved
R method, and 1,000 times slower than the C++ method (timings).
As we move to larger data structures
(timings.fast), the advantage of modifying in-place with C++ rather than having to
explicitly assign the results quickly add up.

> timings
  test replications elapsed relative user.self sys.self
2    c           10   0.057    1.000     0.056    0.000
3 list           10  52.792  926.175    52.674    0.036
1    r           10   0.128    2.246     0.123    0.003

> timings.fast
  test replications elapsed relative user.self sys.self
2    c           10   0.684    1.000     0.683    0.000
1    r           10  24.962   36.494    24.934    0.027


[1] How slow is R really?

[2] Speeding
up R computations Pt II: compiling

[3] Efficient loops in R – the complexity versus speed trade-off

[4] Faster (recursive) function calls: Another quick Rcpp case study

[5] Rcpp attributes: A simple example ‘making pi’

[6] StackOverflow: Speed up the loop operation in R

[7] The R Inferno

[8] Sparse matrix formats: pros and cons

[9] Advanced R: OO field guide: Reference Classes

[10] Advanced R: Functions: Return Values

To leave a comment for the author, please follow the link and comment on their blog: Life in Code.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)