Quick conversion of a list of lists into a data frame

January 22, 2013

(This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers)

Data frames are one of R’s distinguishing features. By presenting a list of equal-length columns as a table of cases, they make many formal operations, such as regression or optimization, easy to express.

R’s own conversion of a list into a data frame (as.data.frame() or data.frame()) is quite slow, in large part because it exposes a vast amount of functionality. This example shows one way to write a much faster data frame builder in C++, provided one is willing to forgo that generality.


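#include <Rcpp.h>
#include <string>
using namespace Rcpp;

// Turns a list of equal-length columns into a data.frame by attaching the
// three attributes R expects: row.names, names and class.
// [[Rcpp::export]]
List CheapDataFrameBuilder(List a) {
    // Copy first so the caller's list is not modified in place.
    List returned_frame = clone(a);

    // The number of rows is taken from the first column.
    SEXP first_col = returned_frame[0];
    int nrows = Rf_length(first_col);

    // Row names "0", "1", ...; std::to_string cannot overflow a buffer.
    StringVector row_names(nrows);
    for (int i = 0; i < nrows; ++i) {
        row_names[i] = std::to_string(i);
    }
    returned_frame.attr("row.names") = row_names;

    // Column names "X.0", "X.1", ...
    StringVector col_names(returned_frame.length());
    for (int j = 0; j < returned_frame.length(); ++j) {
        col_names[j] = "X." + std::to_string(j);
    }
    returned_frame.attr("names") = col_names;
    returned_frame.attr("class") = "data.frame";

    return returned_frame;
}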
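CheapDataFrameBuilder takes the row count from the first column and never checks the others, although a valid data frame needs columns of equal length. Below is a minimal sketch of the same steps on plain std::vector columns, outside R, with that check added. ToyFrame and cheap_frame are hypothetical names that model the attribute layout. They are not Rcpp types.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an R data frame (not an Rcpp type): a list of
// columns plus the "names", "row.names" and "class" attributes.
struct ToyFrame {
    std::vector<std::vector<int>> columns;
    std::vector<std::string> names;      // "X.0", "X.1", ...
    std::vector<std::string> row_names;  // "0", "1", ...
    std::string cls = "data.frame";
};

// Mirrors the steps of CheapDataFrameBuilder on plain vectors, plus a
// column-length check that the Rcpp version skips.
ToyFrame cheap_frame(const std::vector<std::vector<int>>& columns) {
    // Row count comes from the first column; .at(0) throws on an empty list.
    const std::size_t nrow = columns.at(0).size();
    for (const auto& col : columns) {
        if (col.size() != nrow) {
            throw std::invalid_argument("all columns must have the same length");
        }
    }

    ToyFrame f;
    f.columns = columns;  // copy, so the caller's data is never modified

    // std::to_string allocates as much space as each number needs.
    for (std::size_t i = 0; i < nrow; ++i) {
        f.row_names.push_back(std::to_string(i));
    }
    for (std::size_t j = 0; j < f.columns.size(); ++j) {
        f.names.push_back("X." + std::to_string(j));
    }

    // Every row and every column now has exactly one name.
    assert(f.row_names.size() == nrow && f.names.size() == f.columns.size());
    return f;
}
```

Copying the input before attaching names plays the same role as clone(a) in the Rcpp version. Rejecting ragged input up front is a design choice this sketch adds. The article's builder does not make it.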

Here is the result of comparing the native as.data.frame() with this version:

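library(rbenchmark)
# CheapDataFrameBuilder() must first be compiled, e.g. with Rcpp::sourceCpp().
a <- replicate(250, 1:100, simplify=FALSE)

res <- benchmark(CheapDataFrameBuilder(a), as.data.frame(a),
                 order="relative", replications=500)
print(res[, 1:4])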
                      test replications elapsed relative
2 CheapDataFrameBuilder(a)          500   0.104      1.0
1         as.data.frame(a)          500  16.730    160.9
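The table can be checked by hand. rbenchmark's relative column is each elapsed time divided by the smallest one. Dividing by the 500 replications gives the approximate cost of a single call:

```latex
\text{relative} = \frac{16.730\,\text{s}}{0.104\,\text{s}} \approx 160.9,
\qquad
\frac{16.730\,\text{s}}{500} \approx 33.5\,\text{ms per call},
\qquad
\frac{0.104\,\text{s}}{500} \approx 0.21\,\text{ms per call}.
```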

There are some subtleties in this code:

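— Formatting the names with sprintf() into small fixed-size buffers is unsafe for large inputs. char name[5] has room for at most four digits plus the terminating NUL, so a frame with more than 10,000 rows (row index 10000) writes past the end of the buffer. Likewise, char name[6] cannot hold "X.1000", so a frame with more than 1,000 columns overflows it. This is an ordinary C buffer overflow rather than anything specific to Rcpp::export or sourceCpp. std::to_string() sizes its result automatically and avoids the problem.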

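— Notice the call to clone() in the first line of the function. An Rcpp List does not copy the R object it is constructed from, so without clone() the attributes would be set on the caller's own list, silently turning it into a data frame. Most people would not expect a function to modify its argument like that.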
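The fixed-size name buffers char name[5] and char name[6] overflow at predictable indices. A small standard C++ helper can compute them: std::snprintf with a null buffer and size 0 writes nothing and returns the number of characters the output would need. bytes_needed, fits_row_buffer and fits_col_buffer are illustrative names. The sizes 5 and 6 come from the article's original sprintf() calls.

```cpp
#include <cassert>
#include <cstdio>

// Bytes that sprintf(buf, fmt, value) would write, including the
// terminating NUL. With a null buffer and size 0, snprintf writes nothing
// and returns the length the formatted output would have had.
int bytes_needed(const char* fmt, int value) {
    const int n = std::snprintf(nullptr, 0, fmt, value);
    assert(n >= 0);  // a negative return signals an encoding error
    return n + 1;    // +1 for the NUL terminator
}

// Buffer sizes taken from the original builder: char name[5] for "%d"
// row names and char name[6] for "X.%d" column names.
bool fits_row_buffer(int i) { return bytes_needed("%d", i) <= 5; }
bool fits_col_buffer(int j) { return bytes_needed("X.%d", j) <= 6; }
```

Row index 9999 still fits and 10000 does not. Column index 999 fits and 1000 does not. The original buffers are therefore safe only up to 10,000 rows and 1,000 columns.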
