Using C++ within R

[This article was first published on Dan Kelley Blog/R, and kindly contributed to R-bloggers.]

Introduction

Quite often I write which(...)[1] to find the first element of a vector matching some condition. One has to wonder whether that’s wasteful, though, since no further tests are needed once one element matches. I decided to try C++, via Rcpp, to see whether any speed gains could be made.
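To make the concern concrete, here is a minimal sketch in plain C++ (no Rcpp, so it compiles on its own) contrasting the two strategies. The function names are hypothetical, and the full-scan version only imitates the shape of which(...)[1]; it is not how R implements which().

```cpp
#include <cstddef>
#include <vector>

// Imitates the shape of which(0 == x)[1]: test every element, collect all
// matching positions, then keep only the first one.
// Returns a 1-based position, or 0 if there is no zero (R would give NA).
int firstZeroFullScan(const std::vector<int>& x) {
    std::vector<int> hits;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] == 0) {
            hits.push_back(static_cast<int>(i) + 1);
        }
    }
    return hits.empty() ? 0 : hits.front();
}

// Early exit: stop at the first zero, so later elements are never tested.
// Returns a 1-based position, or 0 if there is no zero.
int firstZeroEarlyExit(const std::vector<int>& x) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] == 0) {
            return static_cast<int>(i) + 1;
        }
    }
    return 0; // none found
}
```

Both functions return the same position; they differ only in how much work is done after the first match.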

Procedure

library(Rcpp)
library(microbenchmark)
cppFunction("
    int firstZero(IntegerVector x) {
        int nx = x.size();
        for (int i = 0; i < nx; ++i) {
            if (0 == x[i]) {
                return i + 1; // convert to R's 1-based index
            }
        }
        return 0; // means none found
    }")
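# 10000 ones; note this is a double vector, so Rcpp converts it to integer on each firstZero() call
x <- rep(1, 10000)
# zero out positions 500 onward, so the first zero is at position 500
x[seq.int(500, 10000)] <- 0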
microbenchmark(firstZero(x), times = 1000L)
## Unit: microseconds
##          expr   min    lq median    uq   max neval
##  firstZero(x) 17.17 18.16  19.05 19.29 738.6  1000
microbenchmark(which(0 == x)[1], times = 1000L)
## Unit: microseconds
##              expr   min    lq median    uq   max neval
##  which(0 == x)[1] 31.74 33.26  33.95 35.99 740.1  1000
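As an aside, the hand-written loop in firstZero could also be expressed with the standard library's std::find, which likewise stops at the first match. The sketch below is an assumption-laden variant, not the code that was benchmarked: it uses std::vector<int> instead of Rcpp's IntegerVector so that it compiles without R, and the name firstZeroFind is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Same contract as firstZero above: 1-based position of the first zero,
// or 0 if the vector contains no zero.
int firstZeroFind(const std::vector<int>& x) {
    // std::find returns an iterator to the first element equal to 0,
    // or x.end() if there is none.
    const auto it = std::find(x.begin(), x.end(), 0);
    if (it == x.end()) {
        return 0; // none found
    }
    return static_cast<int>(it - x.begin()) + 1; // 0-based offset to 1-based
}
```

Whether this is any faster or slower than the explicit loop is not something the tests here address.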

Results

The C++ method was nearly twice as fast in this test (median 19.05 versus 33.95 microseconds). However, other tests (with different vector lengths, different fractions zeroed out, etc.) showed nearly identical times for the two methods.

Conclusions

In light of variations in test results, and the added complexity of including C++ code in an R program, I advise carrying out data-tailored benchmarks before deciding to use Rcpp.
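One way to make such a benchmark data-tailored is to vary the vector length and the position of the first zero. The sketch below shows that idea in plain C++ only; it times the C++ side in isolation and is no substitute for comparing against R with microbenchmark. All names (makeTestVector, firstZeroPlain, medianMicroseconds) are hypothetical helpers invented for this illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: n ones, with zeros from 1-based position firstZeroAt
// to the end. firstZeroAt == 0 (or > n) leaves the vector without zeros.
std::vector<int> makeTestVector(std::size_t n, std::size_t firstZeroAt) {
    std::vector<int> x(n, 1);
    if (firstZeroAt == 0) return x;
    for (std::size_t i = firstZeroAt - 1; i < n; ++i) x[i] = 0;
    return x;
}

// Plain C++ stand-in for the search being timed (1-based result, 0 if none).
int firstZeroPlain(const std::vector<int>& x) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] == 0) return static_cast<int>(i) + 1;
    }
    return 0;
}

// Median wall-clock time of f() in microseconds over reps runs (reps >= 1).
template <typename F>
double medianMicroseconds(F f, int reps) {
    std::vector<double> times;
    times.reserve(static_cast<std::size_t>(reps));
    for (int r = 0; r < reps; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        volatile int sink = f(); // volatile store discourages discarding the result
        (void)sink;
        const auto t1 = std::chrono::steady_clock::now();
        times.push_back(
            std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    std::sort(times.begin(), times.end());
    return times[times.size() / 2]; // middle element (upper median if reps is even)
}
```

A typical use would be to build several vectors, for example makeTestVector(10000, 500) and makeTestVector(1000000, 999999), and call medianMicroseconds([&x]() { return firstZeroPlain(x); }, 1000) on each to see how the timing depends on the data.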

Note that the test does not account for the time to compile the C++ program, which can outweigh time savings in small problems. However, this is irrelevant because one shouldn’t be worrying about optimization in small problems anyway, and large problems will likely involve package generation, which means that the C++ compilation will be done as the package is being built.
