Vector Subsetting in Rcpp

[This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers].

Rcpp 0.11.1 introduced flexible subsetting for Rcpp vectors. Subsetting is
implemented for the Rcpp vector types through the [ operator, and is intended
to mimic R’s [ operator in most cases.

We diverge from R’s subsetting semantics in a few important ways:

  1. Integer and numeric index vectors use 0-based indexing, rather than
    1-based indexing.

  2. We throw an error if an index is out of bounds, rather than returning an
    NA value.

  3. We require a logical index vector to have the same length as the vector
    being subset, thus avoiding bugs that can occur when a logical vector is
    recycled for a subset operation.
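
The three rules above can be illustrated with a small standalone C++ sketch built on std::vector. This is not Rcpp's implementation: the helper names subset_by_index and subset_by_mask are hypothetical, and treating negative indices as out of range is an assumption made here for simplicity.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Positional subset: indices are 0-based and must be in range.
// (Hypothetical helper for illustration; not part of Rcpp.)
std::vector<double> subset_by_index(const std::vector<double>& x,
                                    const std::vector<int>& idx) {
    std::vector<double> out;
    out.reserve(idx.size());
    for (int i : idx) {
        // Assumption of this sketch: negative indices are rejected too.
        if (i < 0 || static_cast<std::size_t>(i) >= x.size()) {
            throw std::out_of_range("index out of bounds");
        }
        out.push_back(x[static_cast<std::size_t>(i)]);
    }
    return out;
}

// Logical subset: the mask must match the vector's length (no recycling).
std::vector<double> subset_by_mask(const std::vector<double>& x,
                                   const std::vector<bool>& mask) {
    if (mask.size() != x.size()) {
        throw std::length_error("logical subset requires equal lengths");
    }
    std::vector<double> out;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (mask[i]) out.push_back(x[i]);
    }
    return out;
}
```

With x holding 10, 20, 30, the call subset_by_index(x, {0, 2}) picks the first and third elements, while an index of 3 or a two-element mask throws instead of yielding NA or recycling.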

Some examples are showcased below:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector positives(NumericVector x) {
    return x[x > 0];
}

// [[Rcpp::export]]
List first_three(List x) {
    IntegerVector idx = IntegerVector::create(0, 1, 2);
    return x[idx];
}

// [[Rcpp::export]]
List with_names(List x, CharacterVector y) {
    return x[y];
}

x <- -5:5
positives(x)

[1] 1 2 3 4 5

l <- as.list(1:10)
first_three(l)

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

l <- setNames(l, letters[1:10])
with_names(l, c("a", "e", "g"))

$a
[1] 1

$e
[1] 5

$g
[1] 7
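
To make the name-based case concrete, here is a minimal standalone sketch of looking elements up by name. The NamedInts type and subset_by_name helper are hypothetical, and both the first-match rule and the choice to throw on an unknown name are assumptions of this sketch rather than documented Rcpp behaviour.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// A named list modelled as ordered (name, value) pairs (hypothetical type).
using NamedInts = std::vector<std::pair<std::string, int>>;

// Return the elements whose names appear in `names`, in the order requested.
NamedInts subset_by_name(const NamedInts& x,
                         const std::vector<std::string>& names) {
    NamedInts out;
    for (const std::string& n : names) {
        bool found = false;
        for (const auto& kv : x) {
            if (kv.first == n) {  // first match wins (a choice made here)
                out.push_back(kv);
                found = true;
                break;
            }
        }
        // Assumption: an unknown name is an error, mirroring the
        // out-of-bounds rule for positional indices.
        if (!found) throw std::out_of_range("name not found: " + n);
    }
    return out;
}
```

Applied to ten elements named "a" through "j" with values 1 to 10, requesting "a", "e", "g" returns the pairs (a, 1), (e, 5), (g, 7), matching the shape of the R output above.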

Most excitingly, the subset mechanism is quite flexible and works well with Rcpp
sugar. For example:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector in_range(NumericVector x, double low, double high) {
    return x[(x > low) & (x < high)];
}

// [[Rcpp::export]]
NumericVector no_na(NumericVector x) {
    return x[ !is_na(x) ];
}

bool is_character(SEXP x) {
    return TYPEOF(x) == STRSXP;
}

// [[Rcpp::export]]
List charvecs(List x) {
    return x[ sapply(x, is_character) ];
}

set.seed(123)
x <- rnorm(5)
in_range(x, -1, 1)

[1] -0.56048 -0.23018  0.07051  0.12929

no_na( c(1, 2, NA, 4, NaN, 10) )

[1]  1  2  4 10

l <- list(1, 2, "a", "b", TRUE)
charvecs(l)

[[1]]
[1] "a"

[[2]]
[1] "b"

These subsetting operations can also be quite fast:

library(microbenchmark)
R_in_range <- function(x, low, high) {
    return(x[x > low & x < high])
}
x <- rnorm(1E5)
identical( R_in_range(x, -1, 1), in_range(x, -1, 1) )

[1] TRUE

microbenchmark( times=5, 
    R_in_range(x, -1, 1),
    in_range(x, -1, 1)
)

Unit: milliseconds
                 expr   min    lq median    uq   max neval
 R_in_range(x, -1, 1) 8.168 8.556   9.02 9.073 9.223     5
   in_range(x, -1, 1) 5.210 5.424   5.48 5.507 6.233     5

R_no_na <- function(x) {
    return( x[!is.na(x)] )
}
x[sample(1E5, 1E4)] <- NA
identical(no_na(x), R_no_na(x))

[1] TRUE

microbenchmark( times=5,
    R_no_na(x),
    no_na(x)
)

Unit: milliseconds
       expr   min    lq median   uq   max neval
 R_no_na(x) 3.958 3.960  4.019 4.02 4.458     5
   no_na(x) 1.891 1.936  1.961 2.02 2.755     5
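
One way to think about where such savings could come from: the R expression x[x > low & x < high] materialises two logical vectors and their conjunction before subsetting, whereas compiled code has the option of fusing the test and the copy into a single pass. The standalone sketch below shows that idea only; it is not a description of Rcpp's internals, the name in_range_fused is hypothetical, and it was not benchmarked here.

```cpp
#include <vector>

// Keep the values strictly between low and high in one pass,
// without building an intermediate logical vector.
// Note: NaN compares false on both tests, so it is dropped here,
// whereas R's x[x > low & x < high] would return NA for such elements.
std::vector<double> in_range_fused(const std::vector<double>& x,
                                   double low, double high) {
    std::vector<double> out;
    for (double v : x) {
        if (v > low && v < high) out.push_back(v);
    }
    return out;
}
```

Both bounds are exclusive, as in the in_range example earlier, so in_range_fused({-1.0, 1.0}, -1.0, 1.0) returns an empty vector.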

We hope users of Rcpp will find the new subset semantics fast, flexible, and
useful throughout their projects.
