
Rcpp 0.11.1 introduced flexible subsetting for Rcpp vectors. Subsetting is implemented for the Rcpp vector types through the `[` operator and is intended to mimic R’s `[` operator in most cases.

We diverge from R’s subsetting semantics in a few important ways:

1. When subsetting with an integer or numeric index vector, indices are 0-based rather than 1-based.

2. We throw an error if an index is out of bounds, rather than returning an `NA` value.

3. We require a logical index vector to have the same length as the vector being subset, thus avoiding bugs that can occur when a logical vector is recycled for a subset operation.
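The three rules above can be modelled in plain C++ without Rcpp. The sketch below only illustrates the semantics and is not Rcpp's implementation: `subset_by_index` and `subset_by_mask` are hypothetical helper names, and the exception types are choices made for this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: 0-based integer subsetting with bounds checking.
// An out-of-range index raises an error instead of yielding a missing value.
std::vector<double> subset_by_index(const std::vector<double>& x,
                                    const std::vector<int>& idx) {
    std::vector<double> out;
    out.reserve(idx.size());
    for (int i : idx) {
        if (i < 0 || static_cast<std::size_t>(i) >= x.size()) {
            throw std::out_of_range("index out of bounds");
        }
        out.push_back(x[static_cast<std::size_t>(i)]);
    }
    return out;
}

// Hypothetical helper: logical subsetting without recycling.
// The mask must have exactly one entry per element of x.
std::vector<double> subset_by_mask(const std::vector<double>& x,
                                   const std::vector<bool>& mask) {
    if (mask.size() != x.size()) {
        throw std::invalid_argument("mask and vector lengths differ");
    }
    std::vector<double> out;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (mask[i]) out.push_back(x[i]);
    }
    return out;
}
```

In this model, an index vector of `{0, 2}` selects the first and third elements, which R would write as `x[c(1, 3)]`. A mask shorter than `x` is rejected rather than silently recycled.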

Some examples are showcased below:

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// Logical subsetting: keep the strictly positive elements.
// [[Rcpp::export]]
NumericVector positives(NumericVector x) {
    return x[x > 0];
}

// Integer subsetting: indices are 0-based.
// [[Rcpp::export]]
List first_three(List x) {
    IntegerVector idx = IntegerVector::create(0, 1, 2);
    return x[idx];
}

// Character subsetting: select elements by name.
// [[Rcpp::export]]
List with_names(List x, CharacterVector y) {
    return x[y];
}
```

```r
x <- -5:5
positives(x)

[1] 1 2 3 4 5

l <- as.list(1:10)
first_three(l)

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

l <- setNames(l, letters[1:10])
with_names(l, c("a", "e", "g"))

$a
[1] 1

$e
[1] 5

$g
[1] 7
```

Most excitingly, the subset mechanism is quite flexible and works well with Rcpp sugar. For example:

```cpp
#include <Rcpp.h>
using namespace Rcpp;

// Combine two sugar comparisons into a single logical index.
// [[Rcpp::export]]
NumericVector in_range(NumericVector x, double low, double high) {
    return x[(x > low) & (x < high)];
}

// Drop missing values.
// [[Rcpp::export]]
NumericVector no_na(NumericVector x) {
    return x[ !is_na(x) ];
}

// Helper predicate: is this element a character vector?
bool is_character(SEXP x) {
    return TYPEOF(x) == STRSXP;
}

// Keep only the list elements that are character vectors.
// [[Rcpp::export]]
List charvecs(List x) {
    return x[ sapply(x, is_character) ];
}
```

```r
set.seed(123)
x <- rnorm(5)
in_range(x, -1, 1)

[1] -0.56048 -0.23018  0.07051  0.12929

no_na( c(1, 2, NA, 4, NaN, 10) )

[1]  1  2  4 10

l <- list(1, 2, "a", "b", TRUE)
charvecs(l)

[[1]]
[1] "a"

[[2]]
[1] "b"
```

These operations can also be quite fast:

```r
library(microbenchmark)
R_in_range <- function(x, low, high) {
    return(x[x > low & x < high])
}
x <- rnorm(1E5)
identical( R_in_range(x, -1, 1), in_range(x, -1, 1) )

[1] TRUE

microbenchmark( times=5,
    R_in_range(x, -1, 1),
    in_range(x, -1, 1)
)

Unit: milliseconds
                 expr   min    lq median    uq   max neval
 R_in_range(x, -1, 1) 8.168 8.556   9.02 9.073 9.223     5
   in_range(x, -1, 1) 5.210 5.424   5.48 5.507 6.233     5

R_no_na <- function(x) {
    return( x[!is.na(x)] )
}
x[sample(1E5, 1E4)] <- NA
identical(no_na(x), R_no_na(x))

[1] TRUE

microbenchmark( times=5,
    R_no_na(x),
    no_na(x)
)

Unit: milliseconds
       expr   min    lq median   uq   max neval
 R_no_na(x) 3.958 3.960  4.019 4.02 4.458     5
   no_na(x) 1.891 1.936  1.961 2.02 2.755     5
```

We hope users of Rcpp will find the new subset semantics fast, flexible, and useful throughout their projects. 