Rcpp 0.11.1 has introduced flexible subsetting for Rcpp vectors. Subsetting is
implemented for the Rcpp vector types through the [ operator, and is intended to
mimic R’s [ operator in most cases.
We diverge from R’s subsetting semantics in a few important ways:

- For integer and numeric vectors, 0-based indexing is performed, rather than
  1-based indexing, for subsets.
- We throw an error if an index is out of bounds, rather than returning an NA
  value.
- We require logical subsetting to be with vectors of the same length, thus
  avoiding bugs that can occur when a logical vector is recycled for a subset
  operation.
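
To make the three differences listed above concrete without depending on Rcpp itself, here is a minimal plain C++ model of them. It is only a sketch of the semantics and not how Rcpp implements subsetting; `subset_by_index` and `subset_by_mask` are hypothetical helper names introduced for illustration, and the exception types are an assumption of this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Rcpp): positional subsetting that is
// 0-based and rejects out-of-bounds indices instead of producing NA.
std::vector<double> subset_by_index(const std::vector<double>& x,
                                    const std::vector<int>& idx) {
    std::vector<double> out;
    out.reserve(idx.size());
    for (int i : idx) {
        // An index outside [0, size) is an error in this model.
        if (i < 0 || static_cast<std::size_t>(i) >= x.size()) {
            throw std::out_of_range("index out of bounds");
        }
        // Index 0 refers to the first element.
        out.push_back(x[static_cast<std::size_t>(i)]);
    }
    return out;
}

// Hypothetical helper (not part of Rcpp): logical subsetting that never
// recycles the mask; the mask must be exactly as long as the data.
std::vector<double> subset_by_mask(const std::vector<double>& x,
                                   const std::vector<bool>& keep) {
    if (keep.size() != x.size()) {
        throw std::invalid_argument("logical subset requires equal lengths");
    }
    std::vector<double> out;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (keep[i]) out.push_back(x[i]);  // keep elements where the mask is true
    }
    return out;
}
```

In this model, `subset_by_index({10, 20, 30}, {0, 2})` keeps the first and third elements, an index of 3 throws, and a two-element mask applied to a three-element vector throws rather than being recycled as it would be in R.
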
Some examples are showcased below:
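#include <Rcpp.h>
using namespace Rcpp;

// Logical subsetting: keep the strictly positive elements.
// [[Rcpp::export]]
NumericVector positives(NumericVector x) {
    return x[x > 0];
}

// Integer subsetting: indices are 0-based.
// [[Rcpp::export]]
List first_three(List x) {
    IntegerVector idx = IntegerVector::create(0, 1, 2);
    return x[idx];
}

// Character subsetting: select elements by name.
// [[Rcpp::export]]
List with_names(List x, CharacterVector y) {
    return x[y];
}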
x <- -5:5
positives(x)
[1] 1 2 3 4 5

l <- as.list(1:10)
first_three(l)
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

l <- setNames(l, letters[1:10])
with_names(l, c("a", "e", "g"))
$a
[1] 1

$e
[1] 5

$g
[1] 7

Most excitingly, the subset mechanism is quite flexible and works well with Rcpp
sugar. For example:
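#include <Rcpp.h>
using namespace Rcpp;

// Combine two sugar comparisons into one logical index.
// [[Rcpp::export]]
NumericVector in_range(NumericVector x, double low, double high) {
    return x[(x > low) & (x < high)];
}

// Drop missing values (both NA and NaN).
// [[Rcpp::export]]
NumericVector no_na(NumericVector x) {
    return x[ !is_na(x) ];
}

// Helper (not exported): is this element a character vector?
bool is_character(SEXP x) {
    return TYPEOF(x) == STRSXP;
}

// Keep only the list elements that are character vectors.
// [[Rcpp::export]]
List charvecs(List x) {
    return x[ sapply(x, is_character) ];
}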
set.seed(123)
x <- rnorm(5)
in_range(x, -1, 1)
[1] -0.56048 -0.23018  0.07051  0.12929

no_na( c(1, 2, NA, 4, NaN, 10) )
[1]  1  2  4 10

l <- list(1, 2, "a", "b", TRUE)
charvecs(l)
[[1]]
[1] "a"

[[2]]
[1] "b"

These operations can also be quite fast:
library(microbenchmark)
R_in_range <- function(x, low, high) {
    return(x[x > low & x < high])
}
x <- rnorm(1E5)
identical( R_in_range(x, -1, 1), in_range(x, -1, 1) )
[1] TRUE

microbenchmark( times=5,
    R_in_range(x, -1, 1),
    in_range(x, -1, 1)
)
Unit: milliseconds
                 expr   min    lq median    uq   max neval
 R_in_range(x, -1, 1) 8.168 8.556   9.02 9.073 9.223     5
   in_range(x, -1, 1) 5.210 5.424   5.48 5.507 6.233     5

R_no_na <- function(x) {
    return( x[!is.na(x)] )
}
x[sample(1E5, 1E4)] <- NA
identical(no_na(x), R_no_na(x))
[1] TRUE

microbenchmark( times=5,
    R_no_na(x),
    no_na(x)
)
Unit: milliseconds
       expr   min    lq median   uq   max neval
 R_no_na(x) 3.958 3.960  4.019 4.02 4.458     5
   no_na(x) 1.891 1.936  1.961 2.02 2.755     5

We hope users of Rcpp will find the new subset semantics fast, flexible, and
useful throughout their projects.