# Vector Subsetting in Rcpp

March 15, 2014

(This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers)

Rcpp 0.11.1 introduced flexible subsetting for Rcpp vectors. Subsetting is
implemented for the Rcpp vector types through the `[` operator, and is intended
to mimic R’s `[` operator in most cases.

We diverge from R’s subsetting semantics in a few important ways:

1. For integer and numeric vectors, 0-based indexing is performed, rather than
1-based indexing, for subsets.

2. We throw an error if an index is out of bounds, rather than returning an
`NA` value.

3. We require the logical vector used for subsetting to have the same length as
the vector being subset, thus avoiding bugs that can occur when a logical vector
is recycled for a subset operation.
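
The three rules above can be modeled outside Rcpp with a small standard C++ sketch. The helper names `subset_by_index` and `subset_by_mask` are hypothetical and exist only for illustration; this is a model of the described semantics, not Rcpp's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: integer subsetting with 0-based indices.
// An out-of-bounds index raises an error instead of yielding NA.
// (This sketch also rejects negative indices.)
std::vector<double> subset_by_index(const std::vector<double>& x,
                                    const std::vector<int>& idx) {
    std::vector<double> out;
    out.reserve(idx.size());
    for (int i : idx) {
        if (i < 0 || static_cast<std::size_t>(i) >= x.size())
            throw std::out_of_range("index out of bounds");
        out.push_back(x[static_cast<std::size_t>(i)]);
    }
    return out;
}

// Hypothetical helper: logical subsetting without recycling.
// The mask must have exactly as many elements as the vector.
std::vector<double> subset_by_mask(const std::vector<double>& x,
                                   const std::vector<bool>& keep) {
    if (keep.size() != x.size())
        throw std::invalid_argument("logical index must match vector length");
    std::vector<double> out;
    for (std::size_t i = 0; i < x.size(); ++i)
        if (keep[i]) out.push_back(x[i]);
    return out;
}
```

With `x = {10, 20, 30}`, the call `subset_by_index(x, {0, 2})` selects the first and third elements, while an index of `3` or a two-element mask throws rather than silently producing `NA` or recycling.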

Some examples are showcased below:

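``````#include <Rcpp.h>
using namespace Rcpp;

// Logical subsetting: keep only the strictly positive elements.
// [[Rcpp::export]]
NumericVector positives(NumericVector x) {
    return x[x > 0];
}

// Integer subsetting: indices are 0-based, so this returns elements 1 to 3.
// [[Rcpp::export]]
List first_three(List x) {
    IntegerVector idx = IntegerVector::create(0, 1, 2);
    return x[idx];
}

// Character subsetting: select elements by name.
// [[Rcpp::export]]
List with_names(List x, CharacterVector y) {
    return x[y];
}``````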
``````x <- -5:5
positives(x)``````
```[1] 1 2 3 4 5
```
``````l <- as.list(1:10)
first_three(l)``````
```[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3
```
``````l <- setNames(l, letters[1:10])
with_names(l, c("a", "e", "g"))``````
```$a
[1] 1

$e
[1] 5

$g
[1] 7
```
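
As a rough model of what name-based subsetting such as `x[y]` does, the standard C++ sketch below looks up each requested name in order. The type alias `NamedInt` and the helper `subset_by_name` are hypothetical, and treating a missing name as an error is an assumption of this sketch rather than something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// A (name, value) pair standing in for one named list element.
using NamedInt = std::pair<std::string, int>;

// Hypothetical helper: return the elements whose names match `wanted`,
// in the order the names were requested.
// Assumption of this sketch: a name that is not present is an error.
std::vector<NamedInt> subset_by_name(const std::vector<NamedInt>& x,
                                     const std::vector<std::string>& wanted) {
    std::vector<NamedInt> out;
    out.reserve(wanted.size());
    for (const std::string& name : wanted) {
        bool found = false;
        for (const NamedInt& el : x) {
            if (el.first == name) {
                out.push_back(el);   // first match wins
                found = true;
                break;
            }
        }
        if (!found) throw std::out_of_range("name not found: " + name);
    }
    return out;
}
```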

Most excitingly, the subset mechanism is quite flexible and works well with Rcpp
sugar. For example:

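``````#include <Rcpp.h>
using namespace Rcpp;

// Combine two sugar comparisons into one logical index.
// [[Rcpp::export]]
NumericVector in_range(NumericVector x, double low, double high) {
    return x[(x > low) & (x < high)];
}

// Drop missing values using the sugar function is_na().
// [[Rcpp::export]]
NumericVector no_na(NumericVector x) {
    return x[ !is_na(x) ];
}

// Helper (not exported): true if x is a character vector.
bool is_character(SEXP x) {
    return TYPEOF(x) == STRSXP;
}

// Keep only the list elements that are character vectors.
// [[Rcpp::export]]
List charvecs(List x) {
    return x[ sapply(x, is_character) ];
}``````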
``````set.seed(123)
x <- rnorm(5)
in_range(x, -1, 1)``````
```[1] -0.56048 -0.23018  0.07051  0.12929
```
``no_na( c(1, 2, NA, 4, NaN, 10) )``
```[1]  1  2  4 10
```
``````l <- list(1, 2, "a", "b", TRUE)
charvecs(l)``````
```[[1]]
[1] "a"

[[2]]
[1] "b"
```
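
Conceptually, an expression such as `x > low & x < high` or `!is_na(x)` describes a logical mask with one entry per element of `x`, and the subset keeps the elements where that mask is true. The standard C++ sketch below models this idea with hypothetical helpers (`in_range_mask`, `not_nan_mask`, `keep_where`). It builds the mask eagerly for clarity and does not attempt to model how Rcpp evaluates sugar expressions. It uses `std::isnan`, which is true for `NaN` and also for R's `NA_real_` (itself encoded as a NaN), matching the `no_na` output above where both are dropped.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true where x[i] lies strictly between low and high.
std::vector<bool> in_range_mask(const std::vector<double>& x,
                                double low, double high) {
    std::vector<bool> mask(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        mask[i] = (x[i] > low) && (x[i] < high);
    return mask;
}

// Hypothetical helper: true where x[i] is not NaN.
std::vector<bool> not_nan_mask(const std::vector<double>& x) {
    std::vector<bool> mask(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        mask[i] = !std::isnan(x[i]);
    return mask;
}

// Hypothetical helper: keep the elements of x where mask is true.
// The caller is expected to pass a mask of the same length as x.
std::vector<double> keep_where(const std::vector<double>& x,
                               const std::vector<bool>& mask) {
    std::vector<double> out;
    for (std::size_t i = 0; i < x.size() && i < mask.size(); ++i)
        if (mask[i]) out.push_back(x[i]);
    return out;
}
```

Composing the pieces, `keep_where(x, in_range_mask(x, -1.0, 1.0))` plays the role of `in_range(x, -1, 1)` in this model.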

These subsetting operations can also be quite fast:

``````library(microbenchmark)
R_in_range <- function(x, low, high) {
  return(x[x > low & x < high])
}
x <- rnorm(1E5)
identical( R_in_range(x, -1, 1), in_range(x, -1, 1) )``````
```[1] TRUE
```
``````microbenchmark( times=5,
R_in_range(x, -1, 1),
in_range(x, -1, 1)
)``````
```Unit: milliseconds
                 expr   min    lq median    uq   max neval
 R_in_range(x, -1, 1) 8.168 8.556   9.02 9.073 9.223     5
   in_range(x, -1, 1) 5.210 5.424   5.48 5.507 6.233     5
```
``````R_no_na <- function(x) {
  return( x[!is.na(x)] )
}
x[sample(1E5, 1E4)] <- NA
identical(no_na(x), R_no_na(x))``````
```[1] TRUE
```
``````microbenchmark( times=5,
R_no_na(x),
no_na(x)
)``````
```Unit: milliseconds
       expr   min    lq median   uq   max neval
 R_no_na(x) 3.958 3.960  4.019 4.02 4.458     5
   no_na(x) 1.891 1.936  1.961 2.02 2.755     5
```

We hope users of Rcpp will find the new subset semantics fast, flexible, and
useful throughout their projects.
