subset vectors in Rcpp11

June 7, 2014
(This article was first published on R Enthusiast and R/C++ hero, and kindly contributed to R-bloggers)

Prompted by @kevin_ushey, who already did something similar for Rcpp, we have been adding subsetting behavior to Rcpp11.

The idea is this: given a vector y and a vector x, we want to give meaning to y[x].

The first legitimate question is what kinds of x we want to allow. This has been under discussion since January. So far, we have settled on allowing x to be an integer, logical, or character vector. The main source of anxiety here is the classic Cornelian dilemma:

Do we use 0-based or 1-based indices?

We decided to use 0-based indices, as this is what we already do when x is a scalar int, and this is C++: indexing starts at 0.
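To make the 0-based convention concrete, here is a minimal sketch in plain C++ using std::vector rather than Rcpp11 types. The helper subset_by_index is hypothetical and only illustrates the intended meaning of y[x] for an integer x; it is not how Rcpp11 implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Rcpp11): return the elements of y found at
// the 0-based positions listed in x, in the order given by x.
std::vector<double> subset_by_index(const std::vector<double>& y,
                                    const std::vector<int>& x) {
    std::vector<double> out;
    out.reserve(x.size());
    for (int i : x) {
        // 0-based: valid positions run from 0 to y.size() - 1.
        assert(i >= 0 && static_cast<std::size_t>(i) < y.size());
        out.push_back(y[static_cast<std::size_t>(i)]);
    }
    return out;
}
```

With this convention, index 0 selects the first element, so asking for positions {0, 2} of a four-element vector returns its first and third elements.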

rhs use

Given that, here is a first example:

NumericVector y = sqrt( seq_len(10) ) ;  
IntegerVector x {0,1,2} ;  
NumericVector res = y[x] ;  
// [1] 1.000000 1.414214 1.732051

The way we implemented this, y[x] does not itself return a NumericVector; that would have been too easy. Instead, it gives us a lovely sugar expression.

NumericVector y = sqrt( seq_len(10) ) ;  
IntegerVector x {0,1,2} ;  
auto res = y[x] ;  
Rprintf( "type(res) = %s\n", DEMANGLE(decltype(res)) ) ;  
// type(res) = Rcpp::SubsetProxy<Rcpp::Vector<14, Rcpp::PreserveStorage>, int, Rcpp::Vector<13, Rcpp::PreserveStorage> >
return res ;  
// [1] 1.000000 1.414214 1.732051
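The idea of returning a proxy instead of a vector can be sketched in plain C++. The class LazySubset below is a hypothetical stand-in written against std::vector; the real Rcpp::SubsetProxy is more general and its internals are not reproduced here. The sketch only shows the principle: keep references to the source and the index, and look elements up when asked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of a lazy subset; not the Rcpp11 implementation.
// It stores references, so the source and index must outlive the proxy.
class LazySubset {
public:
    LazySubset(const std::vector<double>& source, const std::vector<int>& index)
        : source_(source), index_(index) {}

    std::size_t size() const { return index_.size(); }

    // Elements are looked up on demand; nothing is copied at construction.
    double operator[](std::size_t i) const {
        assert(i < index_.size());
        return source_[static_cast<std::size_t>(index_[i])];
    }

    // Copy the selected elements only when a concrete vector is requested.
    std::vector<double> materialize() const {
        std::vector<double> out;
        out.reserve(index_.size());
        for (std::size_t i = 0; i < index_.size(); ++i) {
            out.push_back((*this)[i]);
        }
        return out;
    }

private:
    const std::vector<double>& source_;
    const std::vector<int>& index_;
};
```

Because the proxy only refers to the source, a change made to the source before materialization is visible through the proxy. That is the price and the benefit of not copying early.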

This is relevant because we don't need to materialize the data too early; we can pass it to any sugar function:

NumericVector y = sqrt( seq_len(10) ) ;  
IntegerVector x {0,1,2} ;  
auto res = sapply( y[x], [](double x){ return x*x; }) ;  
Rprintf( "type(res) = %s\n", DEMANGLE(decltype(res)) ) ;  
// type(res) = Rcpp::sugar::Sapply<double, Rcpp::SubsetProxy<Rcpp::Vector<14, Rcpp::PreserveStorage>, int, Rcpp::Vector<13, Rcpp::PreserveStorage> >, test()::$_0>
return res ;  
// [1] 1 2 3
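The nested type printed above suggests how such expressions compose: one lazy object wraps another, and no element is computed until someone asks for it. Here is a rough sketch of that composition in plain C++. LazyMap, lazy_map and materialize are hypothetical names chosen for illustration; they assume only that the wrapped source offers size() and operator[] yielding something convertible to double.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lazy "apply": wraps any source exposing size() and operator[].
// It stores a reference, so the source must outlive the LazyMap.
template <typename Source, typename Fun>
class LazyMap {
public:
    LazyMap(const Source& source, Fun fun) : source_(source), fun_(fun) {}

    std::size_t size() const { return source_.size(); }

    // The function is applied only when element i is requested.
    double operator[](std::size_t i) const {
        assert(i < source_.size());
        return fun_(source_[i]);
    }

private:
    const Source& source_;
    Fun fun_;
};

// Convenience factory so the template arguments are deduced.
template <typename Source, typename Fun>
LazyMap<Source, Fun> lazy_map(const Source& source, Fun fun) {
    return LazyMap<Source, Fun>(source, fun);
}

// Force evaluation of any such expression into a concrete vector.
template <typename Expr>
std::vector<double> materialize(const Expr& expr) {
    std::vector<double> out;
    out.reserve(expr.size());
    for (std::size_t i = 0; i < expr.size(); ++i) {
        out.push_back(expr[i]);
    }
    return out;
}
```

Since LazyMap itself exposes size() and operator[], one LazyMap can wrap another, which mirrors the way the type names nest in the output above.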

x may also be a sugar expression; it does not need to be a materialized vector. For example:

NumericVector y = sqrt( seq_len(10) ) ;  
auto res = sapply( y[seq(0, 4)], [](double x){ return x*x; }) ;  
Rprintf( "type(res) = %s\n", DEMANGLE(decltype(res)) ) ;  
// type(res) = Rcpp::sugar::Sapply<double, Rcpp::SubsetProxy<Rcpp::Vector<14, Rcpp::PreserveStorage>, int, Rcpp::sugar::Seq>, test()::$_0>
return res ;  
// [1] 1 2 3 4 5 

x can also be a logical or character expression, for example y[ y < 2.0 ] ...
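For the logical case, the intended meaning of something like y[ y < 2.0 ] can be sketched with std::vector. Both helpers below, less_than and subset_by_mask, are hypothetical names used only for illustration, and the sketch assumes the mask has the same length as y.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise comparison producing a logical mask.
std::vector<bool> less_than(const std::vector<double>& y, double limit) {
    std::vector<bool> mask;
    mask.reserve(y.size());
    for (double v : y) {
        mask.push_back(v < limit);
    }
    return mask;
}

// Hypothetical helper: keep the elements of y whose mask entry is true.
// Assumes keep.size() == y.size().
std::vector<double> subset_by_mask(const std::vector<double>& y,
                                   const std::vector<bool>& keep) {
    assert(keep.size() == y.size());
    std::vector<double> out;
    for (std::size_t i = 0; i < y.size(); ++i) {
        if (keep[i]) {
            out.push_back(y[i]);
        }
    }
    return out;
}
```

Unlike integer indexing, the length of the result is not known until the mask has been scanned, because it depends on how many entries are true.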

lhs use

In addition to being a sugar expression that knows how to apply itself to a vector, the object created by y[x] may also be used on the left-hand side (lhs) of an assignment.

For example:

NumericVector y = sqrt( seq_len(10) ) ;  
IntegerVector x {0,1,2} ;  
y[x] = - y[x] ;  
return y ;  
// [1] -1.000000 -1.414214 -1.732051  2.000000  2.236068  2.449490  2.645751
// [8]  2.828427  3.000000  3.162278
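Assignment through a subset can be sketched as a proxy whose operator= writes back into the original vector. WritableSubset below is a hypothetical illustration built on std::vector, assuming 0-based indices and a right-hand side of the same length as the index; it is not the Rcpp11 implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical writable proxy: remembers which vector to modify and where.
// It stores a reference, so the target must outlive the proxy.
class WritableSubset {
public:
    WritableSubset(std::vector<double>& target, std::vector<int> index)
        : target_(target), index_(std::move(index)) {}

    // Assigning to the proxy writes into the selected positions of the target.
    WritableSubset& operator=(const std::vector<double>& values) {
        assert(values.size() == index_.size());
        for (std::size_t i = 0; i < index_.size(); ++i) {
            target_[static_cast<std::size_t>(index_[i])] = values[i];
        }
        return *this;
    }

private:
    std::vector<double>& target_;
    std::vector<int> index_;
};
```

Elements of the target that are not named by the index are left untouched, which matches the output above, where only the first three values change sign.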

And of course, this handles sugar expressions too:

NumericVector y = sqrt( seq_len(10) ) ;  
IntegerVector x {0,1,2} ;  
y[2*x] = - y[x] ;  
return y ;  
// [1] -1.000000  1.414214 -1.414214  2.000000  1.414214  2.449490  2.645751
// [8]  2.828427  3.000000  3.162278 
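Note that the fifth element in the output above is 1.414214 rather than -1.732051. This is what one would expect if the right-hand side is read element by element while the left-hand side is being written: y[2] has already been overwritten with -1.414214 by the time it is read to compute y[4]. The sketch below reproduces that pattern in plain C++ under this assumption. Both function names are hypothetical, and the eager variant is shown only for contrast.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: target[dst[i]] = -target[src[i]], reading each source element
// only when it is needed, so earlier writes can affect later reads.
void assign_negated_lazy(std::vector<double>& target,
                         const std::vector<int>& dst,
                         const std::vector<int>& src) {
    assert(dst.size() == src.size());
    for (std::size_t i = 0; i < dst.size(); ++i) {
        target[static_cast<std::size_t>(dst[i])] =
            -target[static_cast<std::size_t>(src[i])];
    }
}

// Hypothetical contrast: copy the right-hand side first, then write.
void assign_negated_eager(std::vector<double>& target,
                          const std::vector<int>& dst,
                          const std::vector<int>& src) {
    assert(dst.size() == src.size());
    std::vector<double> rhs;
    rhs.reserve(src.size());
    for (int s : src) {
        rhs.push_back(-target[static_cast<std::size_t>(s)]);
    }
    for (std::size_t i = 0; i < dst.size(); ++i) {
        target[static_cast<std::size_t>(dst[i])] = rhs[i];
    }
}
```

With target {1, 2, 3, 4, 5}, dst {0, 2, 4} and src {0, 1, 2}, the lazy version yields {-1, 2, -2, 4, 2}, while the eager version yields {-1, 2, -2, 4, -3}. When the same vector appears on both sides with overlapping positions, materializing the right-hand side first avoids the surprise.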

Although the feature has been discussed for a few months, it is pretty new, so things might change. In fact, I came up with a few ideas while writing this post.
