Sorting Numeric Vectors in C++ and R

January 31, 2013
By

(This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers)

Consider the problem to sort all elements of the given vector in ascending order. We can simply use the function std::sort from the C++ STL.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector stl_sort(NumericVector x) {
   NumericVector y = clone(x);
   std::sort(y.begin(), y.end());
   return y;
}
library(rbenchmark)
set.seed(123)
z <- rnorm(100000)
x <- rnorm(100)

# check that stl_sort is the same as sort
stopifnot(all.equal(stl_sort(x), sort(x)))

# benchmark stl_sort and sort
benchmark(stl_sort(z), sort(z), order="relative")[,1:4]
         test replications elapsed relative
1 stl_sort(z)          100   0.632    1.000
2     sort(z)          100   1.164    1.842

Consider the problem of sorting the first n elements of a given vector. The function std::partial_sort from the C++ STL does just this.

// [[Rcpp::export]]
NumericVector stl_partial_sort(NumericVector x, int n) {
   NumericVector y = clone(x);
   std::partial_sort(y.begin(), y.begin()+n, y.end());
   return y;
}

An alternate implementation of a partial sort algorithm is to use std::nth_element to partition the given vector at the nth sorted element and then use std::sort, both from the STL, to sort the vector from the beginning to the nth element.

For an equivalent implementation in R, we can use the sort function by specifying a vector of 1:n for the partial argument (i.e. partial=1:n).

// [[Rcpp::export]]
NumericVector nth_partial_sort(NumericVector x, int nth) {
   NumericVector y = clone(x);
   std::nth_element(y.begin(), y.begin()+nth, y.end());
   std::sort(y.begin(), y.begin()+nth);
   return y;
}
n <- 25000

# check that stl_partial_sort is equal to nth_partial_sort
stopifnot(all.equal(stl_partial_sort(x, 50)[1:50], 
                    nth_partial_sort(x, 50)[1:50]))

# benchmark stl_partial_sort, nth_element_sort, and sort
benchmark(stl_partial_sort(z, n),
          nth_partial_sort(z, n),
          sort(z, partial=1:n),
          order="relative")[,1:4]
                    test replications elapsed relative
2 nth_partial_sort(z, n)          100   0.208    1.000
1 stl_partial_sort(z, n)          100   0.516    2.481
3 sort(z, partial = 1:n)          100   0.796    3.827

An interesting result to note is the gain in speed of nth_partial_sort over stl_partial_sort. In this case, for the given data, it is faster to use the combination ofstd::nth_element and std::sort rather than std::partial_sort to sort the first n elements of a vector.

// [[Rcpp::export]]
NumericVector stl_nth_element(NumericVector x, int n) {
   NumericVector y = clone(x);
   std::nth_element(y.begin(), y.begin()+n, y.end());
   return y;
}

Finally, consider a problem where you only need a single element of a sorted vector. The function std::nth_element from the C++ STL does just this. An example of this type of problem is computing the median of a given vector.

For an equivalent implementation in R, we can use the sort function by specifying a scalar value for the argument partial (i.e. partial=n).

# check that the nth sorted elements of the vectors are equal
stopifnot(all.equal(stl_nth_element(x, 43)[43], sort(x, partial=43)[43]))

# benchmark nth_element and sort
benchmark(stl_nth_element(z, n),
         sort(z, partial=n),
         order="relative")[,1:4]
                   test replications elapsed relative
1 stl_nth_element(z, n)          100   0.089    1.000
2  sort(z, partial = n)          100   0.238    2.674

While these are not huge speed improvements over the base R sort function, this post demonstrates how to easily access sorting functions in the C++ STL and is a good exercise to better understand the differences and performance of the sorting algorithms available in C++ and R.

To leave a comment for the author, please follow the link and comment on his blog: Rcpp Gallery.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.