Sorting Numeric Vectors in C++ and R

[This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Consider the problem to sort all elements of the given vector in ascending order. We can simply use the function std::sort from the C++ STL.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector stl_sort(NumericVector x) {
   NumericVector y = clone(x);
   std::sort(y.begin(), y.end());
   return y;
}

library(rbenchmark)
set.seed(123)
z <- rnorm(100000)
x <- rnorm(100)

# check that stl_sort is the same as sort
stopifnot(all.equal(stl_sort(x), sort(x)))

# benchmark stl_sort and sort
benchmark(stl_sort(z), sort(z), order="relative")[,1:4]


         test replications elapsed relative
1 stl_sort(z)          100   0.632    1.000
2     sort(z)          100   1.164    1.842

Consider the problem of sorting the first n elements of a given vector. The function std::partial_sort from the C++ STL does just this.

// [[Rcpp::export]]
NumericVector stl_partial_sort(NumericVector x, int n) {
   NumericVector y = clone(x);
   std::partial_sort(y.begin(), y.begin()+n, y.end());
   return y;
}

An alternate implementation of a partial sort algorithm is to use std::nth_element to partition the given vector at the nth sorted element and then use std::sort, both from the STL, to sort the vector from the beginning to the nth element.

For an equivalent implementation in R, we can use the sort function by specifying a vector of 1:n for the partial argument (i.e. partial=1:n).

// [[Rcpp::export]]
NumericVector nth_partial_sort(NumericVector x, int nth) {
   NumericVector y = clone(x);
   std::nth_element(y.begin(), y.begin()+nth, y.end());
   std::sort(y.begin(), y.begin()+nth);
   return y;
}

n <- 25000

# check that stl_partial_sort is equal to nth_partial_sort
stopifnot(all.equal(stl_partial_sort(x, 50)[1:50], 
                    nth_partial_sort(x, 50)[1:50]))

# benchmark stl_partial_sort, nth_element_sort, and sort
benchmark(stl_partial_sort(z, n),
          nth_partial_sort(z, n),
          sort(z, partial=1:n),
          order="relative")[,1:4]


                    test replications elapsed relative
2 nth_partial_sort(z, n)          100   0.208    1.000
1 stl_partial_sort(z, n)          100   0.516    2.481
3 sort(z, partial = 1:n)          100   0.796    3.827

An interesting result to note is the gain in speed of nth_partial_sort over stl_partial_sort. In this case, for the given data, it is faster to use the combination ofstd::nth_element and std::sort rather than std::partial_sort to sort the first n elements of a vector.

// [[Rcpp::export]]
NumericVector stl_nth_element(NumericVector x, int n) {
   NumericVector y = clone(x);
   std::nth_element(y.begin(), y.begin()+n, y.end());
   return y;
}

Finally, consider a problem where you only need a single element of a sorted vector. The function std::nth_element from the C++ STL does just this. An example of this type of problem is computing the median of a given vector.

For an equivalent implementation in R, we can use the sort function by specifying a scalar value for the argument partial (i.e. partial=n).

# check that the nth sorted elements of the vectors are equal
stopifnot(all.equal(stl_nth_element(x, 43)[43], sort(x, partial=43)[43]))

# benchmark nth_element and sort
benchmark(stl_nth_element(z, n),
         sort(z, partial=n),
         order="relative")[,1:4]


                   test replications elapsed relative
1 stl_nth_element(z, n)          100   0.089    1.000
2  sort(z, partial = n)          100   0.238    2.674

While these are not huge speed improvements over the base R sort function, this post demonstrates how to easily access sorting functions in the C++ STL and is a good exercise to better understand the differences and performance of the sorting algorithms available in C++ and R.

To leave a comment for the author, please follow the link and comment on their blog: Rcpp Gallery.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)