# Sorting Numeric Vectors in C++ and R

January 31, 2013
By

(This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers)

Consider the problem to sort all elements of the given vector in ascending order. We can simply use the function `std::sort` from the C++ STL.

``````#include
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector stl_sort(NumericVector x) {
NumericVector y = clone(x);
std::sort(y.begin(), y.end());
return y;
}
``````
``````library(rbenchmark)
set.seed(123)
z <- rnorm(100000)
x <- rnorm(100)

# check that stl_sort is the same as sort
stopifnot(all.equal(stl_sort(x), sort(x)))

# benchmark stl_sort and sort
benchmark(stl_sort(z), sort(z), order="relative")[,1:4]
``````
```         test replications elapsed relative
1 stl_sort(z)          100   0.632    1.000
2     sort(z)          100   1.164    1.842
```

Consider the problem of sorting the first `n` elements of a given vector. The function `std::partial_sort` from the C++ STL does just this.

``````// [[Rcpp::export]]
NumericVector stl_partial_sort(NumericVector x, int n) {
NumericVector y = clone(x);
std::partial_sort(y.begin(), y.begin()+n, y.end());
return y;
}
``````

An alternate implementation of a partial sort algorithm is to use `std::nth_element` to partition the given vector at the nth sorted element and then use `std::sort`, both from the STL, to sort the vector from the beginning to the nth element.

For an equivalent implementation in R, we can use the `sort` function by specifying a vector of `1:n` for the partial argument (i.e. `partial=1:n`).

``````// [[Rcpp::export]]
NumericVector nth_partial_sort(NumericVector x, int nth) {
NumericVector y = clone(x);
std::nth_element(y.begin(), y.begin()+nth, y.end());
std::sort(y.begin(), y.begin()+nth);
return y;
}
``````
``````n <- 25000

# check that stl_partial_sort is equal to nth_partial_sort
stopifnot(all.equal(stl_partial_sort(x, 50)[1:50],
nth_partial_sort(x, 50)[1:50]))

# benchmark stl_partial_sort, nth_element_sort, and sort
benchmark(stl_partial_sort(z, n),
nth_partial_sort(z, n),
sort(z, partial=1:n),
order="relative")[,1:4]
``````
```                    test replications elapsed relative
2 nth_partial_sort(z, n)          100   0.208    1.000
1 stl_partial_sort(z, n)          100   0.516    2.481
3 sort(z, partial = 1:n)          100   0.796    3.827
```

An interesting result to note is the gain in speed of `nth_partial_sort` over `stl_partial_sort`. In this case, for the given data, it is faster to use the combination of`std::nth_element` and `std::sort` rather than `std::partial_sort` to sort the first `n` elements of a vector.

``````// [[Rcpp::export]]
NumericVector stl_nth_element(NumericVector x, int n) {
NumericVector y = clone(x);
std::nth_element(y.begin(), y.begin()+n, y.end());
return y;
}
``````

Finally, consider a problem where you only need a single element of a sorted vector. The function `std::nth_element` from the C++ STL does just this. An example of this type of problem is computing the median of a given vector.

For an equivalent implementation in R, we can use the `sort` function by specifying a scalar value for the argument partial (i.e. `partial=n`).

``````# check that the nth sorted elements of the vectors are equal
stopifnot(all.equal(stl_nth_element(x, 43), sort(x, partial=43)))

# benchmark nth_element and sort
benchmark(stl_nth_element(z, n),
sort(z, partial=n),
order="relative")[,1:4]
``````
```                   test replications elapsed relative
1 stl_nth_element(z, n)          100   0.089    1.000
2  sort(z, partial = n)          100   0.238    2.674
```

While these are not huge speed improvements over the base R sort function, this post demonstrates how to easily access sorting functions in the C++ STL and is a good exercise to better understand the differences and performance of the sorting algorithms available in C++ and R. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...