R Tip: Consider radix Sort

August 21, 2018

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

R tip: consider using radix sort.

The “method = "radix"” option can greatly speed up sorting and ordering tables in R.

For a 1 million row table the speedup is already about 35 times (roughly 9.7 seconds versus under 3 tenths of a second, going by the median timings). Below is an excerpt from an experiment that sorts the same table twice, once with the default settings and once with radix sort (full code here).


timings <- microbenchmark(
  order_default = d[order(d$col_a, d$col_b, d$col_c, d$col_x), , 
                    drop = FALSE],
  order_radix = d[order(d$col_a, d$col_b, d$col_c, d$col_x,
                        method = "radix"), ,
                  drop = FALSE],
  check = my_check,
  times = 10L)

print(timings)
## Unit: milliseconds
##           expr       min        lq      mean    median        uq
##  order_default 9531.2865 9653.6827 9759.8929 9690.6702 9833.2170
##    order_radix  262.1377  263.3226  278.2547  265.1452  274.2476
##         max neval
##  10329.3520    10
##    382.2544    10
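
The excerpt above relies on a table d and a checking function my_check that are defined in the linked full code and not shown here. As a rough, self-contained sketch of the same comparison, here is a version that uses only base R. The table contents, the seed, the size n, and this version of my_check are all illustrative assumptions, not the original experiment, and timings will differ by machine.

```r
# Hypothetical stand-in for the experiment's table: three character key
# columns and one numeric column (an assumption, not the original data).
set.seed(2018)
n <- 100000L
d <- data.frame(
  col_a = sample(letters, n, replace = TRUE),
  col_b = sample(letters, n, replace = TRUE),
  col_c = sample(letters, n, replace = TRUE),
  col_x = runif(n),
  stringsAsFactors = FALSE
)

# Hypothetical check: TRUE when every result in the list matches the first.
my_check <- function(values) {
  all(vapply(values[-1],
             function(v) isTRUE(all.equal(values[[1]], v)),
             logical(1)))
}

# Time the default ordering and the radix ordering once each.
time_default <- system.time(
  res_default <- d[order(d$col_a, d$col_b, d$col_c, d$col_x), ,
                   drop = FALSE]
)
time_radix <- system.time(
  res_radix <- d[order(d$col_a, d$col_b, d$col_c, d$col_x,
                       method = "radix"), ,
                 drop = FALSE]
)

# Elapsed seconds for each method.
print(c(default = time_default[["elapsed"]],
        radix = time_radix[["elapsed"]]))

# The two results agree when the session locale collates these keys the
# same way as the C locale, which is typical for lowercase ASCII letters.
print(my_check(list(res_default, res_radix)))
```

A single system.time() call is much noisier than the repeated runs microbenchmark performs, so treat the printed numbers as a rough indication only.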

This speedup is possible because Matt Dowle and Arun Srinivasan of the data.table team generously ported their radix sorting code into base-R! Please see help(sort) for details. So data.table is not only the best data manipulation package in R, the team actually works to improve R itself. This is what is meant by "R community" and what is needed to keep R vibrant and alive.

Edit/Note: Iñaki Úcar shared at least 2 good points in a follow-up article: if you are using factors you get radix sort for free (for technical reasons I tend to delay/disable conversion to factors), and I didn’t mention the loss of control of collation order. Because of that I am changing the article title from “R tip: Use Radix Sort” to “R Tip: Consider radix Sort”.
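
Both points in the note can be seen on a tiny example. This is a minimal sketch with made-up values (x and f are chosen only for illustration). Per help(sort), radix sorting of character vectors always collates as in the C locale, so uppercase ASCII letters come before lowercase ones. A factor, by contrast, is ordered by its integer level codes, so whatever level order it was given is preserved.

```r
x <- c("b", "A", "a", "B")

# Radix sort on character data always uses C-locale collation.
radix_sorted <- sort(x, method = "radix")
print(radix_sorted)
## [1] "A" "B" "a" "b"

# The default result depends on the session locale (LC_COLLATE),
# so no expected output is shown for it.
print(sort(x))

# A factor is ordered by its level codes, here an explicitly chosen order.
f <- factor(x, levels = c("a", "A", "b", "B"))
factor_sorted <- as.character(f[order(f, method = "radix")])
print(factor_sorted)
## [1] "a" "A" "b" "B"

# For factors the default method and radix give the same ordering.
same_order <- identical(order(f), order(f, method = "radix"))
print(same_order)
```

If a particular collation order matters, converting the key column to a factor with explicitly chosen levels before sorting is one way to keep control of it while still getting the radix speed.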
