R Tip: Consider radix Sort

August 21, 2018

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers].

R tip: consider using radix sort.

The “method = "radix"” option can greatly speed up sorting and ordering tables in R.
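
As a minimal sketch of the option in use (the toy data frame `d_small` is made up for illustration and is not part of the experiment below), the same argument works for both `order()` and `sort()`:

```r
# Toy data frame, made up for illustration.
d_small <- data.frame(
  col_a = c("b", "a", "c", "a", "b"),
  col_x = c(2, 9, 1, 3, 5),
  stringsAsFactors = FALSE
)

# order() returns the row permutation; method = "radix" selects radix sort.
perm <- order(d_small$col_a, d_small$col_x, method = "radix")
d_sorted <- d_small[perm, , drop = FALSE]
print(d_sorted)

# sort() accepts the same option for a single vector.
sorted_a <- sort(d_small$col_a, method = "radix")
print(sorted_a)
```

Ordering first and then indexing with `drop = FALSE` keeps the result a data frame even when the table has a single column.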

For a 1 million row table the speedup is already as much as 35 times (around 9.6 seconds versus about 0.3 seconds). Below is an excerpt from a sorting experiment that compares the default settings with radix sort (full code here).


timings <- microbenchmark(
  order_default = d[order(d$col_a, d$col_b, d$col_c, d$col_x), , 
                    drop = FALSE],
  order_radix = d[order(d$col_a, d$col_b, d$col_c, d$col_x,
                        method = "radix"), ,
                  drop = FALSE],
  check = my_check,
  times = 10L)

print(timings)
## Unit: milliseconds
##           expr       min        lq      mean    median        uq
##  order_default 9531.2865 9653.6827 9759.8929 9690.6702 9833.2170
##    order_radix  262.1377  263.3226  278.2547  265.1452  274.2476
##         max neval
##  10329.3520    10
##    382.2544    10
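
The excerpt above relies on a table `d` and a `my_check` function defined in the full code. For readers who want something self-contained, here is a rough sketch using only base R. The table construction is an assumption (character key columns plus one numeric column, built by a hypothetical `make_keys()` helper), and it uses fewer rows than the article's 1 million so it runs quickly, so the timings will not match the ones printed above.

```r
# Hypothetical stand-in for the article's table `d`; the real construction
# is in the linked full code and may differ.
set.seed(2018)
n <- 100000L  # smaller than the article's 1 million rows

# Hypothetical helper: n character keys drawn from 1000 distinct symbols.
make_keys <- function(n) {
  sample(paste0("sym_", seq_len(1000)), size = n, replace = TRUE)
}

d <- data.frame(
  col_a = make_keys(n),
  col_b = make_keys(n),
  col_c = make_keys(n),
  col_x = runif(n),
  stringsAsFactors = FALSE  # keep the keys as character, not factor
)

# Time the default ordering and the radix ordering once each.
time_default <- system.time(
  p_default <- order(d$col_a, d$col_b, d$col_c, d$col_x)
)
time_radix <- system.time(
  p_radix <- order(d$col_a, d$col_b, d$col_c, d$col_x, method = "radix")
)

# Elapsed seconds for each method.
print(c(default = time_default[["elapsed"]],
        radix = time_radix[["elapsed"]]))
```

A single `system.time()` call is noisier than the repeated runs `microbenchmark` performs, so treat the numbers as a rough indication only.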

This speedup is possible because Matt Dowle and Arun Srinivasan of the data.table team generously ported their radix sorting code into base-R! Please see help(sort) for details. So data.table is not only the best data manipulation package in R, the team actually works to improve R itself. This is what is meant by "R community" and what is needed to keep R vibrant and alive.

Edit/Note: Iñaki Úcar shared at least 2 good points in a follow-up article: if you are using factors you get radix sort for free (for technical reasons I tend to delay/disable conversion to factors), and I didn’t mention the loss of control of collation order. Because of that I am changing the article title from “R tip: Use Radix Sort” to “R Tip: Consider radix Sort”.
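
Both points can be seen on a tiny example. This is a sketch with made-up values: radix sort compares character data in the C locale, while the default method follows the session's collation rules, so the two results may or may not agree on a given machine. The factor levels below are chosen by hand for illustration.

```r
x <- c("b", "A", "a", "B")

# Radix sort compares strings in the C locale (byte order), so upper-case
# ASCII letters come before lower-case ones: "A" "B" "a" "b".
sorted_radix <- sort(x, method = "radix")
print(sorted_radix)

# The default method uses the session's collation rules, so this result
# depends on the locale and can differ from the radix result.
sorted_default <- sort(x)
print(sorted_default)

# A factor is ordered by its integer codes, so the level order set here
# controls the result, and order()'s default method = "auto" already
# picks radix for factors.
f <- factor(x, levels = c("a", "A", "b", "B"))
f_sorted <- f[order(f)]
print(as.character(f_sorted))
```

If a particular collation order matters, encoding it in the factor levels (or staying with the default method) is one way to keep control of it.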
