# R Tip: Consider radix Sort

R tip: consider using `radix` sort.

The `method = "radix"` option can *greatly* speed up sorting and ordering tables in `R`.
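As a minimal sketch of what the option looks like in use: the toy data frame below is made up for illustration (it is not the benchmark data), and only base-`R` functions are used.

```r
# Hypothetical toy data; the column names only mimic the benchmark excerpt.
set.seed(2018)
d <- data.frame(
  col_a = sample(letters, 20, replace = TRUE),
  col_b = sample(1:5, 20, replace = TRUE),
  stringsAsFactors = FALSE
)

# Same call shape as a default order(), plus method = "radix".
ord <- order(d$col_a, d$col_b, method = "radix")
d_sorted <- d[ord, , drop = FALSE]

# The first key of the ordered table matches a radix sort of that column.
stopifnot(identical(d_sorted$col_a, sort(d$col_a, method = "radix")))
```

The only change from a default call is the extra `method` argument, so it is cheap to try on an existing `order()` call and time the difference.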

For a 1 million row table the speedup is already as much as 35 times (around 9.6 seconds versus about three tenths of a second). Below is an excerpt from a benchmark that orders the table once with the default settings and once with radix sort (full code here).

    library(microbenchmark)

    # `d` (the table) and `my_check` (a result check) are defined in the full code.
    timings <- microbenchmark(
      order_default = d[order(d$col_a, d$col_b, d$col_c, d$col_x), , drop = FALSE],
      order_radix = d[order(d$col_a, d$col_b, d$col_c, d$col_x, method = "radix"), , drop = FALSE],
      check = my_check,
      times = 10L)
    print(timings)

    ## Unit: milliseconds
    ##           expr       min        lq      mean    median        uq
    ##  order_default 9531.2865 9653.6827 9759.8929 9690.6702 9833.2170
    ##    order_radix  262.1377  263.3226  278.2547  265.1452  274.2476
    ##         max neval
    ##  10329.3520    10
    ##    382.2544    10
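The "35 times" figure can be checked against the mean column of the output above. The snippet below is just that arithmetic; the variable names are made up for illustration.

```r
# Mean timings in milliseconds, copied from the benchmark output above.
mean_default_ms <- 9759.8929
mean_radix_ms   <- 278.2547

# Ratio of the two means, rounded to a whole number.
speedup <- mean_default_ms / mean_radix_ms
round(speedup)
#> [1] 35
```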

This speedup is possible because Matt Dowle and Arun Srinivasan of the `data.table` team generously ported their radix sorting code into base-`R`! Please see `help(sort)` for details. So `data.table` is not only the best data manipulation package in `R`, the team actually works to improve `R` itself. This is what is meant by "`R` community" and what is needed to keep `R` vibrant and alive.

Edit/Note: Iñaki Úcar shared at least two good points in a follow-up article: if you are using factors you get `radix` sort for free (for technical reasons I tend to delay/disable conversion to factors), and I didn’t mention the loss of control of collation order. Because of that I am changing the article title from “R tip: Use Radix Sort” to “R Tip: Consider radix Sort”.

