Shuffling Columns With data.table

[This article was first published on tshafer.com, and kindly contributed to R-bloggers].

Yesterday, in a post syndicated to R-bloggers, kjytay asked how to programmatically shuffle a data.table column in place, because the straightforward approach didn’t work well.

Here are two other ways to solve the same problem, one using data.table::set() and the other using .SD together with .SDcols:

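library(data.table)

# Overwrite the column by reference with a shuffled copy of itself
scramble_set <- function(input_dt, colname) {
  set(input_dt, j = colname, value = sample(input_dt[[colname]]))
}

# Restrict .SD to the target column, then reorder its rows at random
scramble_sd <- function(input_dt, colname) {
  input_dt[, c(colname) := .SD[sample(.I, .N)], .SDcols = colname]
}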

Each approach returns the correct result and avoids the strange dispatch problem when trying to shuffle a column named “colname”.
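
As a quick sanity check, here is a minimal sketch that runs both functions on a column literally named "colname". The table `dt`, its `other` column, and the `alias` variable are made up for illustration and are not from the original post. It only checks that the values are permuted, that the untouched column stays put, and that the change happens by reference:

```r
library(data.table)

scramble_set <- function(input_dt, colname) {
  set(input_dt, j = colname, value = sample(input_dt[[colname]]))
}

scramble_sd <- function(input_dt, colname) {
  input_dt[, c(colname) := .SD[sample(.I, .N)], .SDcols = colname]
}

# Hypothetical table: one column is deliberately named "colname"
dt <- data.table(colname = 1:10, other = letters[1:10])
alias <- dt  # a second name for the same object, not a copy

set.seed(42)
scramble_set(dt, "colname")
scramble_sd(dt, "colname")

# The shuffled column still holds the same values, in some order
stopifnot(identical(sort(dt$colname), 1:10))
# The other column was not touched
stopifnot(identical(dt$other, letters[1:10]))
# The update was made by reference, so the alias sees it too
stopifnot(identical(alias$colname, dt$colname))
```

Because both functions modify the table by reference, there is no need to assign their return value back to `dt`.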

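It’s good to check performance with these kinds of things, too, especially when .SD is involved. Here set() handily outperforms the other two solutions, with a median of about 37 microseconds against roughly 319 for kjytay’s original solution (which I named “orig”) and roughly 598 for the .SD version: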

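library(microbenchmark)

# scramble_orig() is kjytay's original solution; it is not reproduced here
microbenchmark(
  orig = scramble_orig(input_dt, "x"),
  set = scramble_set(input_dt, "x"),
  sd = scramble_sd(input_dt, "x"),
  # setup runs before each evaluation: rebuild the table and reseed
  setup = {
    input_dt <- data.table(x = 1:5)
    set.seed(1)
  },
  # require all three expressions to return identical results
  check = "identical"
)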

Unit: microseconds
 expr     min       lq      mean  median       uq      max neval
 orig 291.970 315.4400 351.52132 319.474 327.5635 3248.663   100
  set  33.196  36.0965  61.62936  37.262  39.5380 2419.880   100
   sd 557.834 591.2370 636.88657 597.579 616.2675 3821.737   100
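
The table benchmarked above has only five rows, so the timings are probably dominated by fixed per-call overhead rather than by the shuffle itself. A minimal sketch for repeating the comparison on a larger input might look like the following. The row count `n` and `times = 20` are arbitrary choices for illustration, `scramble_orig()` is left out because it is not defined in this post, and no results are shown because none were measured here:

```r
library(data.table)
library(microbenchmark)

scramble_set <- function(input_dt, colname) {
  set(input_dt, j = colname, value = sample(input_dt[[colname]]))
}

scramble_sd <- function(input_dt, colname) {
  input_dt[, c(colname) := .SD[sample(.I, .N)], .SDcols = colname]
}

n <- 1e6  # assumed table size, chosen only for illustration

bm <- microbenchmark(
  set = scramble_set(input_dt, "x"),
  sd = scramble_sd(input_dt, "x"),
  # rebuild the table before each evaluation so every run starts from 1:n
  setup = {
    input_dt <- data.table(x = seq_len(n))
  },
  times = 20
)

print(bm)
```

Whether the gap between set() and .SD stays as wide at this size is something to measure rather than assume.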
