TidyR Challenge: Data.Table Solution

May 19, 2015
By

(This article was first published on Jeffrey Horner, and kindly contributed to R-bloggers)

Arun Srinivasan is the man! Once he saw that his data.table solution to the TidyR Challenge had an issue, he fixed it!

His solution is below along with a quick equivalence test to my original solution, and check out this stackOverflow question for a more engaging discussion of the strengths and weaknesses of both dplyr/tidyr and data.table.

Fake Data

library(wakefield)
library(tidyr)
library(dplyr)

d <- r_data_frame(
      n=100,
      id,
      r_series(date_stamp,15,name='foo_date'),
      r_series(level,15,name='foo_supply'),
      r_series(date_stamp,10,name='bar_date'),
      r_series(level,10,name='bar_supply'),
      r_series(date_stamp,3,name='baz_date'),
      r_series(level,3,name='baz_supply')
  )

Test Function for Equivalence

# Create a true ordered data frame and drop any extraneous classes for each column
true_ordered_df <- function(x){
  x$ID <- as.character(x$ID); class(x$ID) <- 'character'
  x$med_date <- as.Date(x$med_date); class(x$med_date) <- 'Date'
  x$med_supply <- as.integer(x$med_supply); class(x$med_supply) <- 'integer'
  x$med_name <- as.character(x$med_name); class(x$med_name) <- 'character'
  x <- data.frame(
          ID=x$ID, 
          med_date=x$med_date,  
          med_supply=x$med_supply, 
          med_name=x$med_name,
          stringsAsFactors=FALSE
  )
  x <- x[with(x,order(ID,med_date,med_supply,med_name)),]
  row.names(x) <- NULL
  x
}

Data.Table Solution, thanks to Arun Srinivasan

require(data.table) # v1.9.5
dt = as.data.table(d)

pattern = c("date", "supply")
mcols = lapply(pattern, grep, names(dt), value=TRUE)
dt.m = melt(dt, id="ID", measure=mcols, variable.name="med_name", 
value.name = paste("med", pattern, sep="_"))
setattr(dt.m$med_name, 'levels', gsub("_.*$", "", mcols[[1L]]))

scripts2 <- true_ordered_df(dt.m)

My Original Solution

# foo
med_dates <- d %>% 
    select(ID,foo_date_1:foo_date_15) %>% 
    gather(med_seq, med_date, foo_date_1:foo_date_15)
med_dates$med_seq <- as.integer(sub('^foo_date_','',med_dates$med_seq))
med_supply <- d %>% 
    select(ID,foo_supply_1:foo_supply_15) %>% 
    gather(med_seq, med_supply, foo_supply_1:foo_supply_15)
med_supply$med_seq <- as.integer(sub('^foo_supply_','',med_supply$med_seq))
foo <- left_join(med_dates,med_supply, by=c('ID','med_seq')) %>% 
    select(ID,med_date,med_supply)
foo$med_name <- 'foo'

# bar
med_dates <- d %>% 
    select(ID,bar_date_1:bar_date_10) %>% 
    gather(med_seq, med_date, bar_date_1:bar_date_10)
med_dates$med_seq <- as.integer(sub('^bar_date_','',med_dates$med_seq))
med_supply <- d %>% 
    select(ID,bar_supply_1:bar_supply_10) %>% 
    gather(med_seq, med_supply, bar_supply_1:bar_supply_10)
med_supply$med_seq <- as.integer(sub('^bar_supply_','',med_supply$med_seq))
bar <- left_join(med_dates,med_supply, by=c('ID','med_seq')) %>% 
    select(ID,med_date,med_supply)
bar$med_name <- 'bar'

# baz
med_dates <- d %>% 
    select(ID,baz_date_1:baz_date_3) %>% 
    gather(med_seq, med_date, baz_date_1:baz_date_3)
med_dates$med_seq <- as.integer(sub('^baz_date_','',med_dates$med_seq))
med_supply <- d %>% 
    select(ID,baz_supply_1:baz_supply_3) %>% 
    gather(med_seq, med_supply, baz_supply_1:baz_supply_3)
med_supply$med_seq <- as.integer(sub('^baz_supply_','',med_supply$med_seq))
baz <- left_join(med_dates,med_supply, by=c('ID','med_seq')) %>% 
    select(ID,med_date,med_supply)
baz$med_name <- 'baz'

scripts <- true_ordered_df(rbind(foo,bar,baz))
all.equal(scripts,scripts2)
## [1] TRUE

Huzzah!

To leave a comment for the author, please follow the link and comment on their blog: Jeffrey Horner.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)