No, THIS Is How You Dplyr and Data.Table!

May 28, 2015

(This article was first published on Jeffrey Horner, and kindly contributed to R-bloggers)

So, I have some great solutions to my dplyr mutation problem to share. Just wait until you see these things!

Remember, I was having trouble reconciling two date columns into a minimum value in the presence of NA values.

Here’s the fake data again:

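library(wakefield)   # r_data_frame(), id, date_stamp()
library(tidyr)
library(dplyr)       # full_join(), mutate(), %>%
library(data.table)

# Two fake data frames of 10 rows each: an ID column plus a random date column
x <- r_data_frame(n=10,id,date_stamp(name='foo',random=TRUE))
y <- r_data_frame(n=10,id,date_stamp(name='bar',random=TRUE))

# Set 5 random dates in each column to NA
x$foo[base::sample(10,5)] <- NA
y$bar[base::sample(10,5)] <- NA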
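
If wakefield isn't installed, a rough stand-in for the fake data can be built in base R. This is only a sketch: the helper name make_fake, the seed, and the date range are assumptions made up for illustration, and it will not reproduce the exact values printed in this post.

```r
# Hypothetical base-R stand-in for the wakefield calls above (not the original code).
set.seed(42)  # arbitrary seed so the sketch is repeatable

make_fake <- function(col, n = 10, n_missing = 5) {
  # IDs "01".."10" plus one random date column named by `col`
  df <- data.frame(ID = sprintf("%02d", seq_len(n)), stringsAsFactors = FALSE)
  df[[col]] <- as.Date("2014-06-28") + sample(0:365, n, replace = TRUE)
  # Blank out a random subset of the dates, as in the setup above
  df[[col]][sample(n, n_missing)] <- NA
  df
}

x <- make_fake("foo")
y <- make_fake("bar")
```

Both data frames share the ID column, so the joins below should work on them unchanged.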

Eddie Niedermeyer Solves It Perfectly with pmin

And a shout out to Mark as well for suggesting pmin and his partial solution with data.table.

full_join(x,y,by='ID') %>% mutate(start = pmin(foo, bar, na.rm = TRUE))
## Source: local data frame [10 x 4]
## 
##    ID        foo        bar      start
## 1  01       <NA>       <NA>       <NA>
## 2  02 2015-01-28 2015-02-28 2015-01-28
## 3  03       <NA> 2015-03-28 2015-03-28
## 4  04 2014-10-28 2014-10-28 2014-10-28
## 5  05       <NA> 2014-08-28 2014-08-28
## 6  06 2015-05-28 2014-10-28 2014-10-28
## 7  07       <NA>       <NA>       <NA>
## 8  08 2014-07-28       <NA> 2014-07-28
## 9  09       <NA>       <NA>       <NA>
## 10 10 2014-09-28       <NA> 2014-09-28
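
Why does this work? pmin() takes the minimum element by element (row by row here), and na.rm = TRUE tells it to ignore an NA whenever the other value is present. Here's a tiny base-R sketch with three made-up rows (values picked purely for illustration) covering the three cases:

```r
# Three illustrative rows: both dates present, only bar present, both missing
foo <- as.Date(c("2015-01-28", NA, NA))
bar <- as.Date(c("2015-02-28", "2015-03-28", NA))

# Without na.rm, any NA in a row makes that row's result NA
strict <- pmin(foo, bar)

# With na.rm = TRUE, an NA is ignored when the other value exists;
# a row where both are NA still comes back as NA
start <- pmin(foo, bar, na.rm = TRUE)

print(data.frame(foo, bar, strict, start))
```

The second row is the interesting one: strict is NA there, while start picks up the one date that exists.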

But Kirill Kills It With dplyr AND data.table

Now this is a thing of beauty! A dplyr join, magrittr pipe action, and what do we see??!? data.table syntax with the old-school boolean T value!

Oh man, I’m lovin’ it!

Nice one, Kirill, nice one.

full_join(x,y,by='ID') %>% data.table %>% .[, start := pmin(foo, bar, na.rm = T)] %>% print
##     ID        foo        bar      start
##  1: 01       <NA>       <NA>       <NA>
##  2: 02 2015-01-28 2015-02-28 2015-01-28
##  3: 03       <NA> 2015-03-28 2015-03-28
##  4: 04 2014-10-28 2014-10-28 2014-10-28
##  5: 05       <NA> 2014-08-28 2014-08-28
##  6: 06 2015-05-28 2014-10-28 2014-10-28
##  7: 07       <NA>       <NA>       <NA>
##  8: 08 2014-07-28       <NA> 2014-07-28
##  9: 09       <NA>       <NA>       <NA>
## 10: 10 2014-09-28       <NA> 2014-09-28
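
One small caution about that old-school T: unlike TRUE, which is a reserved word, T is an ordinary variable that merely defaults to TRUE and can be reassigned. The sketch below (base R only, with made-up values and a deliberately hypothetical stray assignment) shows how that could silently change the result, which is why spelling out TRUE is the safer habit in scripts.

```r
foo <- as.Date(c("2015-01-28", NA))
bar <- as.Date(c("2015-02-28", "2015-03-28"))

# By default T evaluates to TRUE, so this matches na.rm = TRUE
with_default_T <- pmin(foo, bar, na.rm = T)

# T is just a variable; a (hypothetical) stray assignment changes its meaning
T <- FALSE
with_masked_T <- pmin(foo, bar, na.rm = T)  # now behaves like na.rm = FALSE

rm(T)  # drop the masking variable so T resolves to base R's TRUE again

print(with_default_T)
print(with_masked_T)
```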

