Is This How You Dplyr?

[This article was first published on Jeffrey Horner, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Yesterday I ran into a fairly complex issue regarding dplyr mutation and I wanted to get your take on my solution.

I have two data frames with the same identifiers and two different date columns which I need to merge into one date column, with the value of the earlier of the two dates if both are present, or any valid date when one or the other is present, or just NA when no date is present (kinda sad when you can’t get a date :).

library(wakefield)
library(tidyr)
library(dplyr)

x <- r_data_frame(n=10,id,date_stamp(name='foo',random=TRUE))
y <- r_data_frame(n=10,id,date_stamp(name='bar',random=TRUE))

x$foo[base::sample(10,5)] <- NA
y$bar[base::sample(10,5)] <- NA

First Attempt: Just Use Min

full_join(x,y,by='ID') %>% mutate(start=min(foo,bar))

## Source: local data frame [10 x 4]
## 
##    ID        foo        bar start
## 1  01       <NA>       <NA>  <NA>
## 2  02 2014-08-27 2015-04-27  <NA>
## 3  03 2014-07-27       <NA>  <NA>
## 4  04       <NA> 2015-02-27  <NA>
## 5  05       <NA> 2015-02-27  <NA>
## 6  06 2014-09-27       <NA>  <NA>
## 7  07 2014-09-27 2014-09-27  <NA>
## 8  08       <NA> 2015-02-27  <NA>
## 9  09 2014-07-27       <NA>  <NA>
## 10 10       <NA>       <NA>  <NA>

Nope.

Second Attempt: Min With Rowwise

full_join(x,y,by='ID') %>% rowwise() %>% mutate(start=min(foo,bar))

## Source: local data frame [10 x 4]
## Groups: <by row>
## 
##    ID        foo        bar start
## 1  01       <NA>       <NA>    NA
## 2  02 2014-08-27 2015-04-27 16309
## 3  03 2014-07-27       <NA>    NA
## 4  04       <NA> 2015-02-27    NA
## 5  05       <NA> 2015-02-27    NA
## 6  06 2014-09-27       <NA>    NA
## 7  07 2014-09-27 2014-09-27 16340
## 8  08       <NA> 2015-02-27    NA
## 9  09 2014-07-27       <NA>    NA
## 10 10       <NA>       <NA>    NA

Umm. It looks like it works when both dates are present but not when one is NA.

Third Attempt: Min With na.rm=TRUE And Rowwise

full_join(x,y,by='ID') %>% rowwise() %>% mutate(start=min(foo,bar,na.rm=TRUE))

## Warning in min(NA_real_, NA_real_, na.rm = TRUE): no non-missing arguments
## to min; returning Inf

## Warning in min(NA_real_, NA_real_, na.rm = TRUE): no non-missing arguments
## to min; returning Inf

## Source: local data frame [10 x 4]
## Groups: <by row>
## 
##    ID        foo        bar start
## 1  01       <NA>       <NA>   Inf
## 2  02 2014-08-27 2015-04-27 16309
## 3  03 2014-07-27       <NA> 16278
## 4  04       <NA> 2015-02-27 16493
## 5  05       <NA> 2015-02-27 16493
## 6  06 2014-09-27       <NA> 16340
## 7  07 2014-09-27 2014-09-27 16340
## 8  08       <NA> 2015-02-27 16493
## 9  09 2014-07-27       <NA> 16278
## 10 10       <NA>       <NA>   Inf

Wow, this output reads: WARNING you are hurting dplyr’s head!

Final Solution: Custom Function With Class Fiddling

date_min <- function(x,y){
  if (!is.na(x)){
    if (!is.na(y)){
      return(min(x,y))
    } else {
      return(x)
    }
  } else if (!is.na(y)){
    return(y)
  }
  return(x)
}
z <- full_join(x,y,by='ID') %>% rowwise() %>% mutate(start=date_min(foo,bar))
class(z$start) <- 'Date'
z

## Source: local data frame [10 x 4]
## Groups: <by row>
## 
##    ID        foo        bar      start
## 1  01       <NA>       <NA>       <NA>
## 2  02 2014-08-27 2015-04-27 2014-08-27
## 3  03 2014-07-27       <NA> 2014-07-27
## 4  04       <NA> 2015-02-27 2015-02-27
## 5  05       <NA> 2015-02-27 2015-02-27
## 6  06 2014-09-27       <NA> 2014-09-27
## 7  07 2014-09-27 2014-09-27 2014-09-27
## 8  08       <NA> 2015-02-27 2015-02-27
## 9  09 2014-07-27       <NA> 2014-07-27
## 10 10       <NA>       <NA>       <NA>

So, is this the right way. How would you do it?

Also, what’s the data.table approach?

To leave a comment for the author, please follow the link and comment on their blog: Jeffrey Horner.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)