One Weird Trick to Obtain the Maximum (or Minimum)

[This article was first published on Pachá, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Motivation

I was tagged on Twitter about the next question (translated):

How do I choose a maximum date per row? I have a data frame with 3 columns with dates, and I want to create a 4th column that has the maximum of those 3 for each row.

Dataset

First, we create a data frame with dates, let’s assume this contains the last three days where people used the bus instead of the subway.

library(dplyr)
library(stringr)

set.seed(1234)

d <- tibble(
  name = factor(c("George", "John", "Paul", "Ringo")),
  day1 = paste0("2022-05-", str_pad(sample(1:10, 4), 2, "left", "0")),
  day2 = paste0("2022-05-", sample(11:20, 4)),
  day3 = paste0("2022-05-", sample(21:30, 4))
) %>% 
  mutate_if(is.character, as.Date)

d
## # A tibble: 4 × 4
##   name   day1       day2       day3      
##   <fct>  <date>     <date>     <date>    
## 1 George 2022-05-10 2022-05-19 2022-05-22
## 2 John   2022-05-06 2022-05-15 2022-05-27
## 3 Paul   2022-05-05 2022-05-16 2022-05-26
## 4 Ringo  2022-05-04 2022-05-14 2022-05-30

Here the column for “day 3” is the maximum date per row, but the example still applies for the general case.

Without reshaping

Here we use pmax() (or pmin() for the minimum).

d %>% 
  mutate(
    max_day = pmax(day1, day2, day3)
  )
## # A tibble: 4 × 5
##   name   day1       day2       day3       max_day   
##   <fct>  <date>     <date>     <date>     <date>    
## 1 George 2022-05-10 2022-05-19 2022-05-22 2022-05-22
## 2 John   2022-05-06 2022-05-15 2022-05-27 2022-05-27
## 3 Paul   2022-05-05 2022-05-16 2022-05-26 2022-05-26
## 4 Ringo  2022-05-04 2022-05-14 2022-05-30 2022-05-30

Reshaping

What if we had more columns, for example, if we had 10 or more columns with dates. While it’s not the case for this example, it’s useful to know what to do in those situations. Here I print the result and then compute the operation for clarity, then use max() (or min() for the minimum).

library(tidyr)

d2 <- d %>% 
  gather("day", "value", -name)

d2
## # A tibble: 12 × 3
##    name   day   value     
##    <fct>  <chr> <date>    
##  1 George day1  2022-05-10
##  2 John   day1  2022-05-06
##  3 Paul   day1  2022-05-05
##  4 Ringo  day1  2022-05-04
##  5 George day2  2022-05-19
##  6 John   day2  2022-05-15
##  7 Paul   day2  2022-05-16
##  8 Ringo  day2  2022-05-14
##  9 George day3  2022-05-22
## 10 John   day3  2022-05-27
## 11 Paul   day3  2022-05-26
## 12 Ringo  day3  2022-05-30
d2 %>% 
  group_by(name) %>% 
  summarise(max_day = max(value))
## # A tibble: 4 × 2
##   name   max_day   
##   <fct>  <date>    
## 1 George 2022-05-22
## 2 John   2022-05-27
## 3 Paul   2022-05-26
## 4 Ringo  2022-05-30
To leave a comment for the author, please follow the link and comment on their blog: Pachá.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)