Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

For a piece of work I had to calculate the number of matches that a team plays away from home in a row, which we will call days_on_the_road. I was not sure how to do this with dplyr but it’s basically a ‘grouped sequence’. For this post, I’ve created some dummy data to illustrate this idea. The num_matches_away variable is what we want to mimic using some data manipulation.

library(tidyverse)

sports_df <- tibble::tribble(
~team,    ~date,        ~home_or_away, ~num_matches_away,
"Team A", "07/10/2022", "H",           0L,
"Team A", "14/10/2022", "A",           1L,
"Team A", "21/10/2022", "A",           2L,
"Team A", "28/10/2022", "H",           0L,
"Team A", "04/11/2022", "A",           1L,
"Team A", "11/11/2022", "A",           2L,
"Team A", "18/11/2022", "A",           3L,
"Team A", "25/11/2022", "H",           0L,
"Team A", "02/12/2022", "H",           0L,
"Team A", "09/12/2022", "H",           0L,
"Team B", "07/10/2022", "H",           0L,
"Team B", "14/10/2022", "A",           1L,
"Team B", "21/10/2022", "A",           2L,
"Team B", "28/10/2022", "A",           3L,
"Team B", "04/11/2022", "H",           0L,
"Team B", "11/11/2022", "A",           1L,
"Team B", "18/11/2022", "A",           2L,
"Team B", "25/11/2022", "H",           0L,
"Team B", "02/12/2022", "H",           0L,
"Team B", "09/12/2022", "A",           1L
) %>%
mutate(date = as.Date(date, "%d/%m/%Y")) %>%
arrange(team, date)

sports_df
## # A tibble: 20 × 4
##    team   date       home_or_away num_matches_away
##    <chr>  <date>     <chr>                   <int>
##  1 Team A 2022-10-07 H                           0
##  2 Team A 2022-10-14 A                           1
##  3 Team A 2022-10-21 A                           2
##  4 Team A 2022-10-28 H                           0
##  5 Team A 2022-11-04 A                           1
##  6 Team A 2022-11-11 A                           2
##  7 Team A 2022-11-18 A                           3
##  8 Team A 2022-11-25 H                           0
##  9 Team A 2022-12-02 H                           0
## 10 Team A 2022-12-09 H                           0
## 11 Team B 2022-10-07 H                           0
## 12 Team B 2022-10-14 A                           1
## 13 Team B 2022-10-21 A                           2
## 14 Team B 2022-10-28 A                           3
## 15 Team B 2022-11-04 H                           0
## 16 Team B 2022-11-11 A                           1
## 17 Team B 2022-11-18 A                           2
## 18 Team B 2022-11-25 H                           0
## 19 Team B 2022-12-02 H                           0
## 20 Team B 2022-12-09 A                           1

I firstly came up with a complicated solution for this using a combination of slider::slide() and rle() (run length encoding).

sports_df2 <- sports_df %>%
group_by(team) %>%
ungroup()

## [1] TRUE

These are some pretty cool functions and it was nice to know rle() existed but I was never really happy with this solution as it seemed overly complex and it’s difficult to understand what the code is doing by simply reading it. So I asked a colleague to try to solve this problem and they came up with a better solution which I’m grateful for! It involves using a combination of group_by() and seq_len() which is a whole lot simpler to understand in my opinion.

sports_df3 <- sports_df2 %>%
mutate(away = cumsum(home_or_away == "H")) %>%
group_by(team, home_or_away, away) %>%
ungroup() %>%