Rolling 5-year Sum Label Using slider

[This article was first published on klogr::blog(posts), and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Problem

First blog post off the back of an issue at work which required desktop research to figure out. This content aims to better cement my learning and to share to anyone who has been afflicted with a similar issue.

For context, I was unable to download new packages and only had the slider package available. Notable packages which came up in research are zoo and TTR which could provide similar results I’m sure, but the focus here is on slider.

Whilst calculating a rolling sum took a little while to figure out, the greater block was generating a label for plotting.

Let’s begin with setting global options and getting data.

# Load packages
library(tidyverse)
library(slider)

theme_set(theme_minimal(
 base_family = 'serif', base_size = 12))

set.seed(4321)

# Get data
example_df <- bind_rows(
 tibble(group = "group_1",
  year = c(1991:1993, 1995, 2000:2005),
  to_add = sample(1:40, 10)),
 
 tibble(group = "group_2",
  year = c(2000:2005),
  to_add = sample(1:40, 6)))

Calculate Rolling Sum

Calculate the rolling sum using an reference column or index, in this instance year. Set before argument to 4 which will then include the value itself to provide a rolling 5 year window. This code is using the _int variant of the slide_index_* functions for integer but there are equivalents for double, logical, character, and data frame.

In the function(s) slide_index_* you can’t reference the column that’s being iterated on. Instead I used the slide function to generate labels for plotting.

example_df %>%
 group_by(group) %>% 
 arrange(year) %>%
 mutate(
  # Calculate rolling sum
  rolling_5_years = slide_index_int(to_add, year, sum, .before = 4),
  # Generate rolling sum labels
  rolling_5_years_label = slide(year, ~.x, .before = 4) %>%
   as.character() %>%
   unlist()
 ) %>% 
 head(5)
# A tibble: 5 × 5
# Groups:   group [1]
  group    year to_add rolling_5_years rolling_5_years_label          
  <chr>   <dbl>  <int>           <int> <chr>                          
1 group_1  1991     27              27 1991                           
2 group_1  1992     29              56 c(1991, 1992)                  
3 group_1  1993     10              66 c(1991, 1992, 1993)            
4 group_1  1995     15              81 c(1991, 1992, 1993, 1995)      
5 group_1  2000     24              24 c(1991, 1992, 1993, 1995, 2000)

Note that this function helpfully counts missing values within the sequence. See above example, 1994 is missing but for 1995 the rolling years sum of 65 includes only 1991, 1992, 1993, and 1995 to_add values.

Now to visualise!

Visualise

Important to note when using slide to make a label, when the groups starting index value mismatch you will get the below plot which does look messy.

example_df %>%
 group_by(group) %>% 
 arrange(year) %>%
 mutate(
  # Calculate rolling sum
  rolling_5_years = slide_index_int(to_add, year, sum, .before = 4),
  # Generate rolling sum labels
  rolling_5_years_label = slide(year, ~.x, .before = 4) %>%
   as.character() %>%
   unlist()
 ) %>% 
 ggplot(aes(x = rolling_5_years, 
            y = rolling_5_years_label, 
            fill = group)) +
 geom_col(position = "dodge", colour = "white") +
 labs(title = "Rolling 5 year period example",
      subtitle = "A little messy", 
      x = "", y = "") +
 scale_fill_viridis_d(alpha = 0.75, name = "") +
 theme(
  legend.position = "top",
  plot.title.position = "plot"
 ) 

You may be required to include mismatched starting index values for your use case. But if that’s not the case and you want to provide a bit more clarity to the plot - I’ve filtered to exclude any year before 2000, now both groups start at the same year / the same starting index value.

Not all rolling years have the same number of years unless you add the .complete = TRUE argument. Be sure to filter out NA’s afterwards.

I’ve coded the labels to collapse years into colon format to maximise plot space.

# Set base rolling value the same
example_rolling_sum <- example_df %>%
 filter(!year < 2000) %>% 
 group_by(group) %>% 
 arrange(year) %>%
 mutate(
  # Calculate rolling sum
  rolling_5_years = slide_index_int(to_add, year, sum, .before = 4, .complete = TRUE),
  # Generate rolling sum labels
  rolling_5_years_label = slide(as.double(year), ~.x, .before = 4, .complete = TRUE) %>%
   map(~paste(min(.), ":", max(.))) %>% 
   unlist()
 )
Warning: There were 16 warnings in `mutate()`.
The first warning was:
ℹ In argument: `rolling_5_years_label = `%>%`(...)`.
ℹ In group 1: `group = "group_1"`.
Caused by warning in `min()`:
! no non-missing arguments to min; returning Inf
ℹ Run `dplyr::last_dplyr_warnings()` to see the 15 remaining warnings.
 # Plot
 example_rolling_sum %>% 
 filter(!is.na(rolling_5_years)) %>% 
 ggplot(aes(x = rolling_5_years, 
            y = rolling_5_years_label, 
            fill = group)) +
 geom_col(position = "dodge", colour = "white") +
  labs(title = "Rolling 5 year period example", 
       subtitle = "Pretty",
       x = "", y = "") +
 scale_fill_viridis_d(alpha = 0.75, name = "") +
 theme(
  legend.position = "top",
  plot.title.position = "plot"
 ) 

Conclusion

I think slider is a great package, was I forced to use it because I couldn’t download any other package - sure. But would I use it again given the keys to CRAN, yep. Hope this tidbit about labels helped in some small way. That brings me to the end of the first self-tech-support blog.

Acknowledgements

r-bloggers.com for the reach, platform, and content

Packages and package maintainer(s):

  • slider | Davis Vaughan
  • tidyverse | Hadley Wickham
To leave a comment for the author, please follow the link and comment on their blog: klogr::blog(posts).

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)