Rafa 21 Grand Slams and gganimate

[This article was first published on R on Alan Yeung, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve been a Nadal fan for a long time – right back to the days of the pirate-pants so yeah, really a long time. In all this time, Rafa has never been ahead in the grand slam race vs his biggest rivals… but that finally changed after the 2022 Australian Open! The win there was unexpected and came out of nowhere. The final against Medvedev has to go down as one of the best comebacks ever.

It’s already been as.Date("2022-02-13") - as.Date("2022-01-30") (14 days) since he won that record 21st grand slam so I thought it has to be about time to do something to mark the achievement. Something that’s been on my list of things to learn is gganimate which is a very cool R package so I thought I’d take the opportunity here. My goal is to create an animated barplot, showing Rafa on top at the very end.

Getting the data

I started by using this data on grand slam wins from this blog post by Emily Kuehler and filtering for just the data on the big 3 male players: Nadal, Djokovic and Federer. Since the grand slam data there does not go all the way up to the 2022 Australian Open, I had to manually add that in by looking up the required information on Wikipedia and binding that to the end.

library(tidyverse)
library(readxl)

gs_df <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-09/grand_slams.csv", 
                  show_col_types = FALSE)

gs_update <- tibble::tribble(
    ~year,       ~grand_slam,            ~name, ~rolling_win_count, ~tournament_date,
    2019,     "French Open",   "Rafael Nadal",                18,     "26/05/2019",
    2019,       "Wimbledon", "Novak Djokovic",                16,     "01/07/2019",
    2019,         "US Open",   "Rafael Nadal",                19,     "26/08/2019",
    2020, "Australian Open", "Novak Djokovic",                17,     "20/01/2020",
    2020,     "French Open",   "Rafael Nadal",                20,     "27/09/2020",
    2021, "Australian Open", "Novak Djokovic",                18,     "08/02/2021",
    2021,     "French Open", "Novak Djokovic",                19,     "30/05/2021",
    2021,       "Wimbledon", "Novak Djokovic",                20,     "28/07/2021",
    2022, "Australian Open",   "Rafael Nadal",                21,     "17/01/2022"
) %>% 
    mutate(tournament_date = as.Date(tournament_date, "%d/%m/%Y"))

gs_df2 <- gs_df %>%
    filter(name %in% c("Rafael Nadal", "Novak Djokovic", "Roger Federer")) %>%
    mutate(grand_slam = str_replace_all(grand_slam, "_", " "),
           grand_slam = str_to_title(grand_slam),
           grand_slam = str_replace_all(grand_slam, "Us", "US")) %>%
    select(-gender) %>%
    bind_rows(gs_update)

gs_df2
# A tibble: 61 x 5
#     year grand_slam      name          rolling_win_count tournament_date
#    <dbl> <chr>           <chr>                     <dbl> <date>         
#  1  2003 Wimbledon       Roger Federer                 1 2003-07-14     
#  2  2004 Australian Open Roger Federer                 2 2004-01-10     
#  3  2004 Wimbledon       Roger Federer                 3 2004-07-14     
#  4  2004 US Open         Roger Federer                 4 2004-09-09     
#  5  2005 French Open     Rafael Nadal                  1 2005-06-09     
#  6  2005 Wimbledon       Roger Federer                 5 2005-07-14     
#  7  2005 US Open         Roger Federer                 6 2005-09-09     
#  8  2006 Australian Open Roger Federer                 7 2006-01-10     
#  9  2006 French Open     Rafael Nadal                  2 2006-06-09     
# 10  2006 Wimbledon       Roger Federer                 8 2006-07-14     
# ... with 51 more rows

Data processing

I had to do a bit of general data wrangling (isn’t this always the case unfortunately?) to set things up for gganimate. This is all fairly standard stuff so I’ll just show the code below but one thing to note is that for the rank order (current_rank) of the players at each time point, I sorted ascending on rolling_win_count2 rather than descending (as would seem more logical to get the ranking by most slams) because when you use ggplot2::coord_flip(), it puts the highest value (lowest rank) at the top of the graph – so essentially I set it up so that rank 3 is the best and rank 1 is the worst.

# Expand out for all combinations
gs_df3 <- gs_df2 %>% 
    arrange(tournament_date) %>% 
    mutate(year = fct_inorder(factor(year)),
           grand_slam = factor(grand_slam, 
                               levels = c("Australian Open", "French Open", 
                                          "Wimbledon", "US Open")),
           name = factor(name, levels = c("Rafael Nadal", "Novak Djokovic", 
                                          "Roger Federer"))) %>% 
    complete(year, grand_slam, name) %>% 
    replace_na(list(rolling_win_count = 0))

# Drop tournaments before first slam win or not yet played
gs_df4 <- gs_df3 %>% 
    filter(!(year == 2003 & grand_slam %in% c("Australian Open", "French Open")),
           !(year == 2022 & grand_slam %in% c("French Open", "Wimbledon", "US Open"))) %>% 
    mutate(yr_slam = paste(year, grand_slam), .before = year)

# Recalculate rolling win count
gs_df5 <- gs_df4 %>% 
    mutate(win = as.numeric(rolling_win_count > 0)) %>% 
    group_by(name) %>% 
    mutate(rolling_win_count2 = cumsum(win)) %>% 
    ungroup() 

# Set the rank for each time point
gs_df6 <- gs_df5 %>% 
    arrange(year, grand_slam, rolling_win_count2, desc(name)) %>% 
    group_by(yr_slam) %>% 
    mutate(current_rank = row_number()) %>% 
    ungroup()

select(gs_df6, yr_slam, name, rolling_win_count2, current_rank)
# A tibble: 225 x 4
#    yr_slam              name           rolling_win_count2 current_rank
#    <chr>                <fct>                       <dbl>        <int>
#  1 2003 Wimbledon       Novak Djokovic                  0            1
#  2 2003 Wimbledon       Rafael Nadal                    0            2
#  3 2003 Wimbledon       Roger Federer                   1            3
#  4 2003 US Open         Novak Djokovic                  0            1
#  5 2003 US Open         Rafael Nadal                    0            2
#  6 2003 US Open         Roger Federer                   1            3
#  7 2004 Australian Open Novak Djokovic                  0            1
#  8 2004 Australian Open Rafael Nadal                    0            2
#  9 2004 Australian Open Roger Federer                   2            3
# 10 2004 French Open     Novak Djokovic                  0            1
# ... with 215 more rows

ggimage

Next I prepared some cartoon faces for the 3 players to go on the ends of the bars and stored these on Github so they can be loaded into R with the help of the ggimage package. I won’t go into much detail on the image processing side but the online tools I used to help with this are all in the references section of this blog post.

library(ggimage)

img_rafa <- "https://raw.githubusercontent.com/alan-y/img/master/rafa2.png"
img_novak <- "https://raw.githubusercontent.com/alan-y/img/master/novak2.png"
img_roger <- "https://raw.githubusercontent.com/alan-y/img/master/roger2.png"

gs_df7 <- gs_df6 %>% 
    mutate(img_player = case_when(
        name == "Rafael Nadal" ~ img_rafa,
        name == "Novak Djokovic" ~ img_novak,
        name == "Roger Federer" ~ img_roger,
    ))

select(gs_df7, name, img_player)
# A tibble: 225 x 2
#    name           img_player                                                    
#    <fct>          <chr>                                                         
#  1 Novak Djokovic https://raw.githubusercontent.com/alan-y/img/master/novak2.png
#  2 Rafael Nadal   https://raw.githubusercontent.com/alan-y/img/master/rafa2.png 
#  3 Roger Federer  https://raw.githubusercontent.com/alan-y/img/master/roger2.png
#  4 Novak Djokovic https://raw.githubusercontent.com/alan-y/img/master/novak2.png
#  5 Rafael Nadal   https://raw.githubusercontent.com/alan-y/img/master/rafa2.png 
#  6 Roger Federer  https://raw.githubusercontent.com/alan-y/img/master/roger2.png
#  7 Novak Djokovic https://raw.githubusercontent.com/alan-y/img/master/novak2.png
#  8 Rafael Nadal   https://raw.githubusercontent.com/alan-y/img/master/rafa2.png 
#  9 Roger Federer  https://raw.githubusercontent.com/alan-y/img/master/roger2.png
# 10 Novak Djokovic https://raw.githubusercontent.com/alan-y/img/master/novak2.png
# ... with 215 more rows

gganimate

Firstly I’ll mention that for the gganimate package to work well, you’ll also need to install the gifski package. To get things ready for the animated plot in plot_df, I just had to make a couple of minor manipulations. Then everything up to the use of transition_states() in the code below is standard ggplot2 code except maybe the use of geom_image() to add the cartoon faces to the end of the bars (note that I subtract 0.5 from rolling_win_count as the function doesn’t seem to have a nudge_y argument even though it has one for nudge_x) and the use of {closest_state} in the subtitle – this tracks the variable that the animation transitions over which for me, is yr_slam, i.e. the combination of year and grand slam name. The fill colours are from scale_fill_hue() but manually picked so that the fill colour of each player’s bar matches their favourite surface.

For transition_states(), the transition_length() is the relative length of the transition and state_length() is the relative length of the pause at the states (I stole this from the help page); I set wrap = FALSE as I don’t want the last state to transition into the first when looping the animation. I am not sure how much difference the ease_aes("quadratic-in-out") makes here to be honest but that’s what I used. In general I know this function is for messing around with the effects applied to how frames/states transition into one another. If somebody can give me a good layman’s explanation of these functions, I’d be grateful if you can do so in the comments.

In the animate() function, I set nframes = 500. Some things to note here are that the default is only 50 frames so if you have more than 50 states (and I do as I have more than 50 year-slam combinations) then you need to set this to a larger number but this number should be suitably larger so the animation looks smoother. I set end_pause to 30 frames so that it pauses at the end for a little bit. Finally I applied very specific width and height as I wanted to add something to the end of the animation which happened to have these dimensions – that’s a surprise which you will see!

library(gganimate)

plot_df <- gs_df7 %>% 
    select(-rolling_win_count) %>% 
    rename(rolling_win_count = rolling_win_count2) %>% 
    mutate(yr_slam = fct_inorder(factor(yr_slam)),
           name_count = paste(name, rolling_win_count))

# Set up ggplot2 theme
theme_set(theme_minimal())
theme_update(plot.title = element_text(face = "bold", size = 18),
             plot.subtitle = element_text(size = 14),
             panel.grid.major.y = element_blank(),
             panel.grid.minor.y = element_blank(),
             panel.grid.major.x = element_line(color = "grey75"),
             panel.grid.minor.x = element_line(color = "grey75"),
             legend.position = "none",
             axis.ticks = element_blank(),
             axis.text.y =  element_blank())

barplot_slams <- ggplot(plot_df, 
                        aes(x = current_rank, y = rolling_win_count, 
                            fill = name)) +
    geom_bar(stat = "identity", width = 0.3, colour = "black") +
    geom_text(aes(label = name_count), 
              nudge_x = -0.25, nudge_y = -0.75, 
              size = 3, fontface = "bold", hjust = 0) +
    geom_image(aes(image = img_player, y = rolling_win_count - 0.5), size = 0.09) +
    scale_fill_manual(values = c("#FF7969", "#569EFF", "#00B73A")) +
    scale_y_continuous(limits = c(-0.75, 25), breaks = seq(0, 25, by = 5)) +
    coord_flip() +
    labs(title = "Men's Tennis Grand Slam Singles Championships",
         subtitle = "{closest_state}",
         x = NULL, y = NULL) +
    transition_states(yr_slam, transition_length = 3, state_length = 1, 
                      wrap = FALSE) +
    ease_aes("quadratic-in-out")

animate(barplot_slams, nframes = 500, end_pause = 30, fps = 20, 
        width = 469, height = 334,
        renderer = gifski_renderer("barplot_slams.gif")) 

So without further ado, here is the final result for your enjoyment.

To leave a comment for the author, please follow the link and comment on their blog: R on Alan Yeung.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)