Visualizing Soccer with StatsBomb Data and R, Part 1: Simple xG and Pass Partner Plots!

August 20, 2019
[This article was first published on R by R(yo), and kindly contributed to R-bloggers].

This will be Part 1 of what I hope to be a multi-part series on
plotting soccer event-level data with R! This is more of a tutorial blog
post than a deep analytical piece, but I will give some context to
the examples to set the scene! I can’t give an exact number of
parts as I am still getting to grips with this kind of data and I feel
like I’ve only scratched the surface. You can read some of the other
stuff I’ve done, such as the preview blog posts for the Asian Cup and the
Copa America, along with the code for all the standalone viz I’ve done, on my
soccer_ggplot GitHub repository.

I’ll mostly be using the Messi Data Biography data but the steps I
show below are applicable to the other data available as well! I will be
working with the free data sets so some things may differ compared
to the full data available. Also note that it is possible to create the
viz in this blog post using data from other providers of event-level
data such as Opta. The difference in code will mainly be in the data
ingestion and cleaning phases but the gist of the {ggplot2} code should
be similar.

As an example and motivation, one of the visualizations we are going to
create is shown below:

Let’s get started!

Getting the Data

A few important steps before you even start using R:

Once that’s done we can start coding!

Packages

Here are all the packages I’ll be using (note: I like using {pacman} so I
don’t have to repeat library() a billion times):

if (!require("pacman")) {
  install.packages("pacman")
}

pacman::p_load(tidyverse, ## mainly dplyr, purrr, and tidyr
               StatsBombR, SBpitch, soccermatics,
               extrafont, ggupset, tibbletime,
               ggtext, ggrepel, glue,
               patchwork, cowplot, gtable, grid,
               magick)

## loading fonts
loadfonts(device = "win", quiet = TRUE)

After loading the {StatsBombR} library (note: I already did this above
and am only showing it again below for demonstration purposes) we first
want to take a look at the output of the FreeCompetitions() function,
which gives you a data frame of all the competitions available for free
from StatsBomb. Do note that this part will be different if you are a
customer using the API.

library(StatsBombR)
comps <- FreeCompetitions()

glimpse(comps)

If you View() or glimpse() the data frame you’ll see that the
competition_id we need is 11 for the Lionel Messi data. We use
this to filter() the comps data frame and then call FreeMatches()
to get a data frame of the available matches. Finally, pass that data
frame to StatsBombFreeEvents() to access the data. This can take a
while if you don’t have a good internet connection!

messi_matches_raw <- comps %>% 
  filter(competition_id == 11) %>% 
  FreeMatches()

messi_data_raw <- StatsBombFreeEvents(MatchesDF = messi_matches_raw)

Clean All and Add Season Labels

Now that we’ve got the raw data we can clean it and add some extra
information using the allclean() function. This function takes care
of:

  • cleanlocations(): cleans the location variables in the data
  • Goalkeeper: Add goalkeeper data from the freeze frame
  • Shot: Adds more shot information
  • Freeze frame: Extracts info from freeze frames, i.e. density
  • Defensive: Defensive information

We can also add in the actual season names by joining with the “comps”
data frame and joining it by the “season_id”.

messi_data_clean <- messi_data_raw %>% 
  allclean() %>%  
  left_join(comps %>% select(season_id, season_name), by = "season_id")

The player names in the data are the full names and for lots of
Spanish/Portuguese players in the data that means their FULL names.
To make the names shorter and so that labels on plots can be more
legible it’s a good idea to clean the “name” variables up a bit. There
is a function, JoinPlayerNickName() that allows you to do that,
however, you need a username and password for the StatsBomb API, which I
don’t have sooo… I have several options:

  • Manually clean the names…
  • Find a nice list of player names and left_join() after cleaning
    • Example: Use transfermarkt data
  • Use the {fuzzyjoin} package:
    Join a name even if there are n number of differences

In the end I just did it manually… around 10 full minutes of hard
concentration and it was done. Added bonus is that now I am intimately
familiar with the full names of every Barcelona player in the past
decade!

messi_data_clean <- messi_data_clean %>% 
  ## player name
  mutate(player.name = case_when(
    player.name == "Oleguer Presas Renom" ~ "Oleguer",
    player.name == "Xavier Hernández Creus" ~ "Xavi",
    player.name == "Carles Puyol i Saforcada" ~ "Carles Puyol",
    player.name == "Anderson Luís de Souza" ~ "Deco",
    player.name == "Rafael Márquez Álvarez" ~ "Rafa Márquez",
    player.name == "Giovanni van Bronckhorst" ~ "Gio v.Bronckhorst",
    player.name == "Samuel Eto'o Fils" ~ "Samuel Eto'o",
    player.name == "Víctor Valdés Arribas" ~ "Víctor Valdés",
    player.name == "Juliano Haus Belletti" ~ "Juliano Belletti",
    player.name == "Ludovic Giuly" ~ "Ludovic Giuly",
    player.name == "Andrés Iniesta Luján" ~ "Andrés Iniesta",
    player.name == "Ronaldo de Assis Moreira" ~ "Ronaldinho",
    player.name == "Lionel Andrés Messi Cuccittini" ~ "Lionel Messi",
    player.name == "Fernando Navarro i Corbacho" ~ "Fernando Navarro",
    player.name == "Sylvio Mendes Campos Junior" ~ "Sylvinho",
    player.name == "Damià Abella Pérez" ~ "Damià",
    player.name == "Rubén Iván Martínez Andrade" ~ "Rubén",
    player.name == "Thiago Motta" ~ "Thiago Motta",
    player.name == "Mark van Bommel" ~ "Mark van Bommel",
    player.name == "Henrik Larsson" ~ "Henrik Larsson",
    player.name == "José Edmílson Gomes de Moraes" ~ "Edmílson",
    player.name == "Gabriel Francisco García de la Torre" ~ "Gabri",
    player.name == "Santiago Ezquerro Marín" ~ "Santi Ezquerro",
    player.name == "Maximiliano Gastón López" ~ "Maxi López",
    player.name == "Gianluca Zambrotta" ~ "Gianluca Zambrotta",
    player.name == "Eiður Smári Guðjohnsen" ~ "Eiður Guðjohnsen",
    player.name == "Lilian Thuram" ~ "Lilian Thuram",
    player.name == "Javier Pedro Saviola Fernández" ~ "Javier Saviola",
    player.name == "Gnégnéri Yaya Touré" ~ "Yaya Touré",
    player.name == "Bojan Krkíc Pérez" ~ "Bojan",
    player.name == "Eric-Sylvain Bilal Abidal" ~ "Eric Abidal",
    player.name == "Gabriel Alejandro Milito" ~ "Gabriel Milito",
    player.name == "Giovani dos Santos Ramírez" ~ "Giovani dos Santos",
    player.name == "Víctor Vázquez Solsona" ~ "Víctor Vázquez",
    player.name == "Thierry Henry" ~ "Thierry Henry",
    player.name == "José Manuel Pinto Colorado" ~ "José Manuel Pinto",
    player.name == "Daniel Alves da Silva" ~ "Dani Alves",
    player.name == "Sergio Busquets i Burgos" ~ "Sergio Busquets",
    player.name == "Seydou Kéita" ~ "Seydou Kéita",
    player.name == "José Martín Cáceres Silva" ~ "Martín Cáceres",
    player.name == "Gerard Piqué Bernabéu" ~ "Gerard Piqué",
    player.name == "Aliaksandr Hleb" ~ "Aliaksandr Hleb",
    player.name == "Pedro Eliezer Rodríguez Ledesma" ~ "Pedro",
    player.name == "Sergio Rodríguez García" ~ "Rodri",
    player.name == "Rafael Romero Serrano" ~ "Fali",
    player.name == "José Manuel Rueda Sampedro" ~ "José Manuel Rueda",
    player.name == "Zlatan Ibrahimovic" ~ "Zlatan Ibrahimovic",
    player.name == "Dmytro Chygrynskiy" ~ "Dmytro Chygrynskiy",
    player.name == "Maxwell Scherrer Cabelino Andrade" ~ "Maxwell",
    player.name == "Jeffren Isaac Suárez Bermúdez" ~ "Jeffren",
    player.name == "Víctor Sánchez Mata" ~ "Víctor Sánchez",
    player.name == "Thiago Alcântara do Nascimento" ~ "Thiago Alcântara",
    player.name == "David Villa Sánchez" ~ "David Villa",
    player.name == "Javier Alejandro Mascherano" ~ "Javier Mascherano",
    player.name == "Andreu Fontàs Prat" ~ "Andreu Fontàs",
    player.name == "Ibrahim Afellay" ~ "Ibrahim Afellay",
    player.name == "Manuel Agudo Durán" ~ "Nolito",
    player.name == "Marc Bartra Aregall" ~ "Marc Bartra",
    player.name == "Adriano Correia Claro" ~ "Adriano",
    player.name == "Martín Montoya Torralbo" ~ "Martín Montoya",
    player.name == "Jonathan dos Santos Ramírez" ~ "Jonathan dos Santos",
    player.name == "Francesc Fàbregas i Soler" ~ "Cesc Fàbregas",
    player.name == "Alexis Alejandro Sánchez Sánchez" ~ "Alexis Sánchez",
    player.name == "Juan Isaac Cuenca López" ~ "Isaac Cuenca",
    player.name == "Gerard Deulofeu Lázaro" ~ "Gerard Deulofeu",
    player.name == "Cristian Tello" ~ "Cristian Tello",
    player.name == "Sergi Roberto Carnicer" ~ "Sergi Roberto",
    player.name == "Marc Muniesa Martínez" ~ "Marc Muniesa",
    TRUE ~ player.name
  )) %>% 
  ## pass.recipient.name
  mutate(pass.recipient.name = case_when(
    pass.recipient.name == "Oleguer Presas Renom" ~ "Oleguer",
    pass.recipient.name == "Xavier Hernández Creus" ~ "Xavi",
    pass.recipient.name == "Carles Puyol i Saforcada" ~ "Carles Puyol",
    pass.recipient.name == "Anderson Luís de Souza" ~ "Deco",
    pass.recipient.name == "Rafael Márquez Álvarez" ~ "Rafa Márquez",
    pass.recipient.name == "Giovanni van Bronckhorst" ~ "Gio v.Bronckhorst",
    pass.recipient.name == "Samuel Eto'o Fils" ~ "Samuel Eto'o",
    pass.recipient.name == "Víctor Valdés Arribas" ~ "Víctor Valdés",
    pass.recipient.name == "Juliano Haus Belletti" ~ "Juliano Belletti",
    pass.recipient.name == "Ludovic Giuly" ~ "Ludovic Giuly",
    pass.recipient.name == "Andrés Iniesta Luján" ~ "Andrés Iniesta",
    pass.recipient.name == "Ronaldo de Assis Moreira" ~ "Ronaldinho",
    pass.recipient.name == "Lionel Andrés Messi Cuccittini" ~ "Lionel Messi",
    pass.recipient.name == "Fernando Navarro i Corbacho" ~ "Fernando Navarro",
    pass.recipient.name == "Sylvio Mendes Campos Junior" ~ "Sylvinho",
    pass.recipient.name == "Damià Abella Pérez" ~ "Damià",
    pass.recipient.name == "Rubén Iván Martínez Andrade" ~ "Rubén",
    pass.recipient.name == "Thiago Motta" ~ "Thiago Motta",
    pass.recipient.name == "Mark van Bommel" ~ "Mark van Bommel",
    pass.recipient.name == "Henrik Larsson" ~ "Henrik Larsson",
    pass.recipient.name == "José Edmílson Gomes de Moraes" ~ "Edmílson",
    pass.recipient.name == "Gabriel Francisco García de la Torre" ~ "Gabri",
    pass.recipient.name == "Santiago Ezquerro Marín" ~ "Santi Ezquerro",
    pass.recipient.name == "Maximiliano Gastón López" ~ "Maxi López",
    pass.recipient.name == "Gianluca Zambrotta" ~ "Gianluca Zambrotta",
    pass.recipient.name == "Eiður Smári Guðjohnsen" ~ "Eiður Guðjohnsen",
    pass.recipient.name == "Lilian Thuram" ~ "Lilian Thuram",
    pass.recipient.name == "Javier Pedro Saviola Fernández" ~ "Javier Saviola",
    pass.recipient.name == "Gnégnéri Yaya Touré" ~ "Yaya Touré",
    pass.recipient.name == "Bojan Krkíc Pérez" ~ "Bojan",
    pass.recipient.name == "Eric-Sylvain Bilal Abidal" ~ "Eric Abidal",
    pass.recipient.name == "Gabriel Alejandro Milito" ~ "Gabriel Milito",
    pass.recipient.name == "Giovani dos Santos Ramírez" ~ "Giovani dos Santos",
    pass.recipient.name == "Víctor Vázquez Solsona" ~ "Víctor Vázquez",
    pass.recipient.name == "Thierry Henry" ~ "Thierry Henry",
    pass.recipient.name == "José Manuel Pinto Colorado" ~ "José Manuel Pinto",
    pass.recipient.name == "Daniel Alves da Silva" ~ "Dani Alves",
    pass.recipient.name == "Sergio Busquets i Burgos" ~ "Sergio Busquets",
    pass.recipient.name == "Seydou Kéita" ~ "Seydou Kéita",
    pass.recipient.name == "José Martín Cáceres Silva" ~ "Martín Cáceres",
    pass.recipient.name == "Gerard Piqué Bernabéu" ~ "Gerard Piqué",
    pass.recipient.name == "Aliaksandr Hleb" ~ "Aliaksandr Hleb",
    pass.recipient.name == "Pedro Eliezer Rodríguez Ledesma" ~ "Pedro",
    pass.recipient.name == "Sergio Rodríguez García" ~ "Rodri",
    pass.recipient.name == "Rafael Romero Serrano" ~ "Fali",
    pass.recipient.name == "José Manuel Rueda Sampedro" ~ "José Manuel Rueda",
    pass.recipient.name == "Zlatan Ibrahimovic" ~ "Zlatan Ibrahimovic",
    pass.recipient.name == "Dmytro Chygrynskiy" ~ "Dmytro Chygrynskiy",
    pass.recipient.name == "Maxwell Scherrer Cabelino Andrade" ~ "Maxwell",
    pass.recipient.name == "Jeffren Isaac Suárez Bermúdez" ~ "Jeffren",
    pass.recipient.name == "Víctor Sánchez Mata" ~ "Víctor Sánchez",
    pass.recipient.name == "Thiago Alcântara do Nascimento" ~ "Thiago Alcântara",
    pass.recipient.name == "David Villa Sánchez" ~ "David Villa",
    pass.recipient.name == "Javier Alejandro Mascherano" ~ "Javier Mascherano",
    pass.recipient.name == "Andreu Fontàs Prat" ~ "Andreu Fontàs",
    pass.recipient.name == "Ibrahim Afellay" ~ "Ibrahim Afellay",
    pass.recipient.name == "Manuel Agudo Durán" ~ "Nolito",
    pass.recipient.name == "Marc Bartra Aregall" ~ "Marc Bartra",
    pass.recipient.name == "Adriano Correia Claro" ~ "Adriano",
    pass.recipient.name == "Martín Montoya Torralbo" ~ "Martín Montoya",
    pass.recipient.name == "Jonathan dos Santos Ramírez" ~ "Jonathan dos Santos",
    pass.recipient.name == "Francesc Fàbregas i Soler" ~ "Cesc Fàbregas",
    pass.recipient.name == "Alexis Alejandro Sánchez Sánchez" ~ "Alexis Sánchez",
    pass.recipient.name == "Juan Isaac Cuenca López" ~ "Isaac Cuenca",
    pass.recipient.name == "Gerard Deulofeu Lázaro" ~ "Gerard Deulofeu",
    pass.recipient.name == "Cristian Tello" ~ "Cristian Tello",
    pass.recipient.name == "Sergi Roberto Carnicer" ~ "Sergi Roberto",
    pass.recipient.name == "Marc Muniesa Martínez" ~ "Marc Muniesa",
    TRUE ~ pass.recipient.name
  ))

I only changed it for these two variables but you could do it for more
using the scoped variants of mutate() such as mutate_at() or
mutate_if() to change the values of variables that adhere to certain
conditions.
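
As a rough sketch of that idea (not what I actually ran above), the pairs could live in a single named lookup vector that gets applied to both columns with mutate_at(). The names short_names, shorten_name(), and toy_events below are made up for illustration, and only three players are included:

```r
library(dplyr)

# Lookup vector: names are the full names in the data, values are the short labels.
# Only three entries are shown; the real list would hold every player.
short_names <- c(
  "Ronaldo de Assis Moreira" = "Ronaldinho",
  "Daniel Alves da Silva"    = "Dani Alves",
  "Sergio Busquets i Burgos" = "Sergio Busquets"
)

# Hypothetical helper: use the short name when one exists, otherwise keep the input as-is
shorten_name <- function(x) {
  coalesce(unname(short_names[x]), x)
}

# Toy stand-in for messi_data_clean
toy_events <- tibble(
  player.name         = c("Ronaldo de Assis Moreira", "Thierry Henry", "Daniel Alves da Silva"),
  pass.recipient.name = c("Daniel Alves da Silva", "Sergio Busquets i Burgos", NA)
)

# Apply the same cleaning to both name columns in one step
toy_clean <- toy_events %>%
  mutate_at(vars(player.name, pass.recipient.name), shorten_name)

toy_clean
```

Names that are not in the lookup (and missing recipients) pass through unchanged, so each pair only has to be typed once instead of once per column.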

Save Cleaned Data

Now that we’ve got a clean data set it might be a good idea to save it.
I use the here::here() function for setting the path root to the
top-level of the current project directory and then jumping into the
“data” folder. Read this blog post here for more info on why it’s useful
to do so.

saveRDS(messi_data_clean, file = here::here("data/messi_data_clean.RDS"))

To get data for the other data sets it’s a matter of finding and
filtering for the correct “competition_id”. For the Women’s World Cup
data that’ll be 72 and for the Men’s World Cup last year it’ll be
43. The other data cleaning steps are the same.

With a nice clean data set ready, we can move on to reshaping the data
for analysis and plotting!

xG Timeline

Data

To get the data for a single match, in this case an “El Clasico”
match from the 2011/2012 season, we filter() for its “match_id”
number. Our main statistic of interest for the next two plots is going
to be xG in the “shot.statsbomb_xg” variable. If the value for it is
NA we can safely set the value to 0, otherwise we just keep the
value for that row.

We also create a separate data set that sums up the total xG for both
teams and creates a nice label using the {glue} package. The
“team_label” variable will come in handy in the plots. After joining
that data frame in, we also create a “player_label” variable to store
the “player.name” and “shot.statsbomb_xg” values for rows where a
Goal was scored. This variable will also be used as labels in the
plots.

clasico_1112 <- messi_data_clean %>% 
  filter(match_id == 69334) %>% 
  mutate(shot.statsbomb_xg = if_else(is.na(shot.statsbomb_xg), 
                                     0, shot.statsbomb_xg))

clasico_1112_xg <- clasico_1112 %>% 
  group_by(team.name) %>% 
  summarize(tot_xg = sum(shot.statsbomb_xg) %>% signif(digits = 2)) %>% 
  mutate(team_label = glue::glue("{team.name}: {tot_xg} xG"))

clasico_1112 <- clasico_1112 %>% 
  left_join(clasico_1112_xg, by = "team.name") %>% 
  mutate(player_label = case_when(
    shot.outcome.name == "Goal" ~ glue::glue("{player.name}: {shot.statsbomb_xg %>% signif(digits = 2)} xG"),
    TRUE ~ ""))

Plot

There are several components to this plot. First, there is a timeline
going across the plot showing the total minutes of the game. This is
done with geom_segment(), setting x and xend to 0 and
95 respectively while the y and yend arguments are kept at zero
as there shouldn’t be any movement along the y-axis. Second, there are
green segments highlighting when an actual goal was scored in the game,
done via the geom_rect() function and passing data where the
“shot.outcome.name” variable has the value Goal. I added a small
two minute buffer on either side of the goal time to create a
rectangular highlight. Last but certainly not least are the
geom_point()s of different sizes (depending on the value of xG)
showing the xG events throughout the match.

If you’ve worked with fonts and {ggplot2} before you might think it’s
weird that I’m calling windowsFonts() below. Normally I wouldn’t, but
I can’t seem to get the fonts to show up properly when I stitch multiple
plots together (in a later section) so I had to resort to doing it this
way. If you want to just create a standalone plot then the
windowsFonts() code isn’t needed and you can call the font family in
theme() as you would normally (after doing the {extrafont} stuff at
the beginning). This is something peculiar with fonts and certain
Operating Systems and you may experience different problems or none at
all on your computer.

windowsFonts(robotoc = windowsFont("Roboto Condensed"))

clasico_xg_timelineplot <- clasico_1112 %>% 
  ggplot() +
  geom_segment(x = 0, xend = 95,
               y = 0, yend = 0) +
  geom_rect(data = clasico_1112 %>% filter(shot.outcome.name == "Goal"),
            aes(xmin = minute - 2, xmax = minute + 2,
                ymin = -0.005, ymax = 0.005), 
            alpha = 0.3, fill = "green") +
  geom_label_repel(data = clasico_1112 %>% filter(shot.outcome.name == "Goal"),
             aes(x = minute, y = 0,
                 color = team.name, label = player_label), 
             nudge_x = 4, nudge_y = 0.003, family = "robotoc",
             show.legend = FALSE) +
  geom_point(data = clasico_1112 %>% filter(shot.statsbomb_xg != 0),
             shape = 21, stroke = 1.5,
             aes(x = minute, y = 0, 
                 size = shot.statsbomb_xg, fill = team.name)) +
  scale_color_manual(values = c("Barcelona" = "#a50044",
                                "Real Madrid" = "black")) +
  scale_fill_manual(values = c("Barcelona" = "#a50044",
                                "Real Madrid" = "white")) +
  facet_wrap(vars(team_label), ncol = 1) +
  scale_x_continuous(breaks = seq(0, 95, by = 5),
                     labels = c(seq(0, 40, by = 5), "HT", 
                                seq(50, 90, by = 5), "FT"),
                     limits = c(-3, 95),
                     expand = c(0.01, 0)) +
  scale_y_continuous(limits = c(-0.005, 0.005),
                     expand = c(0, 0)) +
  scale_size(range = c(2, 6)) +
  labs(caption = "By @R_by_Ryo") +
  theme_minimal() +
  theme(legend.position = "none",
        strip.text = element_text(size = 16, family = "robotoc", 
                                  face = "bold", color = "grey20"),
        plot.caption = element_text(family = "robotoc", color = "grey20",
                                    hjust = 0),
        axis.title = element_blank(),
        axis.text = element_blank(),
        panel.grid.minor = element_blank(),
        panel.grid.major.y = element_blank())
  
clasico_xg_timelineplot

Alone and without x-axis labels it doesn’t amount to much but in
combination with the next two plots it’ll all come together nicely.

xG Accumulated Plot

While the previous plot highlights certain xG events throughout the
match it doesn’t give you a sense of the ebb and flow of the game
through the lens of xG. This next plot adds up the total xG of teams
over time which can show periods of dominance and the spread of high/low
xG shots across a match.

Data

Similar to the previous plot except we’re just taking the cumulative sum
over time using the cumsum() function. We put a lag() on it so that
both teams start off with 0 xG at minute 0. To help with the labels for
our plot we left_join() the same data frame except only for rows where
there was a goal. Then we create slightly different versions of the
minutes (“minute_goal”) and rollsum (“rollsum_goal”) variables for the
goals so they line up properly on the plot. For the actual label we use
glue::glue() to glue together the values of the “player.name” variable
and the “sumxg” variable (only for rows where the shot outcome is equal
to Goal).

clasico_rollsum <- clasico_1112 %>% 
  group_by(minute, team.name, period) %>% 
  summarize(sumxg = sum(shot.statsbomb_xg)) %>% 
  ungroup() %>% 
  group_by(team.name) %>% 
  mutate(rollsum = lag(cumsum(sumxg)),
         rollsum = if_else(is.na(rollsum), 0, rollsum)) %>% 
  select(team.name, minute, rollsum, sumxg) %>%
  mutate(rollsum = case_when(
    row_number() == n() & sumxg != 0 ~ rollsum + sumxg,
    TRUE ~ rollsum
  ))

clasico_rollsum <- clasico_rollsum %>% 
  left_join(clasico_1112 %>% filter(shot.outcome.name == "Goal") %>% select(minute, shot.outcome.name, team.name, player.name), 
            by = c("minute", "team.name")) %>% 
  mutate(rollsum_goal = rollsum + sumxg,
         minute_goal = minute + 1,
         player_label = case_when(
           shot.outcome.name == "Goal" ~ glue::glue("{player.name}: {sumxg %>% signif(digits = 2)} xG"),
           TRUE ~ ""))

glimpse(clasico_rollsum)
## Observations: 189
## Variables: 9
## Groups: team.name [2]
## $ team.name          "Barcelona", "Real Madrid", "Barcelona", "Re...
## $ minute             0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7,...
## $ rollsum            0.00000000, 0.00000000, 0.00000000, 0.585871...
## $ sumxg              0.00000000, 0.58587144, 0.00000000, 0.000000...
## $ shot.outcome.name  NA, "Goal", NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ player.name        NA, "Karim Benzema", NA, NA, NA, NA, NA, NA,...
## $ rollsum_goal       0.00000000, 0.58587144, 0.00000000, 0.585871...
## $ minute_goal        1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8,...
## $ player_label       "", "Karim Benzema: 0.59 xG", "", "", "", ""...
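
To see what the lag(cumsum()) step is doing in isolation, here is a tiny sketch on made-up numbers (toy_xg and its values are invented, not taken from the match data):

```r
library(dplyr)

# Made-up xG per minute for a single team
toy_xg <- tibble(
  minute = 0:4,
  sumxg  = c(0.5, 0, 0.1, 0, 0.3)
)

toy_roll <- toy_xg %>%
  mutate(running = cumsum(sumxg),   # total xG up to and including this minute
         rollsum = lag(running),    # shifted one row: total xG before this minute
         rollsum = if_else(is.na(rollsum), 0, rollsum))  # first row starts at 0

toy_roll
```

Because of the shift, the "rollsum" column never includes the xG of the current row, which is why the code above adds "sumxg" back in for the last row and for the goal markers ("rollsum_goal").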

Plot

If you’re familiar with R, this is a simple line plot. However, there’s
still a lot of work to be done to make it look nice. By setting the
breaks, labels, and limits in scale_x_continuous() we can properly
label the time along the x-axis. For the y-axis, I use the sec_axis()
function to attach labels for each team’s total xG at the end of the
line, more specifically on the y-axis on the opposite side. To save some
space we can also move the legend into a more accessible place in the
plot by specifying the coordinates in the legend.position argument of
theme(). We can also use the new {ggtext} package to add more
styling to the text in the labels and titles using CSS/HTML. Using
geom_point() and geom_label_repel() we can add markers to signify
when goals were scored along with who the goal scorer was and the xG
value of the shot.

tot_clasico_df <- clasico_1112_xg %>% 
  pull(tot_xg)

clasico_rollsumxg_plot <- clasico_rollsum %>% 
  ggplot(aes(x = minute, y = rollsum, 
             group = team.name, color = team.name)) +
  geom_line(size = 2.5) +
  geom_label_repel(data = clasico_rollsum %>% filter(shot.outcome.name == "Goal"),
             aes(x = minute_goal, y = rollsum_goal, 
                 color = team.name, label = player_label), 
             nudge_x = 6, nudge_y = 0.15, family = "Roboto Condensed",
             show.legend = FALSE) +
  geom_point(data = clasico_rollsum %>% filter(shot.outcome.name == "Goal"),
             aes(x = minute_goal, y = rollsum_goal, color = team.name), show.legend = FALSE,
             size = 5, shape = 21, fill = "white", stroke = 1.25) +
  scale_color_manual(values = c("Barcelona" = "#a50044",
                                 "Real Madrid" = "#000000"),
                     labels = c("Barcelona", 
                                "Real Madrid")) +
  scale_fill_manual(values = c("Barcelona" = "#a50044",
                               "Real Madrid" = "#000000")) +
  scale_x_continuous(breaks = c(seq(0, 90, by = 5), 94),
                     labels = c(seq(0, 40, by = 5), "HT", 
                                seq(50, 90, by = 5), "FT"),
                     expand = c(0.01, 0),
                     limits = c(0, 94)) +
  scale_y_continuous(sec.axis = sec_axis(~ ., breaks = tot_clasico_df)) +
  labs(title = "Real Madrid: 1 (1st, 40 pts.)<br>Barcelona: 3 (2nd, 34 pts.)",
       subtitle = "December 10, 2011 (Matchday 16)",
       x = NULL,
       y = "Expected Goals") +
  theme_minimal() +
  theme(text = element_text(family = "Roboto Condensed"),
        plot.title = element_markdown(size = 40, family = "Roboto Condensed"),
        plot.subtitle = element_text(size = 18, family = "Roboto Condensed",
                                     color = "grey20"),
        axis.title = element_text(size = 18, color = "grey20"),
        axis.text = element_text(size = 16, face = "bold"),
        panel.grid.minor = element_blank(),
        legend.text = element_markdown(size = 16),
        legend.position = c(0.2, 0.95),
        legend.direction = "horizontal",
        legend.title = element_blank())

clasico_rollsumxg_plot

Real Madrid had higher xG throughout the match (boosted considerably by
the first goal, less than 30 seconds in, which had an xG value of 0.59)
yet it was Barcelona who scored 3 goals from an xG of 0.78 to win the
game.

Final Third Passes

In this plot we look at a rolling sum (using a window of 5 minutes) of
the passes that were made by each team in the final third of the field.

Data

We group_by() each team and the minute to count the number of events
that had a value of “Pass” with the condition that they happened in the
final third of the field (“location.x” >= 80).

roll_final_pass <- clasico_1112 %>% 
  group_by(team.name, minute) %>% 
  mutate(count = case_when(
    type.name == "Pass" & location.x >= 80 ~ 1L,
    TRUE ~ 0L
  )) %>% 
  select(team.name, minute, count) %>% 
  ungroup()

The main problem here is that not every minute is included in the data,
due to a variety of factors. For this game, there isn’t any “Pass” data
for the 93rd minute for either team and no pass data for Barcelona in
the entirety of the 14th minute. So even if we applied our rolling sum
function it wouldn’t be accurate, as it wouldn’t be taking into account
the rows for those missing minutes. We just need to create another data
frame that has every combination of the minutes throughout the match for
each team. I use tidyr::crossing() here but there are other ways to do
this.

first_min <- clasico_1112$minute %>% unique() %>% first()
last_min <- clasico_1112$minute %>% unique() %>% last()
minute <- c(first_min:last_min)
team.name <- c("Real Madrid", "Barcelona")

crossing(minute, team.name) %>% slice(26:32)
## # A tibble: 7 x 2
##   minute team.name  
##    <int> <chr>      
## 1     12 Real Madrid
## 2     13 Barcelona  
## 3     13 Real Madrid
## 4     14 Barcelona  
## 5     14 Real Madrid
## 6     15 Barcelona  
## 7     15 Real Madrid
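
One of those other ways, sketched here on made-up data rather than the match itself, is tidyr::complete(), which adds the missing combinations and fills in a default value in a single call. The toy_passes data frame and its counts are invented for illustration:

```r
library(dplyr)
library(tidyr)

# Made-up per-minute pass counts; Barcelona has no row for minute 1
toy_passes <- tibble(
  team.name = c("Barcelona", "Barcelona", "Real Madrid", "Real Madrid", "Real Madrid"),
  minute    = c(0L, 2L, 0L, 1L, 2L),
  count     = c(1L, 2L, 0L, 1L, 1L)
)

# Add a row for every team/minute combination, filling missing counts with 0
toy_filled <- toy_passes %>%
  complete(team.name, minute = 0:2, fill = list(count = 0L))

toy_filled
```

The result has a row for Barcelona in minute 1 with a count of 0, which is the same end state that the crossing-plus-join route produces.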

With a row now present for every minute and team, we can take
this crossed data frame and join it with the passing data frame. Then we
sum up the number of passes for each minute and apply a
rolling_sum() function. This custom function is created using
tibbletime::rollify(). To use it, specify the function
to be applied over the rolling window, in our case sum(), and set the
window length to 5. In the final line we filter() the data so we only take
the rows for each 5 minute interval and the last row (the 94th minute).

rolling_sum <- tibbletime::rollify(.f = sum, window = 5)

roll_clasico_pass <- crossing(minute, team.name) %>%
  left_join(roll_final_pass, by = c("minute", "team.name")) %>% 
  group_by(team.name, minute) %>% 
  summarize_all(sum) %>% 
  ungroup() %>% 
  mutate(count = ifelse(is.na(count), 0, count)) %>% 
  group_by(team.name) %>% 
  mutate(rollsum = rolling_sum(count),
         rollsum = ifelse(is.na(rollsum), 0, rollsum)) %>% 
  group_by(team.name) %>% 
  select(-count) %>% 
  filter(row_number() %% 5 == 1 | row_number() == n())

roll_clasico_pass %>% head(5)
## # A tibble: 5 x 3
## # Groups:   team.name [1]
##   team.name minute rollsum
##   <chr>      <int>   <dbl>
## 1 Barcelona      0       0
## 2 Barcelona      5       3
## 3 Barcelona     10       5
## 4 Barcelona     15       1
## 5 Barcelona     20       5
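
If you want to sanity check the rolling window by hand, here is a small base R sketch on made-up counts. It uses stats::filter() instead of {tibbletime}, on the assumption that rolling_sum() sums the current row and the four rows before it and returns NA until five rows are available:

```r
# Made-up per-minute final third pass counts for one team (minutes 0 to 7)
count <- c(1, 0, 2, 0, 0, 3, 1, 0)

# Trailing rolling sum over 5 values: each entry adds the current value
# and the four before it; the first four entries are NA
rollsum <- as.numeric(stats::filter(count, rep(1, 5), sides = 1))

# Replace the leading NAs with 0, as in the pipeline above
rollsum[is.na(rollsum)] <- 0

# Keep every 5th row (rows 1, 6, 11, ...) plus the last row
idx  <- seq_along(rollsum)
keep <- idx %% 5 == 1 | idx == length(rollsum)
kept <- rollsum[keep]

rollsum
kept
```

With these toy numbers the fifth entry is 1 + 0 + 2 + 0 + 0 = 3, and the rows that survive the final filter are the first, sixth, and last.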

Plot

This is similar to the previous plot but with the addition of
geom_point() to add markers for the number of final third passes at
the 5 minute intervals we just created. We change the shape of the
points to 21 (a hollow circle) so that we can fill the inside with
the color of each team specified in scale_fill_manual(). We also set
the stroke to 2.5 so that the outline of the circle is a bit
thicker.

windowsFonts(robotoc = windowsFont("Roboto Condensed"))

finalthird_rollingplot <- roll_clasico_pass %>% 
  ggplot(aes(x = minute, y = rollsum, 
             group = team.name)) +
  geom_line(data = roll_clasico_pass,
            size = 1.2) +
  geom_point(data = roll_clasico_pass,
             aes(fill = team.name),
             size = 3.5, shape = 21, stroke = 2.5) +
  scale_x_continuous(breaks = seq(0, 95, by = 5),
                     labels = c(seq(0, 40, by = 5), "HT", 
                                seq(50, 90, by = 5), "FT"),
                     limits = c(-3, 95),
                     expand = c(0.01, 0)) +
  scale_y_continuous(breaks = seq(0, 30, by = 5),
                     labels = seq(0, 30, by = 5)) +
  scale_fill_manual(values = c("Barcelona" = "#a50044",
                               "Real Madrid" = "white"),
                    labels = c("Barcelona", 
                               "Real Madrid")) +
  labs(title = "Real Madrid: 1 (1st, 40 pts.)<br>Barcelona: 3 (2nd, 34 pts.)",
       subtitle = "December 10, 2011 (Matchday 16)",
       x = NULL,
       y = "Final Third Passes") +
  theme_minimal() +
  theme(text = element_text(family = "robotoc"),
        plot.title = element_markdown(size = 40, family = "robotoc"),
        plot.subtitle = element_text(size = 18, family = "robotoc",
                                     color = "grey20"),
        axis.title = element_text(size = 18, color = "grey20"),
        axis.text = element_text(size = 16, face = "bold"),
        panel.grid.minor = element_blank(),
        legend.text = element_markdown(size = 14),
        legend.position = c(0.25, 0.95),
        legend.direction = "horizontal",
        legend.title = element_blank())

finalthird_rollingplot

As a standalone plot it’s nice as you can see which team was on the
offensive at different points throughout the game. However, it might be
even more useful if we can look at this data in combination with some of
the other plots we created previously which leads us to the next
section…

All Together Now!

You can combine several of the plots we made above to create a nice
infographic that summarizes the game using this kind of data. I’m sure
you’ve seen some of these online, such as this from Women’s Footy Stat
among others. The two packages I normally use for this kind of job are
{patchwork} and {cowplot}, but with the {ggtext} formatting as well as
how wonky fonts work on Windows and R I had to resort to using {grid}
and {gtable} to combine the plots without the text getting messed up on
rendering.

library(gtable)
library(grid)

png(filename = here::here("Lionel Messi/output/clasico_match_plot_RAW.png"), 
    width = 1000, height = 1600, res = 144, bg = "white")

one <- ggplotGrob(finalthird_rollingplot)
two <- ggplotGrob(clasico_xg_timelineplot)

gg <- rbind(one, two, size = "last")
gg$widths <- unit.pmax(one$widths, two$widths)

grid.newpage()
grid.draw(gg)
dev.off()
## png 
##   2

If you don’t want to include the {ggtext} stuff then using
cowplot::plot_grid() works just fine. Set the align argument to “v” for
vertical alignment, “h” for horizontal alignment, or “hv” for both (as
below), and set axis to “l” to align the plots on their left margin.

## ...delete all {ggtext} code and resave ggplot objects...
clasico_match_plot <- plot_grid(finalthird_rollingplot,
          clasico_xg_timelineplot, ncol = 1,
          align = "hv", axis = "l")

ggsave(plot = clasico_match_plot,
       filename = here::here("Lionel Messi/output/clasico_match_plotRAW.png"),
       height = 14, width = 10)

Nice! However, we’ve got one last thing to do which is to add the
StatsBomb logo to our plot as per their user agreement. For this I’ll
use a special function, add_logo() (mainly based on the {magick}
package), created by Thomas Mock that I always use for appending logos
onto plots.

add_logo <- function(plot_path, logo_path, logo_position, logo_scale = 10){

    # Requires magick R Package https://github.com/ropensci/magick

    # Useful error message for logo position
    if (!logo_position %in% c("top right", "top left", "bottom right", "bottom left")) {
        stop("Error Message: Uh oh! Logo Position not recognized\n  Try: logo_position = 'top left', 'top right', 'bottom left', or 'bottom right'")
    }

    # read in raw images
    plot <- magick::image_read(plot_path)
    logo_raw <- magick::image_read(logo_path)

    # get dimensions of plot for scaling
    plot_height <- magick::image_info(plot)$height
    plot_width <- magick::image_info(plot)$width

    # default scale to 1/10th width of plot
    # Can change with logo_scale
    logo <- magick::image_scale(logo_raw, as.character(plot_width/logo_scale))

    # Get width and height of logo
    logo_width <- magick::image_info(logo)$width
    logo_height <- magick::image_info(logo)$height

    # Set position of logo
    # Position starts at 0,0 at top left
    # Using 0.01 for 1% - aesthetic padding

    if (logo_position == "top right") {
        x_pos = plot_width - logo_width - 0.01 * plot_width
        y_pos = 0.01 * plot_height
    } else if (logo_position == "top left") {
        x_pos = 0.01 * plot_width
        y_pos = 0.01 * plot_height
    } else if (logo_position == "bottom right") {
        x_pos = plot_width - logo_width - 0.01 * plot_width
        y_pos = plot_height - logo_height - 0.01 * plot_height
    } else if (logo_position == "bottom left") {
        x_pos = 0.01 * plot_width
        y_pos = plot_height - logo_height - 0.01 * plot_height
    }

    # Compose the actual overlay
    magick::image_composite(plot, logo, offset = paste0("+", x_pos, "+", y_pos))
}

We input the finished plot that we just saved as well as the path to the
StatsBomb logo that I have saved in an “img” folder. We can then set the
logo_position and the logo_scale (relative to the plot) and save it
using magick::image_write().

plot_logo <- add_logo(
  plot_path = here::here("Lionel Messi/output/clasico_match_plot_RAW.png"),
  logo_path = here::here("img/stats-bomb-logo.png"),
  logo_position = "bottom right",
  logo_scale = 5)

plot_logo

## Save Plot
magick::image_write(
  image = plot_logo, 
  path = here::here("Lionel Messi/output/clasico_match_plot_FIN.png"))
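
To make the positioning arithmetic inside add_logo() a bit more
concrete, here is a small sketch that reproduces only the offset
calculation, with no images involved. The helper name logo_offset() and
the example dimensions are made up for illustration, and it assumes the
same 1% padding on every side.

```r
# Hypothetical helper: reproduces only the offset arithmetic of add_logo().
# All dimensions are in pixels; `pad` is the padding as a fraction of the plot.
logo_offset <- function(plot_width, plot_height, logo_width, logo_height,
                        logo_position = "bottom right", pad = 0.01) {
  right  <- plot_width  - logo_width  - pad * plot_width
  left   <- pad * plot_width
  top    <- pad * plot_height
  bottom <- plot_height - logo_height - pad * plot_height

  pos <- switch(logo_position,
                "top right"    = c(right, top),
                "top left"     = c(left, top),
                "bottom right" = c(right, bottom),
                "bottom left"  = c(left, bottom),
                stop("logo_position not recognized"))

  # Build a geometry string of the form "+x+y" (position counted from top left)
  paste0("+", round(pos[1]), "+", round(pos[2]))
}

# A 1000 x 1600 px plot with a (made-up) 200 x 50 px logo
logo_offset(1000, 1600, 200, 50, "bottom right")
# "+790+1534"
```

The logo ends up 10 px from the right edge and 16 px from the bottom
edge, i.e. 1% of the plot's width and height respectively.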

With the plot done, let me give you a bit of context to this game. After
Real Madrid scored within the first minute, it became a tight game with
neither side really able to string together many passes in the final
third. However, Alexis Sanchez scored the equalizer against the run of
play from a Messi through ball around the 30th minute (video of
Sanchez’s goal), during a period in which Real Madrid had a lot of final
third passes and created several chances in quick succession (albeit of
low xG value). Barca’s two later goals came from sustained pressure of
their own in the final third. Although Barcelona were able to close the
gap on Real Madrid to just 3 points, there was still half a season to go,
and defeats to Osasuna and to Real Madrid in the return fixture proved
to be their undoing.

With the StatsBomb data available and the plot-stitching R packages
shown above you can make similar plots or combine any two, three, or
even four plots to provide an overview of a match or season! You could
also add in text-only ggplot objects and combine them with the plots to
make an infographic. The possibilities are endless!

Pass Partner Plots

These next few plots explore the passing partnerships between all the
Barcelona players. Rather than a full pass network graph, this looks at
things on a more micro level by counting how often two players
exchanged passes with each other. From a visualization standpoint there
are problems with using a standard bar chart due to the long labels
needed along the x-axis. One option is to put the player names on the
y-axis, but that’s not always the best choice. Another is to use “upset
plots”, which visualize set intersections with a matrix placed alongside
the main plot.

I used the {ggupset} package but there are alternatives such as
{UpSetR} which provides some additional features. The choice is a matter
of preference; for me, I liked how seamlessly {ggupset} worked with the
existing {ggplot2} API. Before we get to plotting we need to manipulate
the data so that we get the right variables to pass to the plotting
functions.

Data: All Passes Received in the Box

If you check the data you’ll see that the majority of the values in the
pass.outcome.name variable are set to NA and you might think that
there’s a lot of missing data. However, the empty values are all
actually “Complete” passes. To make this more explicit we can use the
fct_explicit_na() function to set those NAs to “Complete” while also
turning the variable into a factor.
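
As a quick illustration of that recoding step, here is a base R sketch
on a made-up outcome vector. It is not the {forcats} call used in this
post, just the same idea spelled out by hand.

```r
# Toy outcome column: NA stands for a completed pass in the StatsBomb data
outcome <- c(NA, "Incomplete", NA, "Out", NA)

# Replace NA with an explicit label, then convert to a factor
outcome_filled <- outcome
outcome_filled[is.na(outcome_filled)] <- "Complete"
outcome_filled <- factor(outcome_filled)

# Three of the five toy passes now count as "Complete"
table(outcome_filled)
```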

Following that we filter() for event types that are specifically
“Pass”-es that have a “Complete” outcome from the team “Barcelona” that
only come from open play and where the passes end up in the opposition’s
box. You can find out the exact coordinates to set up the filtering for
passes into the box (and any other area of the pitch you have in mind)
by taking a look at page 34 of StatsBomb’s Open Data Specification
(version 1.1).
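
To see what that coordinate filter does, here is a small sketch using
the same bounds as the filter() call in this post. The helper name
in_opposition_box() and the example end locations are made up for
illustration.

```r
# Hypothetical helper: TRUE when a pass end location falls inside the
# opposition box, using the bounds from this post (x >= 102, 18 <= y <= 62)
in_opposition_box <- function(x, y) {
  x >= 102 & y >= 18 & y <= 62
}

# Four made-up pass end locations
end_x <- c(110, 95, 118, 105)
end_y <- c(40, 40, 70, 18)

in_opposition_box(end_x, end_y)
# TRUE FALSE FALSE TRUE
```

The second pass stops short of the box and the third ends too wide, so
only the first and last would survive the filter.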

From there we select() the variables we want to keep. You can use a
select helper function such as contains() to grab all variables
containing the string that you supply, in this case all of the “pass”
variables: “pass.angle”, “pass.length”, etc.

Then for each season we count the number of passes between a player
(“player.name”) and the recipient of the pass (“pass.recipient.name”)
and call this variable “pass_num”. After making sure we ungroup() we
edit the “player.name” variable so that it includes both the name and
the number of passes they made (the “pass_num” variable we just
created).

Finally, {ggupset} expects the variable that we are creating the plot
for to be in a list form. So we create a new list variable “pass_duo”
whose elements contain the passer’s name (“player.name”) and the pass
recipient’s name (“pass.recipient.name”).

pass_received_all_box <- messi_data_clean %>% 
  mutate(pass.outcome.name = fct_explicit_na(pass.outcome.name, "Complete")) %>%
  filter(type.name == "Pass",
         team.name == "Barcelona",
         pass.outcome.name == "Complete",
         ## Only passes from open play
         !play_pattern.name %in% c("From Corner", "From Free Kick",
                                   "From Throw In"),
         ## Only passes that ended up inside the box:
         pass.end_location.x >= 102 & pass.end_location.y <= 62 &
           pass.end_location.y >= 18) %>% 
  select(player.name, pass.recipient.name, 
         season_id, season_name,
         position.name, position.id,
         location.x, location.y,
         pass.end_location.x, pass.end_location.y,
         contains("pass")) %>% 
  group_by(season_name) %>% 
  add_count(player.name, pass.recipient.name, name = "pass_num") %>% 
  ungroup() %>% 
  mutate(player.name = glue::glue("{player.name}: {pass_num}")) %>% 
  mutate(pass_duo = map2(player.name, pass.recipient.name, ~c(.x, .y))) %>% 
  select(player.name, pass.recipient.name, pass_num, 
         season_name, pass_duo)
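
The counting and list-column steps above can be hard to picture, so here
is a base R sketch of the same idea on a tiny made-up data set. It
swaps add_count() for ave() and map2() for Map(), purely so that it runs
without any packages.

```r
# Toy pass data (made up for illustration)
passes <- data.frame(
  season_name = c("2011/2012", "2011/2012", "2011/2012", "2010/2011"),
  player.name = c("Messi", "Messi", "Xavi", "Messi"),
  pass.recipient.name = c("Iniesta", "Iniesta", "Messi", "Iniesta"),
  stringsAsFactors = FALSE
)

# Count passes per season / passer / recipient while keeping every row
passes$pass_num <- ave(
  rep(1, nrow(passes)),
  passes$season_name, passes$player.name, passes$pass.recipient.name,
  FUN = sum
)

# Label the passer with the count, then build the list column of pairs
passes$player.name <- paste0(passes$player.name, ": ", passes$pass_num)
passes$pass_duo <- unname(Map(c, passes$player.name,
                              passes$pass.recipient.name))

passes$pass_duo[[1]]
# "Messi: 2" "Iniesta"
```

Each element of pass_duo is a character vector of length two, which is
the shape that gets mapped to the x aesthetic in the plots below.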

Now we can get to the actual plotting!

As we have data for multiple seasons, instead of repeating the {ggplot2}
code for every year we can create a “base plot” for every season and
store it inside the data frame via nesting. To do the nesting, you need
to group_by() the season and then call nest(). As you can see below
this creates a column called “data” which holds all the variables and
values from each of the seasons listed in “season_name”.

pass_received_all_box %>% 
  group_by(season_name) %>% 
  nest()
## # A tibble: 8 x 2
##   season_name data              
##   <chr>       <list>            
## 1 2004/2005   <tibble [46 x 4]> 
## 2 2005/2006   <tibble [96 x 4]> 
## 3 2006/2007   <tibble [127 x 4]>
## 4 2007/2008   <tibble [152 x 4]>
## 5 2008/2009   <tibble [249 x 4]>
## 6 2009/2010   <tibble [267 x 4]>
## 7 2010/2011   <tibble [296 x 4]>
## 8 2011/2012   <tibble [253 x 4]>

With the way the data frame is set up now, you can use mutate() to
create a new column containing the plot for each season! To do this,
and especially if you also want to programmatically add the season name
to each of the plots, you need to use the purrr::map2() function. By
passing the “data” and “season_name” variables to the function we make
both of them available to the code that creates the plots. Here we’re
passing “data” as .x and “season_name” as .y; these are the names we’ll
use to refer to them inside the actual {ggplot2} code itself.

Using the ~ to denote that the following code is the function we want
to apply, we start building our plot. As can be seen, the first argument
“data” is set to .x, which holds the data for a specific season. The
main code for the upset plot comes in scale_x_upset() where you can set
the number of intersections to plot, in this case 10 for ten different
passer-receiver pairings. Within theme_combmatrix() you can set the
usual theme elements for a plot as well as upset-matrix-specific
aspects such as the color and size of the lines and points and the
spacing for the text.

all_pass_nested_box <- pass_received_all_box %>% 
  group_by(season_name) %>% 
  nest() %>%
  mutate(plot = map2(
    .x = data, .y = season_name,
    ~ ggplot(data = .x, aes(x = pass_duo)) +
      geom_bar(fill = "#a70042") + 
      scale_x_upset(n_intersections = 10,
                    expand = c(0.01, 0.01)) +
      scale_y_continuous(expand = c(0.04, 0.04)) +
      labs(title = glue::glue("
                              Total Completed Passes Into The Box 
                              Between All Players ({.y})"),
           subtitle = "'Name: Number' = Passer, 'No Number' = Pass Receiver",
           x = NULL, y = "Number of Passes") +
      theme_combmatrix(
        text = element_text(family = "Roboto Condensed", 
                            color = "#004c99"),
        plot.title = element_text(family = "Roboto Condensed", size = 20,
                                  color = "#a70042"),
        plot.subtitle = element_text(family = "Roboto Condensed", size = 16,
                                     color = "#004c99"),
        axis.title = element_text(family = "Roboto Condensed", size = 14,
                                  color = "#004c99"), 
        axis.text.x = element_text(family = "Roboto Condensed", size = 12,
                                   color = "#004c99"),
        axis.text.y = element_text(family = "Roboto Condensed", size = 12,
                                   color = "#004c99"),
        panel.background = element_rect(fill = "white"),
        combmatrix.panel.point.size = 4,
        combmatrix.panel.point.color.fill = "#a70042",
        combmatrix.panel.line.color = "#a70042",
        panel.grid = element_line(color = "black"),
        panel.grid.major.x = element_blank(),
        axis.ticks = element_blank())))

glimpse(all_pass_nested_box)
## Observations: 8
## Variables: 3
## $ season_name <chr> "2004/2005", "2005/2006", "2006/2007", "2007/2008"...
## $ data        <list> [<Ronaldinho: 5, Ronaldinho: 5, ...
## $ plot        <list> [...
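
If the nest-then-map pattern feels opaque, the following base R sketch
shows the same shape of computation on made-up data. Here split() stands
in for group_by() + nest() and Map() stands in for map2(); it builds a
title per season rather than a full plot.

```r
# Toy data (made up for illustration)
passes <- data.frame(
  season_name = c("2010/2011", "2011/2012", "2011/2012"),
  player.name = c("Xavi", "Messi", "Iniesta"),
  stringsAsFactors = FALSE
)

# One data frame per season, named by season
by_season <- split(passes, passes$season_name)

# Iterate over the data and the season name in parallel
titles <- Map(
  function(data, season) {
    paste0("Passes (", season, "): ", nrow(data), " rows")
  },
  by_season, names(by_season)
)

titles[["2011/2012"]]
# "Passes (2011/2012): 2 rows"
```

In the {purrr} version, `data` and `season` correspond to .x and .y.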

Now you can check out the 8th element of the “plot” variable which
corresponds to the 2011/2012 season:

all_pass_nested_1112 <- all_pass_nested_box$plot[[8]] +
  scale_y_continuous(labels = seq(0, 15, by = 5),
                     breaks = seq(0, 15, by = 5),
                     limits = c(0, 15))

ggsave(plot = all_pass_nested_1112,
       filename = here::here("Lionel Messi/output/allpass_1112_plotRAW.png"),
       height = 6, width = 8)

plot_logo <- add_logo(
  plot_path = here::here("Lionel Messi/output/allpass_1112_plotRAW.png"),
  logo_path = here::here("img/stats-bomb-logo.png"),
  logo_position = "top right",
  logo_scale = 5)

plot_logo

## Save Plot
magick::image_write(
  image = plot_logo, 
  path = here::here("Lionel Messi/output/allpass_1112_plotFIN.png"))

You can add whatever other {ggplot2} functions you need, but this way
you don’t have to type out the entire code block for each season; you
can just tweak and adjust the “base plot” we created in the “plot”
variable of the nested data frame!

As can be seen above, a little more work needs to be done concerning the
axis labels. Although Messi is labelled as having made 7 passes, they
are 7 passes EACH to Alexis Sanchez, Iniesta, Cristian Tello, and Dani
Alves, so the total really should read as 28.
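
One possible way to get such a total, sketched here in base R on made-up
data that mimics the Messi example, is to count once per pair and once
per passer. The column name pass_total is hypothetical, and on the real
data you would also need to group by season.

```r
# Toy version of the 2011/2012 situation: 7 passes to each of four teammates
recipients <- c("Alexis Sanchez", "Iniesta", "Cristian Tello", "Dani Alves")
passes <- data.frame(
  player.name = "Messi",
  pass.recipient.name = rep(recipients, each = 7),
  stringsAsFactors = FALSE
)

# Count per passer-recipient pair (what the current label shows) ...
passes$pass_num <- ave(rep(1, nrow(passes)),
                       passes$player.name, passes$pass.recipient.name,
                       FUN = sum)

# ... and per passer only (what a "total" label would show)
passes$pass_total <- ave(rep(1, nrow(passes)), passes$player.name, FUN = sum)

unique(passes[, c("player.name", "pass_num", "pass_total")])
# one row: Messi, pass_num = 7, pass_total = 28
```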

Data: Shot Assists

Just to show another example let’s look at shot assists instead. Besides
the differences inside filter() the code is the same (minus label and
title parts too of course):

## Data
messi_all_shot_assist <- messi_data_clean %>% 
  mutate(pass.outcome.name = fct_explicit_na(pass.outcome.name, "Complete")) %>%
  filter(team.name == "Barcelona",
         !is.na(pass.shot_assist),
         !play_pattern.name %in% c("From Corner", "From Free Kick",
                                   "From Throw In")) %>% 
  select(player.name, pass.recipient.name, 
         season_id, season_name,
         position.name, position.id,
         location.x, location.y,
         pass.end_location.x, pass.end_location.y,
         contains("pass")) %>% 
  group_by(season_name) %>% 
  add_count(player.name, pass.recipient.name, name = "pass_num") %>% 
  ungroup() %>% 
  mutate(player.name = glue::glue("{player.name}: {pass_num}")) %>% 
  mutate(pass_duo = map2(player.name, pass.recipient.name, ~c(.x, .y))) %>% 
  select(player.name, pass.recipient.name, pass_num, 
         season_name, pass_duo)

## Nest plots
messi_nested_all_shot_assist <- messi_all_shot_assist %>% 
  group_by(season_name) %>% 
  nest() %>%
  mutate(plot = map2(
    data, season_name,
    ~ ggplot(data = .x, aes(x = pass_duo)) +
      geom_bar(fill = "#a70042") + 
      scale_x_upset(n_intersections = 10,
                    expand = c(0.01, 0.01)) +
      scale_y_continuous(expand = c(0.04, 0.04)) +
      labs(title = glue::glue("Shot Assists ({.y})"),
           subtitle = "'Name: Number' = Passer, 'No Number' = Pass Receiver",
           caption = "Source: StatsBomb",
           x = NULL, y = "Number of Passes") +
      theme_combmatrix(
        text = element_text(family = "Roboto Condensed", 
                            color = "#004c99"),
        plot.title = element_text(family = "Roboto Condensed", size = 20,
                                  color = "#a70042"),
        plot.subtitle = element_text(family = "Roboto Condensed", size = 16,
                                     color = "#004c99"),
        axis.title = element_text(family = "Roboto Condensed", size = 14,
                                  color = "#004c99"), 
        axis.text.x = element_text(family = "Roboto Condensed", size = 12,
                                   color = "#004c99"),
        axis.text.y = element_text(family = "Roboto Condensed", size = 12,
                                   color = "#004c99"),
        panel.background = element_rect(fill = "white"),
        combmatrix.panel.point.size = 4,
        combmatrix.panel.point.color.fill = "#a70042",
        combmatrix.panel.line.color = "#a70042",
        panel.grid = element_line(color = "black"),
        panel.grid.major.x = element_blank(),
        axis.ticks = element_blank())))

## Plot 2011/2012
messi_nested_all_shot_assist$plot[[8]] +
  scale_y_continuous(labels = seq(0, 12, by = 2),
                     breaks = seq(0, 12, by = 2),
                     limits = c(0, 12))

It might be a good idea to combine these plots with other visualizations
such as a pass frequency table and/or a pass network map. For looking at
completed passes into the box you could create some pass maps
highlighting Zone 14 or the Half-Spaces like the ones Between the Posts
create for their match reports. For the shot assists plot above we might
also want to put it side-by-side with an xG plot to show whose passes
created high xG value chances.

I only used upset plots for two different elements (the passer and the
pass receiver) but the advantages of this visualization method become
more pronounced with even more set intersections, so there remains more
room for applying these to other types of soccer viz. For example you
could extend this to look at the most frequent passing sequences between
3 players or 4 or even 5. From the first example one of the top passing
sequences between 3 players might be something like Victor Valdes –
Busquets – Xavi/Iniesta. The matrix underneath the plot may become a bit
unwieldy without some filtering and tweaking by setting different values
for n_intersections, n_sets, and others in scale_x_upset().
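
As a rough sketch of how 3-player sequences could be derived, the base R
snippet below chains consecutive passes in a made-up, already ordered
pass log. It assumes every row is a completed pass from the same
possession; on the real data you would need to respect possession and
match boundaries first.

```r
# Toy sequence of consecutive completed passes, in match order (made up)
passes <- data.frame(
  passer    = c("Valdes", "Busquets", "Xavi", "Valdes", "Busquets"),
  recipient = c("Busquets", "Xavi", "Messi", "Busquets", "Xavi"),
  stringsAsFactors = FALSE
)

n <- nrow(passes)
first  <- passes[-n, ]  # pass i
second <- passes[-1, ]  # pass i + 1

# Two passes chain together when the receiver of one makes the next pass
chained <- first$recipient == second$passer

trio <- paste(first$passer[chained], first$recipient[chained],
              second$recipient[chained], sep = " - ")

trio_counts <- sort(table(trio), decreasing = TRUE)
trio_counts
```

Here “Valdes - Busquets - Xavi” appears twice and “Busquets - Xavi -
Messi” once. For an upset plot each trio would be stored as a character
vector of three names in the list column rather than pasted into one
string.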

Conclusion

In this blog post I went over some simple plots (dot, line, and bar
charts) you can make using {ggplot2} with the free StatsBomb data.
There’s plenty more to do with this data and I’m still experimenting and
learning every day. A good way to practice is to take something someone
has done and then recreate that visualization by applying it to a
slightly different data set with your favorite programming language.
This is exactly how I learned to do things; I’ll see something on
Twitter and try to remake that viz using Liverpool data or J-League data
instead!

Some people you may want to follow for inspiration:

If I can refine/improve upon any of the above then I’ll show my own
version of these in a future part. Anything new, whether it’s a
standalone viz or a full blog post, will be linked with the code here on
my soccer_ggplot GitHub repository!

If you want to iterate this process over many players/teams/seasons
there are ways to do so using the purrr::map() family of functions or
for loops along with the nest() approach I used for the pass partner
plots. You might also be interested in creating automated parameterized
reports with RMarkdown using this data; some resources include “The
lazy and easily distracted report writer” by Mike Smith at
rstudio::conf 2019 and Chapter 15: Parameterized reports of Yihui Xie’s
R Markdown: The Definitive Guide. You may also want to have a unified
theme for all your plots. I talk about creating your own {ggplot2}
themes here and there are other great resources like this that you
might want to read.

Part 2 will be more xG plots and also on plotting out the data on soccer
pitches using packages like
{ggsoccer},
{SBpitch},
{soccermatics}, and more!

To leave a comment for the author, please follow the link and comment on their blog: R by R(yo).
