Using R/fitzRoy to ask: how many times a V/AFL team with the same lineup has played together?

[This article was first published on R – What You're Doing Is Rather Desperate, and kindly contributed to R-bloggers.]

If you sit in the intersection of “likes Australian Rules football / finds sport statistics interesting / is on Twitter”, you’ve probably come across Swamp. One of his recent tweets tells us that:

You may go on to ask: has any team lineup from one of the almost 16 000 recorded games played together again in another game? And if so, how often?

The answer to that question is at once surprising, less surprising when you think about it, and quite easy to figure out using the ever-helpful fitzRoy package.

Getting the data

Several options would work. I used the fitzRoy function get_fryzigg_stats() which, although deprecated, does what I want (gets all games from 1897 onwards in one shot) and returns nicer variable names than some of the other functions.

library(tidyverse)
library(fitzRoy)
library(lubridate)

# get data
afldata <- fitzRoy::get_fryzigg_stats()

Most games played by the same lineup

Players in a team are identified by a numerical player_id. So we can represent the team lineup (here called squad) by sorting the IDs and pasting them into a character string. Then, simply counting the strings and filtering for n > 1 will return teams where the same set of players played together more than once. Note that we need to sort the numerical IDs, or we could get the same players in more than one game, but in different orders.

lineup_multiple_games <- afldata %>% 
  group_by(match_id, player_team) %>% 
  summarise(squad = paste(sort(player_id), collapse = ";")) %>% 
  ungroup() %>% 
  count(player_team, squad, sort = TRUE, name = "n_games") %>% 
  filter(n_games > 1) %>% 
  mutate(n_players = str_count(squad, ";") + 1)
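To see why the sort matters, here is a minimal base R sketch using made-up player IDs (the vectors game_1 and game_2 are hypothetical, not taken from the data). The same three players listed in a different order only produce the same lineup string once the IDs are sorted.

```r
# Toy player IDs (hypothetical): the same three players in a different order
game_1 <- c(84, 81, 90)
game_2 <- c(90, 84, 81)

# Without sorting, the pasted strings differ even though the players are the same
unsorted_match <- paste(game_1, collapse = ";") == paste(game_2, collapse = ";")

# With sorting, both games produce an identical lineup key
key_1 <- paste(sort(game_1), collapse = ";")
key_2 <- paste(sort(game_2), collapse = ";")
sorted_match <- key_1 == key_2

# Number of players recovered from the key, by splitting on the separator
n_players <- lengths(strsplit(key_1, ";"))
```

Here unsorted_match is FALSE and sorted_match is TRUE, which is why the pipeline above calls sort() before paste().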

We get to the final dataset, games_same_lineup, by doing something quite similar, except this time grouping on more variables (match date, round and venue) before generating the lineup. We can then join to the count data from the first step, creating a dataset that looks like this.

Rows: 2,408
Columns: 8
$ match_id    <int> 9, 12, 13, 14, 29, 33, 60, 61, 78, 81, 88, 90, 92, 109, 111, 116, 124, 129, 131, 131, 132, 150, 156, 186, 190, 191, 194, 207, 213, 218, 230, 252, 256, 260, 270,…
$ match_date  <chr> "1897-05-22", "1897-05-24", "1897-05-29", "1897-05-29", "1897-06-26", "1897-07-03", "1897-08-28", "1897-09-04", "1898-05-28", "1898-06-04", "1898-06-18", "1898-…
$ match_round <chr> "3", "3", "4", "4", "8", "9", "Semi Final", "Semi Final", "4", "5", "7", "7", "8", "12", "13", "14", "16", "17", "Semi Final", "Semi Final", "Grand Final", "5",…
$ venue_name  <chr> "Victoria Park", "Lake Oval", "Corio Oval", "Lake Oval", "Ikon Park", "Ikon Park", "Brunswick St", "East Melbourne", "Brunswick St", "Corio Oval", "Victoria Par…
$ player_team <chr> "Geelong", "Sydney", "Geelong", "Sydney", "Carlton", "Carlton", "Geelong", "Geelong", "Fitzroy", "Fitzroy", "Collingwood", "Geelong", "Geelong", "Fitzroy", "Fit…
$ squad       <chr> "81;82;83;84;87;88;90;91;92;94;96;97;98;99;100;184;187;188;189;190", "121;123;124;126;127;128;129;130;131;132;133;134;135;137;138;139;140;161;163;164", "81;82;8…
$ n_games     <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 4, 4, 3, 4, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3…
$ n_players   <dbl> 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, …

So to date there are 2 408 rows, one per team per game, in which the team fielded a lineup that it used at least twice.
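The code that builds games_same_lineup is not shown in the post. The following is a sketch of how the two steps might fit together, run on a small made-up table rather than the real data. The object names toy, toy_counts and toy_same_lineup, and all the values in toy, are assumptions for illustration; only the column names come from the output above.

```r
library(dplyr)

# Hypothetical stand-in for afldata: one team, three games, two players per game
toy <- tibble(
  match_id    = c(1, 1, 2, 2, 3, 3),
  match_date  = rep(c("1897-05-08", "1897-05-15", "1897-05-22"), each = 2),
  match_round = rep(c("1", "2", "3"), each = 2),
  venue_name  = rep(c("Corio Oval", "Lake Oval", "Corio Oval"), each = 2),
  player_team = "Geelong",
  player_id   = c(81, 82, 82, 81, 81, 83)
)

# Step 1: count how often each lineup appears, keeping those seen more than once
toy_counts <- toy %>%
  group_by(match_id, player_team) %>%
  summarise(squad = paste(sort(player_id), collapse = ";"), .groups = "drop") %>%
  count(player_team, squad, name = "n_games") %>%
  filter(n_games > 1)

# Step 2: rebuild the lineups with the match details kept, then join to the counts
toy_same_lineup <- toy %>%
  group_by(match_id, match_date, match_round, venue_name, player_team) %>%
  summarise(squad = paste(sort(player_id), collapse = ";"), .groups = "drop") %>%
  inner_join(toy_counts, by = c("player_team", "squad")) %>%
  mutate(n_players = lengths(strsplit(squad, ";")))

toy_same_lineup
```

In the toy data, games 1 and 2 share the lineup "81;82", so they survive the join with n_games equal to 2, while game 3 is dropped. An inner join is used here so that only lineups present in the count table are kept.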

The most games with the same lineup is?

games_same_lineup %>% 
  filter(n_games == max(n_games)) %>% 
  select(-squad)
match_id  match_date  match_round  venue_name  player_team  n_games  n_players
    2038  1924-05-24  5            MCG         Sydney             7         18
    2039  1924-05-31  6            Lake Oval   Sydney             7         18
    2043  1924-06-07  7            Ikon Park   Sydney             7         18
    2054  1924-06-21  9            Lake Oval   Sydney             7         18
    2065  1924-07-12  12           Lake Oval   Sydney             7         18
    2079  1924-08-23  16           Lake Oval   Sydney             7         18
    2092  1924-09-13  Semi Final   Windy Hill  Sydney             7         18

Sydney (South Melbourne in those days), 7 games back in 1924.

Most games played by a 22-player lineup

Games today feature 22 players per side (more recently 23, with a medical substitute). The most games with the same 22-player lineup is?

games_same_lineup %>% 
  filter(n_players == 22) %>% 
  filter(n_games == max(n_games)) %>% 
  select(-squad)
match_id  match_date  match_round        venue_name      player_team     n_games  n_players
   12800  2005-08-06  19                 Marvel Stadium  Sydney                5         22
   12810  2005-08-14  20                 ANZ Stadium     Sydney                5         22
   12818  2005-08-21  21                 SCG             Sydney                5         22
   12822  2005-08-27  22                 MCG             Sydney                5         22
   12829  2005-09-02  Qualifying Final   Subiaco         Sydney                5         22
   14901  2016-06-23  14                 Adelaide Oval   Adelaide              5         22
   14912  2016-07-03  15                 MCG             Adelaide              5         22
   14972  2016-08-20  22                 Adelaide Oval   Adelaide              5         22
   14988  2016-09-10  Elimination Final  Adelaide Oval   Adelaide              5         22
   14990  2016-09-17  Semi Final         SCG             Adelaide              5         22
   15556  2019-07-20  18                 Gabba           Brisbane Lions        5         22
   15578  2019-08-04  20                 Gabba           Brisbane Lions        5         22
   15582  2019-08-10  21                 Gabba           Brisbane Lions        5         22
   15590  2019-08-17  22                 Gabba           Brisbane Lions        5         22
   15609  2019-09-07  Qualifying Final   Gabba           Brisbane Lions        5         22

Five games, which has happened three times: in 2005 (Sydney), 2016 (Adelaide) and 2019 (Brisbane).

Games played across seasons by the same lineup

Has a lineup from one season taken to the field again the following season?

games_same_lineup %>% 
  group_by(squad) %>% 
  filter(n_distinct(year(match_date)) > 1) %>% 
  ungroup() %>% 
  select(-squad)
match_id  match_date  match_round  venue_name      player_team       n_games  n_players
   12731  2005-05-29  10           Marvel Stadium  Western Bulldogs        3         22
   12839  2006-03-31  1            Marvel Stadium  Western Bulldogs        3         22
   12847  2006-04-08  2            Marvel Stadium  Western Bulldogs        3         22

Just once: the round 10 Western Bulldogs from 2005 appeared again in rounds 1 and 2, 2006. Good to see that the result of this code agrees with a tweet from another AFL stats enthusiast.

Games played with the same lineup for both teams

Have there ever been two or more games where both sides fielded the same lineup?

I’m still working on the logic to answer this one, but I think that:

  • If both sides feature a lineup that played more than once (not necessarily against one another), then the match ID should appear twice in the n_games > 1 dataset (games_same_lineup)
  • If those sides and lineups played each other two or more times, then an ordered string made from all 44 player IDs should be counted twice or more

Trying to express that using dplyr:

# function to order and join players from both teams
join_teams <- function(first_team, last_team) {

  teams <- list(first_team, last_team)

  teams <- lapply(teams, function(x) {
    x %>% 
    str_split(";") %>% 
    .[[1]] %>% 
    as.numeric() %>% 
    sort()
  }
  )
  
  players <- teams %>% 
    unlist() %>% 
    sort() %>% 
    paste(collapse = ";")
  
  players
}


games_same_lineup %>% 
  group_by(match_id) %>% 
  filter(n() > 1) %>% 
  summarise(players = join_teams(first(squad), last(squad))) %>% 
  ungroup() %>% 
  count(players, name = "n_games") %>% 
  count(n_games)

Result:

n_games   n
      1  99

So if I got that right, there are 99 games between teams where each team has featured the same lineup more than once – but never against each other. That is, the same two opposing lineups have never played each other more than once.
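One possible cross-check, sketched here on made-up data and not part of the original analysis, is to pool every player ID in a match into one sorted string straight from the player-level table, then count repeated strings. The table toy and its values are hypothetical. Like join_teams() above, the pooled string ignores which side each player was on, so it assumes the same set of players would not be split differently between the two teams.

```r
library(dplyr)

# Hypothetical player-level data: three matches, two players per side.
# Matches 1 and 2 use identical lineups on both sides; match 3 changes one player.
toy <- tibble(
  match_id    = rep(1:3, each = 4),
  player_team = rep(c("A", "A", "B", "B"), times = 3),
  player_id   = c(1, 2, 3, 4,
                  2, 1, 4, 3,
                  1, 2, 3, 5)
)

# One sorted string of all player IDs per match, then count how often each occurs
result <- toy %>%
  group_by(match_id) %>%
  summarise(players = paste(sort(player_id), collapse = ";"), .groups = "drop") %>%
  count(players, name = "n_games") %>%
  count(n_games)

result
```

In the toy data one pooled lineup occurs twice and one occurs once, so result has a row for n_games = 1 and a row for n_games = 2. Run against the real data, a table with only an n_games = 1 row would be consistent with the conclusion above.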

Surprising?

In one sense, yes. Seven games with the same team lineup from almost 16 000 seems like a low number. Supporters on my team forum often ask “you mean in a row?” when I mention this number and they seem surprised when I say “no, ever”.

When you think about it more, it’s less surprising. AFL games are pretty physical and most games feature at least one injury that prevents a player from playing in at least the next game. Players are dropped, other players return from injury, and players are omitted or included because they offer a better match-up against a particular opponent. So really, we shouldn’t expect the same team lineup week after week.

The fitzRoy package is great not just for AFL analytics, but for answering fun trivia questions like this one. Thanks to the author, James Day. Thanks also to Tony Corke for Twitter discussions on this topic. All code is available as RMarkdown at GitHub.
