Counting Pairs

April 25, 2019

(This article was first published on Analysis of AFL, and kindly contributed to R-bloggers)

I saw a tweet from [@matt_pavlich](https://twitter.com/matt_pavlich) asking Twitter roughly how many games he and David Mundy have played together.

Thankfully, you don’t have to wonder anymore: you can reproduce the results yourself and keep running counts for your favourite players!

library(tidyverse)
## ── Attaching packages ─────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.1       ✔ purrr   0.3.2  
## ✔ tibble  2.1.1       ✔ dplyr   0.8.0.1
## ✔ tidyr   0.8.3       ✔ stringr 1.4.0  
## ✔ readr   1.3.1       ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(fitzRoy)
df<-fitzRoy::get_afltables_stats(start_date = "1990-01-01", end_date=Sys.Date())
## Returning data from 1990-01-01 to 2019-04-26
## Downloading data
## 
## Finished downloading data. Processing XMLs
## Warning: Detecting old grouped_df format, replacing `vars` attribute by
## `groups`
## Finished getting afltables data
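# Build a unique match identifier and a full player name.
df$matchID <- paste(df$Season, df$Round, df$Home.team, df$Away.team)
df$name <- paste(df$First.name, df$Surname)

# Keep one row per Fremantle player per match.
df_data <- df %>%
  filter(Playing.for == "Fremantle") %>%
  select(name, matchID, Playing.for)

df_data %>%
  mutate(n = 1) %>%
  spread(name, n, fill = 0) %>%                 # one row per match, one 0/1 column per player
  select(-Playing.for, -matchID) %>%
  {crossprod(as.matrix(.))} %>%                 # player-by-player counts of shared matches
  replace(lower.tri(., diag = TRUE), NA) %>%    # keep each pair once, drop self-pairs
  reshape2::melt(na.rm = TRUE) %>%
  unite('Pair', c('Var1', 'Var2'), sep = ", ") %>%
  filter(value > 150) %>%
  arrange(desc(value))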
##                                 Pair value
## 1  Aaron Sandilands, Matthew Pavlich   232
## 2       David Mundy, Matthew Pavlich   226
## 3    Luke McPharlin, Matthew Pavlich   224
## 4       David Mundy, Michael Johnson   222
## 5      Aaron Sandilands, David Mundy   214
## 6      Matthew Pavlich, Paul Hasleby   200
## 7     Antoni Grover, Matthew Pavlich   191
## 8   Matthew Pavlich, Michael Johnson   187
## 9        David Mundy, Luke McPharlin   186
## 10  Aaron Sandilands, Luke McPharlin   183
## 11         David Mundy, Stephen Hill   183
## 12         David Mundy, Ryan Crowley   175
## 13 Aaron Sandilands, Michael Johnson   173
## 14   Luke McPharlin, Michael Johnson   171
## 15       Shane Parker, Shaun McManus   171
## 16     Matthew Pavlich, Ryan Crowley   167
## 17    Matthew Pavlich, Shaun McManus   164
## 18     Michael Johnson, Ryan Crowley   163
## 19       Matthew Pavlich, Peter Bell   160
## 20     David Mundy, Garrick Ibbotson   158
## 21          Chris Mayne, David Mundy   154
## 22    David Mundy, Hayden Ballantyne   154
## 23        David Mundy, Paul Duffield   153
## 24       Antoni Grover, Paul Hasleby   153
## 25     Matthew Pavlich, Stephen Hill   151
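
If the `crossprod()` step looks like magic, a tiny made-up example may help. The sketch below uses base R only and invented players `A`, `B` and `C` (not real data); it builds the same kind of match-by-player 0/1 matrix and shows that its cross product counts the matches each pair shares.

```r
# Toy data (made up for illustration): one row per player per match.
toy <- data.frame(
  name    = c("A", "B", "C", "A", "B", "A", "C"),
  matchID = c("m1", "m1", "m1", "m2", "m2", "m3", "m3"),
  stringsAsFactors = FALSE
)

# Match-by-player incidence matrix: 1 if the player appeared in the match.
incidence <- unclass(table(toy$matchID, toy$name))

# t(incidence) %*% incidence: entry [i, j] counts matches containing both i and j.
together <- crossprod(incidence)

together["A", "B"]  # A and B share m1 and m2, so this is 2
diag(together)      # the diagonal holds each player's own games total
```

The matrix is symmetric, which is why the pipeline above blanks out the lower triangle and the diagonal before melting: otherwise every pair would be listed twice, alongside each player paired with themselves.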

Now that you have the script and data, an interesting next step is to see which pair has played the most games together for each team.
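
One way to approach the per-team question is to split the data by team and repeat the pair count inside each piece. This is only a sketch on made-up data: `top_pair()` is a hypothetical helper, the `team` column stands in for `Playing.for`, and it uses base R rather than the tidyverse pipeline above. It assumes every team has at least two players and returns only the first pair when there is a tie.

```r
# Toy data (made up): two teams, one row per player per match.
toy <- data.frame(
  name    = c("A", "B", "C", "A", "B", "X", "Y", "Z", "Y", "Z", "Y", "Z"),
  matchID = c("m1", "m1", "m1", "m2", "m2", "m1", "m1", "m1", "m2", "m2", "m3", "m3"),
  team    = c(rep("Red", 5), rep("Blue", 7)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the most frequent pair within one team's rows.
top_pair <- function(team_rows) {
  incidence <- unclass(table(team_rows$matchID, team_rows$name))
  together  <- crossprod(incidence)
  together[lower.tri(together, diag = TRUE)] <- NA  # keep each pair once
  best <- which(together == max(together, na.rm = TRUE), arr.ind = TRUE)[1, ]
  data.frame(
    pair  = paste(rownames(together)[best[1]], colnames(together)[best[2]], sep = ", "),
    games = together[best[1], best[2]]
  )
}

# Apply the helper to each team and stack the one-row results.
per_team <- do.call(rbind, lapply(split(toy, toy$team), top_pair))
per_team$team <- rownames(per_team)
per_team
```

On the real data the same idea would split on `Playing.for` instead of `team`. Because players are identified by name here, two different players with the same name at one club would be merged into a single column.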

Something you might also want to do is look at the games that they played together in.

df_data%>%
  filter(name %in% c("David Mundy", "Matthew Pavlich")) %>%
  group_by(matchID)%>%
  count(n=n())%>%
  filter(n==2)
## # A tibble: 226 x 3
## # Groups:   matchID [226]
##    matchID                                n    nn
##    <chr>                              <int> <int>
##  1 2005 10 Geelong Fremantle              2     2
##  2 2005 11 Fremantle Brisbane Lions       2     2
##  3 2005 12 Sydney Fremantle               2     2
##  4 2005 13 Fremantle North Melbourne      2     2
##  5 2005 14 Adelaide Fremantle             2     2
##  6 2005 15 Fremantle Western Bulldogs     2     2
##  7 2005 16 Carlton Fremantle              2     2
##  8 2005 17 Fremantle Melbourne            2     2
##  9 2005 18 Collingwood Fremantle          2     2
## 10 2005 19 Fremantle Richmond             2     2
## # … with 216 more rows
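
For a running count, the shared matches can be tallied by season and accumulated. The sketch below is an assumption-laden illustration on made-up data: players `P` and `Q` are invented, and the `season` column is something you would need to carry along yourself (for example by keeping `Season` in the `select()` call that builds `df_data`).

```r
# Toy data (made up): one row per player per match, with the season kept.
toy <- data.frame(
  name    = c("P", "Q", "P", "P", "Q", "P", "Q", "Q"),
  season  = c(2005, 2005, 2005, 2006, 2006, 2006, 2006, 2007),
  matchID = c("2005 1", "2005 1", "2005 2", "2006 1",
              "2006 1", "2006 2", "2006 2", "2007 1"),
  stringsAsFactors = FALSE
)

# Matches in which both players appear.
shared <- intersect(toy$matchID[toy$name == "P"],
                    toy$matchID[toy$name == "Q"])

# One row per shared match, then count per season and accumulate.
shared_rows <- unique(toy[toy$matchID %in% shared, c("season", "matchID")])
per_season  <- table(shared_rows$season)
running     <- cumsum(per_season)
running
```

Using `intersect()` on the two players' match lists is an alternative to the group-and-filter approach above; both should pick out the same matches as long as each player has at most one row per match.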
