Make your own PAV

[This article was first published on Analysis of AFL, and kindly contributed to R-bloggers.]

The idea behind "recreate to create" is that a good first step in making your own rating system, be it a player one or a team one, is to recreate one you already see and then add your own opinions to make it your own.

The guys over at HPN have their own player rating system called PAV, which stands for Player Approximate Value.

You can explore their PAV ratings for both the men's and women's competitions.

Why would you want to create your own system?

You might have a different opinion on how the formula is derived. As HPN put it:

"The weightings and multipliers used in each component formula will necessarily look a bit arbitrary, but are the results of adjustment and tweaking until the results lined up with other methods of ranking and evaluating players as described above."

That is not to say how it was done is wrong, but maybe you have another method of ranking and evaluating players that you would like your system to align with.

You might just want to use different variables. HPN again:

"As the collection of several of these measures only commenced in 1998, we have also adapted another formula for the pre-1998 seasons which correlates extremely strongly with the newer formula. Whilst we feel it is less accurate than the newer formula, it still largely conforms to the findings of the newer formula. This formula was created by trying to minimise the standard deviation for each player’s PAV across the last five seasons of AFL football. Around 5% of players have a difference in value of more than one PAV between the new and old formulas."

Let's say you are working in clubland: you might like the ideas used, but have your own internal metrics that you would rather use instead. As a fan of the game, you have hopefully noticed that more statistics are being made available and accessible through fitzRoy. For example, fitzRoy gives users access to both afltables and footywire, and footywire contains some extra variables that you might want to include in your rating system, such as intercepts and tackles inside 50, to name a couple.

OK, so how do we go about recreating it?

Well, thankfully the guys over at HPN have written about the formula they used.

Step One

The first thing we do is get our datasets. Through fitzRoy we have access to data from both afltables and footywire, and one reason you might be doing this is that you want to use the extra data in one of them for your ratings.

It’s not only the data available through fitzRoy that you can use. At the time of writing, there are a few extra variables you might want to bring in, such as player position and maybe age, that haven’t been integrated into fitzRoy yet, but hopefully they will be soonish.

library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.8
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.3.1     ✔ forcats 0.3.0
## ── Conflicts ──────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# Player-level stats from afltables for 1990 to 2018
afltables<-fitzRoy::get_afltables_stats(start_date="1990-01-01", end_date="2018-10-10")
## Returning data from 1990-01-01 to 2018-10-10
## Downloading data
## 
## Finished downloading data. Processing XMLs
## Finished getting afltables data
# Footywire player stats as shipped with fitzRoy
footywire<-fitzRoy::player_stats
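
With both datasets loaded, one quick way to see what footywire adds is to compare the column names. Here is a minimal sketch in base R, using two short made-up name vectors in place of names(afltables) and names(footywire). The footywire-style names below are placeholders for illustration, not the real column names.

```r
# Made-up column names standing in for names(afltables) and names(footywire).
afltables_cols <- c("Season", "Round", "Kicks", "Marks", "Tackles")
footywire_cols <- c("Season", "Round", "Kicks", "Marks", "Tackles",
                    "Intercepts", "Tackles.Inside.50")  # placeholder names

# Columns that appear only in the second source.
extra_cols <- setdiff(footywire_cols, afltables_cols)
print(extra_cols)
```

On the real data, swap the two vectors for names(afltables) and names(footywire).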

Something to note about the two datasets is that to join them we need some sort of joining ID. The easiest is usually a key made up of team name, season and player, or something similar. Unfortunately, the teams aren’t named the same way across the datasets. For example, in the footywire dataset the Greater Western Sydney Giants are called GWS, while in the afltables dataset they are called Greater Western Sydney.

So let’s make sure the team names align between the datasets so we can join them later.

# Step 1: get team names matching so scores can be joined on to player data.
# The replacement is applied to every character column, so Home.team,
# Away.team and Playing.for are all renamed together.
afltables <- mutate_if(tibble::as_tibble(afltables),
                       is.character,
                       str_replace_all, pattern = "Greater Western Sydney", replacement = "GWS")
afltables <- mutate_if(afltables,
                       is.character,
                       str_replace_all, pattern = "Brisbane Lions", replacement = "Brisbane")
# names(afltables)
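
To show why the names need to line up, here is a minimal sketch in base R of renaming via a lookup vector and then joining on a season, team and player key. Everything in it is made up for illustration: the two small data frames, the Intercepts values, and the helper rename_team() are hypothetical and not part of fitzRoy.

```r
# Made-up rows using the same team column name as the afltables data.
players <- data.frame(
  Season      = c(2016, 2016, 2016),
  Playing.for = c("Greater Western Sydney", "Brisbane Lions", "Carlton"),
  Surname     = c("Player A", "Player B", "Player C"),
  stringsAsFactors = FALSE
)

# Made-up second source that already uses the short team names.
extra <- data.frame(
  Season     = c(2016, 2016, 2016),
  Team       = c("GWS", "Brisbane", "Carlton"),
  Surname    = c("Player A", "Player B", "Player C"),
  Intercepts = c(5, 3, 7),  # made-up values
  stringsAsFactors = FALSE
)

# Lookup from long name to short name; names not listed are left alone.
team_lookup <- c("Greater Western Sydney" = "GWS", "Brisbane Lions" = "Brisbane")

# Hypothetical helper: swap any name found in the lookup for its short form.
rename_team <- function(x) {
  hit <- x %in% names(team_lookup)
  x[hit] <- unname(team_lookup[x[hit]])
  x
}

players$Playing.for <- rename_team(players$Playing.for)

# Join on season, team and player; only rows with a match in both are kept.
joined <- merge(players, extra,
                by.x = c("Season", "Playing.for", "Surname"),
                by.y = c("Season", "Team", "Surname"))
print(joined)
```

Without the renaming step, the GWS and Brisbane rows would fail to match and silently drop out of the join.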

Now, because we are recreating the blog post, we should focus on values we already know so that we can check whether we have things covered. So let’s filter our data.

# Keep seasons 2011 to 2016
afltables<-filter(afltables, Season>2010, Season<2017)

Step Two - Recreate PAV per the blog post

# Split player rows into home and away so the matching team score can be used
afltables_home<-filter(afltables, Playing.for==Home.team)
afltables_away<-filter(afltables,Playing.for==Away.team)

# Offence component for home players.
# Note: Home.score is the team's total score, repeated on every player row.
afltables_home$pavO<-afltables_home$Home.score +
  0.25*afltables_home$Hit.Outs +
  3*afltables_home$Goal.Assists+
  afltables_home$Inside.50s+
  afltables_home$Marks.Inside.50+
  (afltables_home$Frees.For-afltables_home$Frees.Against)

# Defence component for home players
afltables_home$pavD<-20*afltables_home$Rebounds +
  12*afltables_home$One.Percenters+
  (afltables_home$Marks-4*afltables_home$Marks.Inside.50+2*(afltables_home$Frees.For-afltables_home$Frees.Against))-
  2/3*afltables_home$Hit.Outs

# Midfield component for home players
afltables_home$pavM<-15*afltables_home$Inside.50s+
  20*afltables_home$Clearances +
  3*afltables_home$Tackles+
  1.5*afltables_home$Hit.Outs +
  (afltables_home$Frees.For-afltables_home$Frees.Against)



# Offence component for away players.
# Note: Away.score is the team's total score, repeated on every player row.
afltables_away$pavO<-afltables_away$Away.score +
  0.25*afltables_away$Hit.Outs +
  3*afltables_away$Goal.Assists+
  afltables_away$Inside.50s+
  afltables_away$Marks.Inside.50+
  (afltables_away$Frees.For-afltables_away$Frees.Against)

# Defence component for away players
afltables_away$pavD<-20*afltables_away$Rebounds +
  12*afltables_away$One.Percenters+
  (afltables_away$Marks-4*afltables_away$Marks.Inside.50+2*(afltables_away$Frees.For-afltables_away$Frees.Against))-
  2/3*afltables_away$Hit.Outs

# Midfield component for away players
afltables_away$pavM<-15*afltables_away$Inside.50s+
  20*afltables_away$Clearances +
  3*afltables_away$Tackles+
  1.5*afltables_away$Hit.Outs +
  (afltables_away$Frees.For-afltables_away$Frees.Against)

fulltable<-rbind(afltables_home,afltables_away)
names(fulltable)
##  [1] "Season"                  "Round"                  
##  [3] "Date"                    "Local.start.time"       
##  [5] "Venue"                   "Attendance"             
##  [7] "Home.team"               "HQ1G"                   
##  [9] "HQ1B"                    "HQ2G"                   
## [11] "HQ2B"                    "HQ3G"                   
## [13] "HQ3B"                    "HQ4G"                   
## [15] "HQ4B"                    "Home.score"             
## [17] "Away.team"               "AQ1G"                   
## [19] "AQ1B"                    "AQ2G"                   
## [21] "AQ2B"                    "AQ3G"                   
## [23] "AQ3B"                    "AQ4G"                   
## [25] "AQ4B"                    "Away.score"             
## [27] "First.name"              "Surname"                
## [29] "ID"                      "Jumper.No."             
## [31] "Playing.for"             "Kicks"                  
## [33] "Marks"                   "Handballs"              
## [35] "Goals"                   "Behinds"                
## [37] "Hit.Outs"                "Tackles"                
## [39] "Rebounds"                "Inside.50s"             
## [41] "Clearances"              "Clangers"               
## [43] "Frees.For"               "Frees.Against"          
## [45] "Brownlow.Votes"          "Contested.Possessions"  
## [47] "Uncontested.Possessions" "Contested.Marks"        
## [49] "Marks.Inside.50"         "One.Percenters"         
## [51] "Bounces"                 "Goal.Assists"           
## [53] "Time.on.Ground.."        "Substitute"             
## [55] "Umpire.1"                "Umpire.2"               
## [57] "Umpire.3"                "Umpire.4"               
## [59] "group_id"                "pavO"                   
## [61] "pavD"                    "pavM"
fulltable2016<-filter(fulltable, Season==2016)
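
The home and away blocks above differ only in which score column feeds the offence formula, so the three components can be wrapped in one function. This is a minimal sketch in base R using the same weights as the code above. The function name pav_components() and the single made-up row are hypothetical and only there to exercise the arithmetic.

```r
# Component scores using the same weights as the code above.
# `points` is whatever score column is fed into the offence formula.
pav_components <- function(df, points) {
  frees <- df$Frees.For - df$Frees.Against
  df$pavO <- points + 0.25 * df$Hit.Outs + 3 * df$Goal.Assists +
    df$Inside.50s + df$Marks.Inside.50 + frees
  df$pavD <- 20 * df$Rebounds + 12 * df$One.Percenters +
    (df$Marks - 4 * df$Marks.Inside.50 + 2 * frees) -
    2 / 3 * df$Hit.Outs
  df$pavM <- 15 * df$Inside.50s + 20 * df$Clearances + 3 * df$Tackles +
    1.5 * df$Hit.Outs + frees
  df
}

# One made-up player-game row to exercise the function.
toy <- data.frame(
  Home.score = 100, Hit.Outs = 12, Goal.Assists = 1, Inside.50s = 4,
  Marks.Inside.50 = 2, Frees.For = 3, Frees.Against = 1, Rebounds = 2,
  One.Percenters = 5, Marks = 6, Clearances = 3, Tackles = 4
)
toy <- pav_components(toy, points = toy$Home.score)
print(toy[, c("pavO", "pavD", "pavM")])
```

On the real data this would be called once as pav_components(afltables_home, afltables_home$Home.score) and once with afltables_away and Away.score, which keeps the weights in a single place if you want to tweak them.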

Step Three - Check a player's values

Now that we have the PAV ratings for 2016, let's check a player's PAV to see if we have done it right (note: you should probably check multiple players, but it's late).

The player I am going to check is Bryce Gibbs, and I am going to see whether his midfield PAV matches the blog post.

# Check we get the same value for Bryce Gibbs as the blog post: http://www.hpnfooty.com/?p=21810

fulltable2016 %>%
  group_by(First.name, Surname) %>%
  summarise(total_mid_pav = sum(pavM)) %>%
  filter(Surname == "Gibbs", First.name == "Bryce")
## # A tibble: 1 x 3
## # Groups:   First.name [1]
##   First.name Surname total_mid_pav
##   <chr>      <chr>           <dbl>
## 1 Bryce      Gibbs            3984
# Team totals for the midfield component
fulltable2016 %>% group_by(Playing.for) %>% summarise(team_mid_pav = sum(pavM))
## # A tibble: 18 x 2
##    Playing.for      team_mid_pav
##    <chr>                   <dbl>
##  1 Adelaide               45679 
##  2 Brisbane               36986.
##  3 Carlton                37702.
##  4 Collingwood            38445 
##  5 Essendon               35483 
##  6 Fremantle              37135 
##  7 Geelong                45027 
##  8 Gold Coast             36114.
##  9 GWS                    46991 
## 10 Hawthorn               43548 
## 11 Melbourne              40918.
## 12 North Melbourne        40982.
## 13 Port Adelaide          41328.
## 14 Richmond               35464.
## 15 St Kilda               38218.
## 16 Sydney                 49909 
## 17 West Coast             41712.
## 18 Western Bulldogs       47766
# Gibbs' share of Carlton's midfield total, as a percentage
100*(3984/37702)
## [1] 10.56708

Hazzaaa, it matches. Can someone check the rest, tell me where I went wrong and flick me an email, please?

As always, this is a work in progress, so this post will probably get an update.
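
The same share calculation can be run for every player at once rather than by hand. Here is a minimal sketch in base R on made-up per-game midfield scores. The column names follow the data above, the numbers are invented, and grouping on ID rather than on names is my own choice so that two players who share a name are not merged.

```r
# Made-up per-game midfield scores; column names follow the data above.
games <- data.frame(
  ID          = c(1, 1, 2, 2, 3, 3),
  Playing.for = c("Carlton", "Carlton", "Carlton", "Carlton", "GWS", "GWS"),
  pavM        = c(100, 200, 300, 400, 250, 250)
)

# Season total per player, then per team.
player_tot <- aggregate(pavM ~ ID + Playing.for, data = games, FUN = sum)
team_tot   <- aggregate(pavM ~ Playing.for, data = games, FUN = sum)
names(team_tot)[2] <- "team_pavM"

# Attach the team total to each player and take the percentage share.
shares <- merge(player_tot, team_tot, by = "Playing.for")
shares$mid_share <- 100 * shares$pavM / shares$team_pavM
print(shares)
```

Within each team the shares add up to 100, which is a handy sanity check before comparing individual players against the blog post.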
