Data Viz The Glossy

May 20, 2018
By

(This article was first published on Analysis of AFL, and kindly contributed to R-bloggers)

One of the fun things when doing graphs, is that moment when you identify something that sticks out. A great example of this is done by ethan_meldrum

In it we can see how much Paul Seedsman 2018 has stood out so far. Lets give it the ggplot/fitzRoy treatment.

Step One – Get the Data

library(fitzRoy)
library(tidyverse)

df<-fitzRoy::player_stats

df1<-fitzRoy::get_footywire_stats(9514:9593)
df<-df%>%filter(Season != 2018)
df2<-rbind(df, df1)

df2%>%select(Season,TO, MG, Player) %>%
  filter(Season==2018) %>%
  group_by(Player)%>%
  summarise(Turnovers=sum(TO), Meters_gained=sum(MG)) 
  • df<-fitzRoy::player_stats this gets footywire data from 2010-2018 some round I say some round because this requires a weekly update from James or myself which sadly doesn’t always happen.
  • df1<-fitzRoy::get_footywire_stats(9514:9593) this gets the footywire data for 2018, 9514 is the first game of 2018 and 9593 was the most recent game as of writing this post (23/05/2018)
  • We want to stack these dataframes on top of each other but we are away that some of the df might contain 2018 data so we just get rid of it using filter(Season!=2018)
  • df2<-rbind(df, df1) stacks our datasets together and calls it df2

  • From df2 we then select the columns we want Season, TO (Turnovers), MG (Meters gained) and Player
  • Then we filter out just this years data filter(Season==2018)
  • then we group_by player note if we wanted season on season comparisions we would group_by (Season, Player) and not use the filter(Season==2018)
  • We then summarise the data, get the Season total for each players Turnovers and meters gained. summarise(Turnovers=sum(TO), Meters_gained=sum(MG))

Step 2

This is an assumption we are making here that we know the players we want to highlight. But lets say we do and the players are Rory Laird, Bryce Gibbs, Paul Seedsman and Tom Mitchell. We need to create a flag for them so we can subset them out for highlighting down the track.

We do this using the mutate.

mutate(flag = ifelse(Player %in% c("Rory Laird","Bryce Gibbs",Paul Seedsman","Tom Mitchell"), T, F))

Lets break this down

  • step a create a variable called flag flag=
  • step b ifelse(Player %in% c("P1", "P2")) if Player is in this list (c(“P1”, P2))
  • step c , Give value T otherwise give value F ,T,F))
df2%>%select(Season,TO, MG, Player) %>%
  filter(Season==2018) %>%
  group_by(Player)%>%
  summarise(Turnovers=sum(TO), Meters_gained=sum(MG)) %>%
  mutate(flag = ifelse(Player %in% c("Rory Laird",
                                               "Bryce Gibbs",
                                               "Paul Seedsman", 
                                               "Tom Mitchell"), T, F)) 

Step 3 – Give it the ggplot treatment

 ggplot(aes(x=Meters_gained, y=Turnovers)) + 
  geom_point(aes(colour=flag), show.legend = FALSE) +
  geom_smooth(method='lm',formula=y~x) +
  geom_text(aes(label=ifelse(Player %in% c ("Rory Laird", 
                                            "Bryce Gibbs",
                                            "Paul Seedsman", 
                                            "Tom Mitchell"), 
                             Player, ""),vjust=-1))
  • 1 ggplot(aes(x=Meters_gained, y=Turnovers)) creates a blank canvas with our x variable being Meters_gained and y being Turnovers
  • 2 geom_point(aes(colour=flag), show.legend=FALSE) this creates our graph dots we are taking the xy from ggplot and using aes(colour=flag) to give our points different colours depending on the value of flag
  • 3 geom_smooth(method=lm, formula=y~x) this gives us our line of best fit (linear regression method=lm)
  • 4 geom_text this gives us our labels, but we don’t want all the players so we subset the labels. We do this using another ifelse
  • 5 you can think of the ifelse as if the Player is in this list , give them label of Player, otherwise blank “”

  • 5a where Player is the list of all Players, checking if any player is in this list c ("Rory Laird","Bryce Gibbs", "Paul Seedsman", "Tom Mitchell") and if they are give them their original label Player otherwise a blank label " "
  • 5b vjust moves the labels so not directly over the dot

Put it all together

library(fitzRoy)
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1     v purrr   0.2.4
## v tibble  1.4.2     v dplyr   0.7.4
## v tidyr   0.8.0     v stringr 1.3.0
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts ------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
df<-fitzRoy::player_stats

df1<-fitzRoy::get_footywire_stats(9514:9593)
## Getting data from footywire.com
## Finished getting data
df<-df%>%filter(Season != 2018)
df2<-rbind(df, df1)

df2%>%select(Season,TO, MG, Player) %>%
  filter(Season==2018) %>%
  group_by(Player)%>%
  summarise(Turnovers=sum(TO), Meters_gained=sum(MG)) %>%
  mutate(flag = ifelse(Player %in% c("Rory Laird",
                                     "Bryce Gibbs",
                                     "Paul Seedsman", 
                                   "Tom Mitchell"), T, F))%>%
  ggplot(aes(x=Meters_gained, y=Turnovers)) + 
  geom_point(aes(colour=flag), show.legend = FALSE) +
  geom_smooth(method='lm',formula=y~x) +
  geom_text(aes(label=ifelse(Player %in% c ("Rory Laird", 
                                            "Bryce Gibbs",
                                            "Paul Seedsman", 
                                            "Tom Mitchell"), 
                             Player, ""),vjust=-1))

To leave a comment for the author, please follow the link and comment on their blog: Analysis of AFL.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)