Superbowl XLIX: Worst Call Ever?

February 5, 2015

(This article was first published on NYC Data Science Academy » R, and kindly contributed to R-bloggers)

  1. Contributed by James Hedges and Malcolm Hess.
  2. James and Malcolm are part of the 12-Week Data Science Bootcamp with Vivian Zhang in the spring of 2015.
  3. This post is based on their first in-class presentation, a review of Benjamin Morris’ article on FiveThirtyEight.com related to Seattle’s final offensive play in Super Bowl XLIX.

 


Videos

1. Video of the presentation can be found here:

https://www.youtube.com/watch?v=ZZl8r_K7I5E&feature=youtu.be


 Background
We’re interested in applying statistical and analytical approaches to competitive sports and in gaining surprising insights from doing so. To that end, we discussed a recent article by FiveThirtyEight.com’s Benjamin Morris in which he builds support for a contrarian position on the decision underlying what may be remembered as one of the most impactful plays in Super Bowl history. He develops a probabilistic model in support of the conclusion that Seattle’s decision to throw the ball on second down from the 1-yard line wasn’t actually a bad decision.
We wanted to learn more about his model and to see whether we could implement a version of it ourselves. We also wanted to provide some context for it and to consider other approaches to problems of this kind. Doing well with such problems may hinge on understanding and simplifying the dependencies between a context (e.g., second down, 1-yard line, down by 4 points), a decision (e.g., run the ball or throw the ball), a specific outcome (e.g., a touchdown or an interception), and a more general outcome (e.g., win or lose the game).

 


 

 

Objectives

We initially attempted to recreate a primary result from the article, in which the probabilities of sequential play outcomes and overall game outcomes are computed, and in which those estimates change based on some additional assumptions. While recreating the model is an important objective, we felt it was unrealistic to reach that point without more information on the data Morris used and on how the model was actually computed. Our attention instead turned to numerically replicating some elements of the model, such as the probability of scoring a touchdown on a run play, and to evaluating whether tweaks to the model were reasonable.


 

Situation

  1. We start with the play. This image from NFL Breakdowns shows Seattle in a shotgun formation at New England’s 1-yard line in the seconds just prior to the snap. Seattle trailed by four points, so a touchdown would have put them up by three (assuming they kicked and made the PAT), something many observers assumed was going to happen. The play called was Russell Wilson’s attempted pass on a slant route to Ricardo Lockette.

 

[Image: Seattle in shotgun formation at New England’s 1-yard line, from NFL Breakdowns]

Football in 5 rules
  1. points: Touchdown = 6 pts; Field Goal = 3 pts; Point After Touchdown (easy kick) = 1 pt
  2. the attacking team (offense) scores a touchdown by getting the ball into the end zone (the area beyond the goal line)
  3. the offense has four attempts (downs) to move the ball 10 yds; inside the 10-yard line, the target is the goal line instead
  4. the ball is advanced by throwing it to someone who catches it or by someone running with it (i.e., pass or run the ball)
  5. a given play ends when the person with the ball is tackled or goes out of bounds, or when the ball is passed and not caught
source: http://usafootball.com/football-basics

In a simpler view, imagine two bowls, each holding balls of three colors. Blindfolded, you pull out one ball at a time. Pull a red ball and you win, pull a black ball and you lose, and a yellow ball lets you pull again. However, if you pull three yellow balls in a row, you also lose. There are two bowls to choose from, one called run and one called pass, and each has a different number of red, yellow, and black balls.

Using this framing, we created a probability tree that covers all possible outcomes of this decision.
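The bowl analogy can be turned into a small calculation. The sketch below is only an illustration: the function name `p_win`, its arguments, and the bowl compositions are made up for this example, and it assumes every draw comes from the same bowl (the full tree would also allow switching between run and pass from one down to the next).

```r
# Chance of winning the bowl game: keep drawing until red (win),
# black (lose), or max_draws yellows in a row (also a loss).
# p_red and p_yellow are hypothetical per-draw probabilities.
p_win <- function(p_red, p_yellow, max_draws = 3) {
  stopifnot(p_red >= 0, p_yellow >= 0, p_red + p_yellow <= 1)
  # A win on draw k requires k - 1 yellow balls followed by a red ball.
  k <- seq_len(max_draws)
  sum(p_yellow^(k - 1) * p_red)
}

# Illustrative (made-up) bowl compositions, not estimates from data.
run_bowl  <- p_win(p_red = 0.5, p_yellow = 0.4)
pass_bowl <- p_win(p_red = 0.5, p_yellow = 0.3)
print(c(run = run_bowl, pass = pass_bowl))
```

With equal chances of red, the bowl with more yellow balls comes out ahead here, because yellow only delays the outcome while black ends the game immediately.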

[Figure: probability tree]

 


 

Implementation

Play-by-play data for every NFL game of the 2014 season were found at nflsavant.com (source: http://nflsavant.com/about.php).
# get data ----------------------------------------------------------------
library(downloader)
fileUrl <- "http://nflsavant.com/pbp_data.php?year=2014"
# create the target folder if it does not exist yet
dir.create("./data", showWarnings = FALSE)
download(fileUrl, destfile = "./data/data.pbp.2014.csv", mode = "wb")
list.files("./data")
data.pbp.2014 <- read.csv("./data/data.pbp.2014.csv")
# check data --------------------------------------------------------------
str(data.pbp.2014)


# 45k observations by 45 vars


# 01 - GameId - integer - example: 2014090400 - date of game and two more digits

# 02 - GameDate - factor - example: 2014-09-04 - date of game

# 03 - Quarter - integer - example: 1 - quarter in game

# 04 - Minute - integer - example: 15 - minutes left in quarter

# 05 - Second - integer - example: 0 - seconds left in quarter

# 06 - OffenseTeam - factor - example: ARI - offensive team

# 07 - DefenseTeam - factor - example: ARI - defensive team

# 08 - Down - integer - example: 1 - down; not sure ab 0?

# 09 - ToGo - integer - example: 10 - distance to go; not sure ab 0?

# 10 - YardLine - integer - example: 35 - yard line of the play (field position, not distance to go); not sure ab 0? *******

# 11 - X - logical - example: ?? - not sure

# 12 - SeriesFirstDown - integer - example: 1 - series 1st down

# 13 - X.1 - logical - example: ?? - not sure

# 14 - NextScore - integer - example: 0 - check this ***************************

# 15 - Description - factor - example: "D.CARR.." - description

# 16 - TeamWin - integer - example: 0 - unclear - ******************************

# 17 - X.2 - logical - ?? - not sure

# 18 - X.3 - logical - ?? - not sure

# 19 - SeasonYear - integer - example: 2014 - season year

# 20 - Yards - integer - example: 0 - yards from result of play? ***************

# 21 - Formation - factor - example: SHOTGUN - simple formation on play

# 22 - PlayType

# 23 - IsRush - integer - example: 0 - whether rush play or not ****************

# 24 - IsPass - integer - example: 0 - whether pass play or not ****************

# 25 - IsIncomplete

# 26 - IsTouchdown - integer - example: 0 - whether play was touchdown or not***

# 27 - PassType

# 28 - IsSack

# 29 - IsChallenge

# 30 - IsChallengeReversed

# 31 - Challenger

# 32 - IsMeasurement

# 33 - IsInterception - integer - example: 0 - whether play was interception ***

# 34 - IsFumble - integer - example: 0 - whether play was fumble ******

# 35 - IsPenalty - integer - example: 0 - whether play was penalty ****

# 36 - IsTwoPointConversion

# 37 - IsTwoPointConversionSuccessful

# 38 - RushDirection

# 39 - YardLineFixed - integer - example: 35 - 0-50 yardline

# 40 - YardLineDirection - factor - example: OPP - which side of field

# 41 - IsPenaltyAccepted - integer - example: 0 - penalty accepted or not ******

# 42 - PenaltyTeam - factor - example: ARI - why 33 levels

# 43 - IsNoPlay - integer - example: 0 - not sure what this means

# 44 - PenaltyType - factor - example: BLOCKED INTO PUNTER

# 45 - PenaltyYards - integer - example: 5 - yards from penalty

Then we count the plays that meet all the criteria. For each play type, pass and run, we need the total number of attempts, the number of touchdowns (successes), and the number of potential turnovers (a fumble on a run, an interception on a pass).

# probability of outcomes -------------------------------------------------

n.rush <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsRush == 1,])

n.rush.td <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsRush == 1 &
data.pbp.2014$IsTouchdown == 1,])

n.rush.no.td <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsRush == 1 &
data.pbp.2014$IsTouchdown == 0,])

n.rush.fumble <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsRush == 1 &
data.pbp.2014$IsFumble == 1,])

round(n.rush.td / n.rush, digits=3)
round(n.rush.no.td / n.rush, digits=3)
round(n.rush.fumble / n.rush, digits=4)

n.pass <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsPass == 1,])

n.pass.td <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsPass == 1 &
data.pbp.2014$IsTouchdown == 1,])

n.pass.no.td <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsPass == 1 &
data.pbp.2014$IsTouchdown == 0,])

n.pass.interception <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsPass == 1 &
data.pbp.2014$IsInterception == 1,])

 

Lastly, we calculate the success and failure rates by dividing each count by the total number of attempts.

round(n.pass.td / n.pass, digits=3)
round(n.pass.no.td / n.pass, digits=3)
round(n.pass.interception / n.pass, digits=4)

# > round(n.rush.td / n.rush, digits=3)
# [1] 0.563
# > round(n.rush.no.td / n.rush, digits=3)
# [1] 0.437
# > round(n.rush.fumble / n.rush, digits=4)
# [1] 0.0101

# > round(n.pass.td / n.pass, digits=3)
# [1] 0.579
# > round(n.pass.no.td / n.pass, digits=3)
# [1] 0.421
# > round(n.pass.interception / n.pass, digits=4)
# [1] 0
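The repeated filters above could also be wrapped in one helper. This is a minimal sketch, not the code used for the results above: the function name `goal_line_rates` and the tiny `toy` data frame are invented for illustration, and the helper assumes the same column names as the nflsavant file.

```r
# Summarise goal-line outcomes for one play type.
# `plays` is assumed to have the nflsavant-style columns used above;
# `type_col` is "IsRush" or "IsPass", `turnover_col` the matching flag.
goal_line_rates <- function(plays, type_col, turnover_col) {
  at_one <- plays[plays$YardLineFixed == 1 &
                  plays$YardLineDirection == "OPP" &
                  plays$IsPenalty == 0 &
                  plays[[type_col]] == 1, ]
  c(n        = nrow(at_one),
    td       = mean(at_one$IsTouchdown == 1),
    no.td    = mean(at_one$IsTouchdown == 0),
    turnover = mean(at_one[[turnover_col]] == 1))
}

# Tiny made-up data frame, only to show the call; not real play-by-play data.
toy <- data.frame(
  YardLineFixed     = c(1, 1, 1, 1, 20),
  YardLineDirection = c("OPP", "OPP", "OPP", "OWN", "OPP"),
  IsPenalty         = c(0, 0, 0, 0, 0),
  IsRush            = c(1, 1, 0, 1, 1),
  IsPass            = c(0, 0, 1, 0, 0),
  IsTouchdown       = c(1, 0, 1, 0, 0),
  IsFumble          = c(0, 0, 0, 0, 0),
  IsInterception    = c(0, 0, 0, 0, 0)
)
rush <- goal_line_rates(toy, "IsRush", "IsFumble")
print(rush)
```

On the real data the calls would be `goal_line_rates(data.pbp.2014, "IsRush", "IsFumble")` and `goal_line_rates(data.pbp.2014, "IsPass", "IsInterception")`.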

 

These success rates are used to judge whether the decision made in the Super Bowl was a good one. We used league-wide rates because a single team does not run enough plays from this exact situation (ball on the 1-yard line) in the 2014 season to give a reliable sample, so we felt it was unwise to use an individual team’s success rate.
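To see why sample size matters here, one can compare the uncertainty around a rate estimated from a handful of plays with one estimated from many. The counts below are hypothetical, chosen only to illustrate the point, and `ci_width` is a helper name invented for this sketch; `binom.test` is from base R's stats package and returns an exact (Clopper-Pearson) interval.

```r
# Width of an exact 95% confidence interval for a success rate.
ci_width <- function(successes, attempts) {
  ci <- binom.test(successes, attempts)$conf.int
  ci[2] - ci[1]
}

# Hypothetical counts: one team's few goal-line plays vs. a league-wide total.
team_width   <- ci_width(5, 8)
league_width <- ci_width(56, 100)
print(c(team = team_width, league = league_width))
```

The interval for the small sample is far wider, which is the reason for pooling plays across teams.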

Conclusion 
We can recreate a victory probability model using these numbers. Doing so shows us that passing is in fact slightly more likely to succeed than running the ball (a touchdown rate of 0.579 versus 0.563 in the 2014 data). Unfortunately, we cannot compare our model directly to the one in the FiveThirtyEight article because that model has many built-in assumptions, including a significant change in success rate that depends on whether the first play was a run or a pass.

 

 
