(This article was first published on **NYC Data Science Academy » R**, and kindly contributed to R-bloggers)

- Contributed by James Hedges and Malcolm Hess.
- James and Malcolm are part of the 12-Week Data Science Bootcamp with Vivian Zhang in the spring of 2015.
- This post is based on their first in-class presentation, a review of Benjamin Morris’ article on FiveThirtyEight.com related to Seattle’s final offensive play in Super Bowl XLIX.

**Videos**

1. Video of the presentation can be found here:

https://www.youtube.com/watch?v=ZZl8r_K7I5E&feature=youtu.be

**Background**

**Objectives**

We initially attempted to recreate a primary result from the article, in which the probabilities of sequential play outcomes and overall game outcomes are computed, and in which those estimates change under some additional assumptions. While recreating the model is an important objective, we felt it was unrealistic to reach that point without more information on the data Morris used and on how the model was actually computed. Our attention instead turned to numerically replicating some elements of the model, such as the probability of scoring a touchdown on a run play, and to evaluating whether tweaks to the model were reasonable.

**Situation**

- We start with the play. This image from NFL Breakdowns shows Seattle in a shotgun formation at New England’s 1-yard line in the seconds just prior to the snap. Seattle trailed by four points, so a touchdown would have put them up by three (assuming they went for and made the PAT), which is what many observers assumed was about to happen. Instead, Russell Wilson attempted a pass on a slant route to Ricardo Lockette, which was intercepted.

**Football in 5 rules**

- points: Touchdown = 6 pts; Field Goal = 3 pts; Point After Touchdown (easy kick) = 1 pt
- attacking team (offense) scores a touchdown by getting the ball into the end zone (area beyond goal line)
- offense has four attempts (downs) to move the ball 10 yds; if inside the 10-yard line, the attempts are to reach the goal line instead
- ball is advanced by throwing the ball to someone who catches it or by someone running with the ball (i.e., pass or run the ball)
- a given play ends when the person with the ball is tackled or goes out of bounds, or when it's passed and not caught

In a simpler view, imagine two bowls, each holding balls of three colors. Blindfolded, you pull out one ball at a time. Pull a red ball and you win; pull a black ball and you lose; a yellow ball lets you pull again. However, if you pull three yellow balls in a row, you also lose. There are two bowls to choose from, one called run and one called pass, and each has a different number of red, yellow, and black balls.

Using this framing, we created a probability tree that includes all possible outcomes of this decision.
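The bowl analogy can be worked through directly. Below is a minimal sketch in R that walks the three-draw tree and adds up the probability of reaching a red ball on each branch. The ball counts in `run.bowl` and `pass.bowl` are made up for illustration, and the `win.prob` helper is hypothetical; neither comes from the article or the play-by-play data.

```r
# Probability of winning the bowl game within a fixed number of draws.
# A red ball wins, a black ball loses, a yellow ball means draw again;
# running out of draws (all yellow) also counts as a loss.
# Assumption of this sketch: draws are independent with the same mix
# each time (i.e., the ball is put back after every draw).
win.prob <- function(bowl, max.draws = 3) {
  p <- bowl / sum(bowl)              # convert ball counts to probabilities
  k <- 0:(max.draws - 1)             # yellow balls drawn before the red one
  sum(p[["yellow"]]^k * p[["red"]])  # P(win) = sum over k of P(yellow)^k * P(red)
}

# Hypothetical ball counts, chosen only for illustration
run.bowl  <- c(red = 5, yellow = 4, black = 1)
pass.bowl <- c(red = 6, yellow = 2, black = 2)

p.run  <- win.prob(run.bowl)
p.pass <- win.prob(pass.bowl)
print(c(run = p.run, pass = p.pass))
```

With these made-up counts the pass bowl is better on a single draw (0.6 vs 0.5), but the run bowl is better over three draws (0.78 vs 0.744) because it has fewer black balls and more chances to draw again. That is the kind of trade-off the probability tree is meant to expose.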

**Implementation**

`# get data ----------------------------------------------------------------`

`library(downloader)`

`fileUrl <- "http://nflsavant.com/pbp_data.php?year=2014"`

`dir.create("./data", showWarnings = FALSE)  # make sure the target folder exists`

`download(fileUrl, destfile = "./data/data.pbp.2014.csv", mode = "wb")`

`list.files("./data")`

`data.pbp.2014 <- read.csv("./data/data.pbp.2014.csv")`

`# check data --------------------------------------------------------------`

`str(data.pbp.2014)`

`# 45k observations by 45 vars`

`# 01 - GameId - integer - example: 2014090400 - date of game and two more digits`

`# 02 - GameDate - factor - example: 2014-09-04 - date of game`

`# 03 - Quarter - integer - example: 1 - quarter in game`

`# 04 - Minute - integer - example: 15 - minutes left in quarter`

`# 05 - Second - integer - example: 0 - seconds left in quarter`

`# 06 - OffenseTeam - factor - example: ARI - offensive team`

`# 07 - DefenseTeam - factor - example: ARI - defensive team`

`# 08 - Down - integer - example: 1 - down; not sure about 0?`

`# 09 - ToGo - integer - example: 10 - distance to go; not sure about 0?`

`# 10 - YardLine - integer - example: 35 - yard line; not sure about 0? *******`

`# 11 - X - logical - example: ?? - not sure`

`# 12 - SeriesFirstDown - integer - example: 1 - series 1st down`

`# 13 - X.1 - logical - example: ?? - not sure`

`# 14 - NextScore - integer - example: 0 - check this ***************************`

`# 15 - Description - factor - example: "D.CARR.." - description`

`# 16 - TeamWin - integer - example: 0 - unclear - ******************************`

`# 17 - X.2 - logical - ?? - not sure`

`# 18 - X.3 - logical - ?? - not sure`

`# 19 - SeasonYear - integer - example: 2014 - season year`

`# 20 - Yards - integer - example: 0 - yards from result of play? ***************`

`# 21 - Formation - factor - example: SHOTGUN - simple formation on play`

`# 22 - PlayType`

`# 23 - IsRush - integer - example: 0 - whether rush play or not ****************`

`# 24 - IsPass - integer - example: 0 - whether pass play or not ****************`

`# 25 - IsIncomplete`

`# 26 - IsTouchdown - integer - example: 0 - whether play was touchdown or not***`

`# 27 - PassType`

`# 28 - IsSack`

`# 29 - IsChallenge`

`# 30 - IsChallengeReversed`

`# 31 - Challenger`

`# 32 - IsMeasurement`

`# 33 - IsInterception - integer - example: 0 - whether play was interception ***`

`# 34 - IsFumble - integer - example: 0 - whether play was fumble ******`

`# 35 - IsPenalty - integer - example: 0 - whether play was penalty ****`

`# 36 - IsTwoPointConversion`

`# 37 - IsTwoPointConversionSuccessful`

`# 38 - RushDirection`

`# 39 - YardLineFixed - integer - example: 35 - 0-50 yardline`

`# 40 - YardLineDirection - factor - example: OPP - which side of field`

`# 41 - IsPenaltyAccepted - integer - example: 0 - penalty accepted or not ******`

`# 42 - PenaltyTeam - factor - example: ARI - why 33 levels`

`# 43 - IsNoPlay - integer - example: 0 - not sure what this means`

`# 44 - PenaltyType - factor - example: BLOCKED INTO PUNTER`

`# 45 - PenaltyYards - integer - example: 5 - yards from penalty`

Then we count the plays that meet all the criteria: snapped from the opponent's 1-yard line with no penalty on the play. For each play type, pass and run, we need the total number of attempts, the number of touchdowns (successes), and the number of turnovers (fumbles for runs, interceptions for passes).
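As a minimal sketch of this counting step, the snippet below applies the same kind of filter to a tiny made-up data frame. The rows in `toy` are invented for illustration and only the column names follow the play-by-play file; `count.plays` is a hypothetical helper, not part of our script. It uses `sum()` on a logical vector, which assumes there are no missing values in these columns.

```r
# Made-up plays; only the column names mirror the real data
toy <- data.frame(
  YardLineFixed     = c(1, 1, 1, 1, 5, 1),
  YardLineDirection = c("OPP", "OPP", "OPP", "OWN", "OPP", "OPP"),
  IsPenalty         = c(0, 0, 0, 0, 0, 1),
  IsRush            = c(1, 1, 0, 1, 1, 1),
  IsPass            = c(0, 0, 1, 0, 0, 0),
  IsTouchdown       = c(1, 0, 1, 0, 0, 0),
  stringsAsFactors  = FALSE
)

# Count rows at the opponent's 1-yard line with no penalty,
# optionally narrowed by an extra logical condition
count.plays <- function(d, extra = TRUE) {
  at.goal <- d$YardLineFixed == 1 &
    d$YardLineDirection == "OPP" &
    d$IsPenalty == 0
  sum(at.goal & extra)
}

n.rush.toy    <- count.plays(toy, toy$IsRush == 1)
n.rush.td.toy <- count.plays(toy, toy$IsRush == 1 & toy$IsTouchdown == 1)
n.pass.toy    <- count.plays(toy, toy$IsPass == 1)

# Share of toy rush attempts that ended in a touchdown
n.rush.td.toy / n.rush.toy
```

Only the first three toy rows pass the goal-line filter: the fourth is on the wrong side of the field, the fifth is at the 5-yard line, and the sixth has a penalty.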

```r
# probability of outcomes -------------------------------------------------

# rush attempts from the opponent's 1-yard line, no penalty
n.rush <- nrow(data.pbp.2014[
  data.pbp.2014$YardLineFixed == 1 &
  data.pbp.2014$YardLineDirection == "OPP" &
  data.pbp.2014$IsPenalty == 0 &
  data.pbp.2014$IsRush == 1, ])

# ... that ended in a touchdown
n.rush.td <- nrow(data.pbp.2014[
  data.pbp.2014$YardLineFixed == 1 &
  data.pbp.2014$YardLineDirection == "OPP" &
  data.pbp.2014$IsPenalty == 0 &
  data.pbp.2014$IsRush == 1 &
  data.pbp.2014$IsTouchdown == 1, ])

# ... that did not end in a touchdown
n.rush.no.td <- nrow(data.pbp.2014[
  data.pbp.2014$YardLineFixed == 1 &
  data.pbp.2014$YardLineDirection == "OPP" &
  data.pbp.2014$IsPenalty == 0 &
  data.pbp.2014$IsRush == 1 &
  data.pbp.2014$IsTouchdown == 0, ])

# ... that included a fumble
n.rush.fumble <- nrow(data.pbp.2014[
  data.pbp.2014$YardLineFixed == 1 &
  data.pbp.2014$YardLineDirection == "OPP" &
  data.pbp.2014$IsPenalty == 0 &
  data.pbp.2014$IsRush == 1 &
  data.pbp.2014$IsFumble == 1, ])

round(n.rush.td / n.rush, digits=3)
round(n.rush.no.td / n.rush, digits=3)
round(n.rush.fumble / n.rush, digits=4)

# pass attempts from the opponent's 1-yard line, no penalty
n.pass <- nrow(data.pbp.2014[
  data.pbp.2014$YardLineFixed == 1 &
  data.pbp.2014$YardLineDirection == "OPP" &
  data.pbp.2014$IsPenalty == 0 &
  data.pbp.2014$IsPass == 1, ])

# ... that ended in a touchdown
n.pass.td <- nrow(data.pbp.2014[
  data.pbp.2014$YardLineFixed == 1 &
  data.pbp.2014$YardLineDirection == "OPP" &
  data.pbp.2014$IsPenalty == 0 &
  data.pbp.2014$IsPass == 1 &
  data.pbp.2014$IsTouchdown == 1, ])

# ... that did not end in a touchdown
n.pass.no.td <- nrow(data.pbp.2014[
  data.pbp.2014$YardLineFixed == 1 &
  data.pbp.2014$YardLineDirection == "OPP" &
  data.pbp.2014$IsPenalty == 0 &
  data.pbp.2014$IsPass == 1 &
  data.pbp.2014$IsTouchdown == 0, ])

# ... that were intercepted
n.pass.interception <- nrow(data.pbp.2014[
  data.pbp.2014$YardLineFixed == 1 &
  data.pbp.2014$YardLineDirection == "OPP" &
  data.pbp.2014$IsPenalty == 0 &
  data.pbp.2014$IsPass == 1 &
  data.pbp.2014$IsInterception == 1, ])
```

Lastly, we estimate the probability of each outcome by dividing those counts by the total number of attempts.

```r
round(n.pass.td / n.pass, digits=3)
round(n.pass.no.td / n.pass, digits=3)
round(n.pass.interception / n.pass, digits=4)

# > round(n.rush.td / n.rush, digits=3)
# [1] 0.563
# > round(n.rush.no.td / n.rush, digits=3)
# [1] 0.437
# > round(n.rush.fumble / n.rush, digits=4)
# [1] 0.0101
# > round(n.pass.td / n.pass, digits=3)
# [1] 0.579
# > round(n.pass.no.td / n.pass, digits=3)
# [1] 0.421
# > round(n.pass.interception / n.pass, digits=4)
# [1] 0
```
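To connect these rates back to the bowl analogy, here is a rough sketch that plugs the printed touchdown proportions into a three-play tree. It assumes, purely for illustration, that the offense gets up to three plays, that every play is independent with the same touchdown rate, and that any play without a touchdown simply leads to the next play (turnovers and the clock are ignored). The `td.within` helper is hypothetical, and this is not Morris' model.

```r
# Observed 2014 touchdown rates from the 1-yard line (taken from the output above)
p.td <- c(run = 0.563, pass = 0.579)

# P(at least one touchdown in n plays) = 1 - P(no touchdown on any play),
# assuming independent plays with a constant touchdown probability
td.within <- function(p, n.plays = 3) 1 - (1 - p)^n.plays

p.three <- td.within(p.td)
round(p.three, 3)
```

Under these simplifying assumptions the two choices come out close, at roughly 0.917 for running and 0.925 for passing on every play. The more interesting differences come from what the sketch leaves out, namely turnovers and time remaining.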

**Conclusion**
