Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Here’s a timely guest entry from Jeffrey Witmer (Oberlin College).

As the “Deflate Gate” saga was unfolding, Warren Sharp analyzed “touches per fumble” for NFL teams before and after 2006, when a rule was changed so that teams playing on the road could provide their own footballs (http://www.sharpfootballanalysis.com/blog/). Sharp noted that, for whatever reason, the Patriots went from being a typical team, as regards fumbling, to a team with a very low fumble rate. Rather than rely on the data the Sharp provides at his website, I choose to collect and analyze some data on my own. I took a random sample of 30 games played by New England and 30 other games. For each game, I recorded all rushing and passing plays (except for QB kneels), but excluded kicking plays (the NFL, rather than the teams, provides special footballs for those plays). (Data source: http://www.pro-football-reference.com/play-index/play_finder.cgi.) I also recorded the weather for the game. (Data source: http://www.wunderground.com/history/.) Once I had the data (in a file that I called AllBig, which can be downloaded from http://www.amherst.edu/~nhorton/AllBig.csv), I noted whether or not there was a fumble on each play, aided by the grep() command:
grep("Fumb", AllBig$Detail, ignore.case=TRUE)  I labeled each play as Late or not according to whether it happened after the rule change: AllBig$Late <- ifelse(AllBig$Year > 2006, 1, 0)  Now for the analysis. My data set has 7558 plays including 145 fumbles (1.9%). I used the mosaic package and the tally() command to see how often teams other than the Patriots fumble: require(mosaic) tally(~Fumble+Late, data=filter(AllBig,Pats==0)) Late Fumble 0 1 0 2588 2919 1 54 65  Then I asked for the data in proportion terms: tally(Fumble~Late, data=filter(AllBig,Pats==0))  and got  Late Fumble 0 1 0 0.9796 0.9782 1 0.0204 0.0218  For non-Pats there is a tiny increase in fumbles. This can be displayed graphically using a mosaiplot (though it’s not a particularly compelling figure). mosaicplot(Fumble~Late, data=filter(AllBig,Pats==0)) Repeating this for the Patriots shows a different picture: tally(~Fumble+Late, data=filter(AllBig,Pats==1)) Late Fumble 0 1 0 996 910 1 19 7 tally(Fumble~Late, data=filter(AllBig,Pats==1)) Late Fumble 0 1 0 0.98128 0.99237 1 0.01872 0.00763  I fit a logistic regression model with the glm() command: glm(Fumble~Late*Pats, family=binomial, data=AllBig) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -3.8697 0.1375 -28.14  I wanted to control for any weather effect, so I coded the weather as Bad if it was raining or snowing and good if not. This led to a model that includes BadWeather and Temperature – which turn out not to make much of a difference: AllBig$BadWeather <- ifelse(AllBig\$Weather %in% c("drizzle","rain","snow"), 1, 0)

glm(formula = Fumble ~ BadWeather + Temp + Late * Pats,
family = binomial, data = AllBig)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.23344    0.43164   -9.81   
Because there was suspicion that something changed starting in 2007 I added a three-way interaction:
glm(formula = Fumble ~ BadWeather + Temp + IsAway * Late * Pats,
family = binomial, data = AllBig)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)      -4.51110    0.47707   -9.46   
There is some evidence here that the Patriots fumble less than the rest of the NFL and that things changed in 2007. The p-values above are based on asymptotic normality, but there is a cleaner and easier way to think about the Patriots’ fumble rate. I wrote a short simulation that mimics something I do in my statistics classes, where I use a physical deck of cards to show what each step in the R simulation is doing.
#Simulation of deflategate data null hypothesis
Late = rep(1,72)  #creates 72 late fumbles
Early = rep(0,73)   #creates 73 early fumbles
alldata = append(Late,Early)   #puts the two groups together
table(alldata)  #check to see that we have what we want

cards =1:length(alldata)  # creates 145 cards, one "ID number" per fumble

FumbleLate = NULL  # initializes a vector to hold the results
for (i in 1:10000){# starts a loop that will be executed 10,000 times
cardsgroup1 = sample(cards,119, replace=FALSE) # takes a sample of 119 cards
cardsgroup2 = cards[-cardsgroup1]  # puts the remaining cards in group 2
NEPats = (alldata[cardsgroup2])  #reads the values of the cards in group 2
FumbleLate[i] = sum(NEPats)  # counts NEPats late fumbles (the only stat we need)
}

table(FumbleLate) #look at the results
hist(FumbleLate, breaks=seq(2.5,23.5)) #graph the results

sum(FumbleLate 
Additional note: kudos to Steve Taylor for the following graphical depiction of the interaction.

An unrelated note about aggregators:We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.