(This article was first published on **Fear and Loathing in Data Science**, and kindly contributed to R-bloggers)

With two weeks of NFL football under our belts, it is time to start peeking under the proverbial hood at some of the statistics. What better way than with R? If you want the best stats out there, I recommend the website http://www.advancednflstats.com/. To understand the variables, you will need to spend some time with the site’s glossary. An excellent in-depth companion book to these advanced statistics is Mathletics, by Wayne Winston of Indiana University. Wayne also publishes a blog at http://waynewinston.com/wordpress/. Given those resources, I’m not going to get into the nitty-gritty of these variables.

I’ve downloaded the quarterback stats through week 2 and will do some simple data visualization: a scatterplot matrix and a correlation heatmap. The code is short and should get you on your way to multivariate visualization.

> str(qb) # structure of the data frame named "qb"

'data.frame': 33 obs. of 18 variables:

$ Rank : int 1 2 3 4 5 6 7 8 9 10 …

$ Player : Factor w/ 33 levels "1-C.Newton","10-E.Manning",..: 13 22 8 11 31 28 12 18 7 5 …

$ Team : Factor w/ 32 levels "ARZ","ATL","BLT",..: 10 6 12 26 20 24 17 4 14 16 …

$ G : int 2 2 2 2 2 2 2 2 2 2 …

$ WPA : num 1.03 1.02 0.91 0.85 0.82 0.76 0.64 0.53 0.53 0.51 …

$ EPA : num 41.2 6.3 42.1 41.5 25.7 10.4 14.6 2.1 19.5 6.4 …

$ WPA.G : num 0.52 0.51 0.46 0.43 0.41 0.38 0.32 0.27 0.27 0.26 …

$ EPA.P : num 0.44 0.08 0.48 0.48 0.28 0.12 0.17 0.03 0.23 0.07 …

$ SR… : num 55.9 55.3 61.4 55.2 50.5 50 51.7 45.5 52.4 42.7 …

$ Att : int 85 72 79 76 81 62 72 66 66 70 …

$ Cmp : int 57 49 55 50 52 39 47 45 43 42 …

$ Cmp. : num 67.1 68.1 69.6 65.8 64.2 62.9 65.3 68.2 65.2 60 …

$ PassYds: int 769 532 813 614 679 631 591 446 499 396 …

$ Sk : int 3 1 6 3 6 4 9 1 7 5 …

$ SkYds : int 17 8 50 18 42 29 39 9 37 26 …

$ Int : int 0 3 1 1 3 0 1 1 1 0 …

$ X.Deep : num 17.6 15.3 17.7 22.4 24.7 30.6 12.5 21.2 25.8 8.6 …

$ AYPA : num 8.5 5.3 8.4 7 5.8 9.1 6.3 5.9 5.7 4.9 …

Of the 18 variables, 16 are numeric, but we are not concerned with “Rank” (at least not in week 2) or “G”, which is the number of games played. That leaves columns 5 through 18 for plotting.
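Picking columns by position works, but it breaks quietly if the download ever changes its column order. Here is a minimal sketch of selecting by type and name instead. It assumes a small stand-in data frame, `qb_demo`, built from the first ten rows of a few columns in the str() output above; the `Player` labels are placeholders, not the real names.

```r
# Stand-in for `qb`: first ten rows of a few columns, copied from the str() output.
# `Player` holds placeholder labels (assumption), not the real player names.
qb_demo <- data.frame(
  Rank   = 1:10,
  Player = factor(paste0("QB", 1:10)),
  G      = rep(2L, 10),
  WPA    = c(1.03, 1.02, 0.91, 0.85, 0.82, 0.76, 0.64, 0.53, 0.53, 0.51),
  Sk     = c(3L, 1L, 6L, 3L, 6L, 4L, 9L, 1L, 7L, 5L),
  SkYds  = c(17L, 8L, 50L, 18L, 42L, 29L, 39L, 9L, 37L, 26L),
  Int    = c(0L, 3L, 1L, 1L, 3L, 0L, 1L, 1L, 1L, 0L)
)

# Flag the numeric columns (factors such as Player are not numeric) ...
is_num <- vapply(qb_demo, is.numeric, logical(1))

# ... then drop the two numeric columns we are not interested in.
keep   <- setdiff(names(qb_demo)[is_num], c("Rank", "G"))
qb_num <- qb_demo[, keep, drop = FALSE]

names(qb_num)
```

On the full data the same three lines should leave the 14 columns that `5:18` selects, provided the file has the layout shown above.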

> pairs(qb[ ,5:18]) #base package scatterplot matrix

Yawn!

We could improve this with more code, but it still just won’t “pop” visually. An option would be to use the lattice package, which I describe in a previous post. However, I’m intrigued by heatmaps, in particular as a way to portray correlations.
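For reference, the sort of extra code I have in mind looks something like the sketch below. It uses only base graphics and a stand-in data frame built from the first ten rows of four columns in the str() output; `qb_demo` and `panel_cor` are names I made up for the illustration.

```r
# Stand-in data: first ten rows of four columns from the str() output.
qb_demo <- data.frame(
  WPA   = c(1.03, 1.02, 0.91, 0.85, 0.82, 0.76, 0.64, 0.53, 0.53, 0.51),
  EPA   = c(41.2, 6.3, 42.1, 41.5, 25.7, 10.4, 14.6, 2.1, 19.5, 6.4),
  Sk    = c(3, 1, 6, 3, 6, 4, 9, 1, 7, 5),
  SkYds = c(17, 8, 50, 18, 42, 29, 39, 9, 37, 26)
)

# Custom panel: print the Pearson correlation instead of a mirrored scatterplot.
panel_cor <- function(x, y, ...) {
  old <- par(usr = c(0, 1, 0, 1))  # temporarily use a unit coordinate system
  on.exit(par(old))                # restore the panel's coordinates afterwards
  text(0.5, 0.5, sprintf("%.2f", cor(x, y)), cex = 1.5)
}

pairs(qb_demo,
      lower.panel = panel.smooth,  # scatterplot plus a lowess smoother
      upper.panel = panel_cor,     # correlation as text
      pch = 19, col = "steelblue")

# The same number the Sk/SkYds panel prints:
r_sk <- cor(qb_demo$Sk, qb_demo$SkYds)
```

It is more informative than the default, but the grid still gets crowded quickly once all 14 variables are in play.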

To build the heatmap, you will need to load the ggplot2 and reshape2 packages.

> library(ggplot2)

> library(reshape2)

> # simple code to create a correlation data set and put it into a heatmap

> corqb = cor(qb[ ,5:18])

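> qplot(x=Var1, y=Var2, data=melt(corqb), fill=value, geom="tile") #Note: corqb is already a correlation matrix, so it is melted directly. Depending on your system (for example, which version of the reshape package is loaded), you may need to use X1 and X2 in place of Var1 and Var2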

Let’s take a look at a very simple correlation on this chart. Find the variables “Sk” and “SkYds” and look at their high level of correlation. This should be no surprise as Sk is for the number of times sacked and yes, you guessed it, SkYds is the total yards lost as a result of those sacks.
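If you would rather read a value like the Sk/SkYds correlation off the chart than judge it by colour, the tiles can be labelled. The sketch below is one way to do it with ggplot() instead of qplot(), which recent ggplot2 releases mark as deprecated. It assumes ggplot2 and reshape2 are installed and again uses a stand-in data frame from the first ten rows of the str() output, so the numbers will not match the full 33-quarterback data.

```r
library(ggplot2)
library(reshape2)

# Stand-in data: first ten rows of four columns from the str() output.
qb_demo <- data.frame(
  WPA   = c(1.03, 1.02, 0.91, 0.85, 0.82, 0.76, 0.64, 0.53, 0.53, 0.51),
  Sk    = c(3, 1, 6, 3, 6, 4, 9, 1, 7, 5),
  SkYds = c(17, 8, 50, 18, 42, 29, 39, 9, 37, 26),
  Int   = c(0, 3, 1, 1, 3, 0, 1, 1, 1, 0)
)

# Long format: one row per pair of variables (columns Var1, Var2, value).
cor_long <- melt(cor(qb_demo))

p <- ggplot(cor_long, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = sprintf("%.2f", value))) +       # print r on each tile
  scale_fill_gradient2(low = "steelblue", mid = "white", high = "firebrick",
                       midpoint = 0, limits = c(-1, 1)) + # diverging scale centred on 0
  labs(x = NULL, y = NULL, fill = "r")

print(p)
```

Fixing the colour limits at -1 and 1 keeps the shading comparable from one week's chart to the next.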

Let’s look at QB rank, sacks, yards lost to sacks, and interceptions:

> corqb2 = qb[c(1,14,15,16)]

> qplot(x=Var1, y=Var2, data=melt(cor(corqb2)), fill=value, geom="tile")

And, here are the correlation numbers…

> cor(corqb2)

Rank Sk SkYds Int

Rank 1.0000000 0.27489633 0.3330972 0.32814607

Sk 0.2748963 1.00000000 0.9117308 0.06699875

SkYds 0.3330972 0.91173078 1.0000000 0.13743870

Int 0.3281461 0.06699875 0.1374387 1.00000000

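At this point in the season, QB rank is only weakly correlated with these bad things happening: its correlations with sacks, sack yards and interceptions all fall between roughly 0.27 and 0.33. The sign is at least in the expected direction, since a larger Rank number means a lower-ranked quarterback. It will be interesting to see whether this changes as the season progresses.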
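As a rough check on how weak these correlations are, a Pearson r can be converted to a t statistic using only r and the sample size. The sketch below plugs in the three Rank correlations printed above and n = 33 from the str() output. Treat it as a back-of-the-envelope calculation: the test assumes roughly normal data, and Rank is really an ordinal variable.

```r
n <- 33  # number of quarterbacks, from the str() output

# Correlations of Rank with each variable, copied from the cor() output.
r <- c(Sk = 0.27489633, SkYds = 0.3330972, Int = 0.32814607)

# t statistic for H0: true correlation is zero, with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value.
p_val <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

round(cbind(r, t_stat, p_val), 3)
```

With only 33 quarterbacks and two games each, none of the three clears the conventional 5% threshold under this test, which fits the "not much there yet" reading.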
