Weekly Variability Simulation of Fantasy Football Projections

July 20, 2014
By

(This article was first published on Fantasy Football Analytics » R | Fantasy Football Analytics, and kindly contributed to R-bloggers)

In this post, I show how to estimate players’ week-to-week variability in fantasy football points.  In a prior post, I demonstrated how to calculate a player’s risk level, as defined by the variability of their projected points across sources.  As a reader pointed out, another form of meaningful variability is week-to-week variability (in addition to variability across sources).  Some fantasy statistics are more variable (TDs) than others (yards) from week to week.  For players with a higher percentage of their projected points scored by TDs, their weekly points are likely to be more variable. Another example might be possession receivers vs deep threats.  Possession receivers are likely more reliable from week-to-week than deep threats who are more boom-or-bust.  Here, I use a simulation to estimate each player’s week-to-week variability in fantasy points.

The R Scripts

The R Script for the “Historical Weekly Variability” section is below:

The R Script for the “Weekly Variability Simulation” section is below:

Historical Weekly Variability

In order to simulate players’ weekly fantasy points, we first must determine the distribution from which to sample for each player and statistical category (passing yards, rushing TDs, etc.).  I chose to sample from a normal distribution, with each player’s weekly mean of the statistical category as the mean of his distribution.  In other words, if Peyton Manning is projected to have 4800 passing yards this season, that equals an average of 300 yards per game (4800/16).  Thus, for sampling Peyton Manning’s weekly passing yards, we can sample from a distribution with a mean of 300.

For a normal distribution, we have to specify a mean and standard deviation.  What standard deviation should we use for Peyton Manning’s weekly passing yards?  We could theoretically use Peyton Manning’s weekly variability from last season, but some players do not have statistics from last year.  As a result, I chose to calculate the historical weekly variability for passing yards (and all other statistical categories) averaged across all players from the past three seasons.  Then, we use the historical week-to-week standard deviation of players’ passing yards as the standard deviation of players’ sampling distributions for passing yards.

To do this, I scrape data from every week of the season (weeks 1-17) from Pro-Football-Reference for the past three seasons:

``````#Libraries
library("XML")

#Specify info to scrape
years <- 2011:2013
weeks <- 17

#Scrape data
qb <- list()
rb <- list()
rb1 <- list()
rb2 <- list()
rb3 <- list()
wr <- list()
wr1 <- list()
wr2 <- list()
wr3 <- list()
wr4 <- list()
wr5 <- list()
wr6 <- list()

pb <- txtProgressBar(min = 1, max = weeks, style = 3)
for(i in 1:weeks){
setTxtProgressBar(pb, i)
qb[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&game_type=R&league_id=&team_id=&opp_id=&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_location=&game_result=&handedness=&is_active=&is_hof=&c1stat=pass_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=pass_att", sep=""), stringsAsFactors = FALSE)\$stats
rb1[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&game_type=R&league_id=&team_id=&opp_id=&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_location=&game_result=&handedness=&is_active=&is_hof=&c1stat=rush_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rush_yds", sep=""), stringsAsFactors = FALSE)\$stats
rb2[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rush_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rush_yds&order_by_asc=&offset=100", sep=""), stringsAsFactors = FALSE)\$stats
rb3[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rush_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rush_yds&order_by_asc=&offset=200", sep=""), stringsAsFactors = FALSE)\$stats
wr1[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&game_type=R&league_id=&team_id=&opp_id=&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_location=&game_result=&handedness=&is_active=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds", sep=""), stringsAsFactors = FALSE)\$stats
wr2[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=100", sep=""), stringsAsFactors = FALSE)\$stats
wr3[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=200", sep=""), stringsAsFactors = FALSE)\$stats
wr4[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=300", sep=""), stringsAsFactors = FALSE)\$stats
wr5[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=400", sep=""), stringsAsFactors = FALSE)\$stats
wr6[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=500", sep=""), stringsAsFactors = FALSE)\$stats
}``````

After cleaning and merging the data, I then put the data in the necessary form for calculating players’ weekly variability.  This involves transforming the data from long form to wide form so that each week has a separate column:

``````#Long to Wide
weeklyDataWide <- reshape(weeklyDataLong,
timevar = c("week"),
idvar = c("name","year"),
direction = "wide",
sep="")``````

Then I calculate the week-to-week standard deviation of each statistical category for every player in each season.  Finally, I calculate a robust average across all players and seasons to get a general week-to-week standard deviation for each statistical category:

``````#Calculate week-to-week SD for each statistical category across same player/year combinations
sdPassYds <- apply(weeklyDataWide[,grep("passYds", names(weeklyDataWide))], 1, sd)
sdPassTds <- apply(weeklyDataWide[,grep("passTds", names(weeklyDataWide))], 1, sd)
sdPassInt <- apply(weeklyDataWide[,grep("passInt", names(weeklyDataWide))], 1, sd)
sdRushYds <- apply(weeklyDataWide[,grep("rushYds", names(weeklyDataWide))], 1, sd)
sdRushTds <- apply(weeklyDataWide[,grep("rushTds", names(weeklyDataWide))], 1, sd)
sdRec <- apply(weeklyDataWide[,grep("rec", names(weeklyDataWide))], 1, sd)
sdRecYds <- apply(weeklyDataWide[,grep("recYds", names(weeklyDataWide))], 1, sd)
sdRecTds <- apply(weeklyDataWide[,grep("recTds", names(weeklyDataWide))], 1, sd)

sdVars <- data.frame(sdPassYds, sdPassTds, sdPassInt, sdRushYds, sdRushTds, sdRec, sdRecYds, sdRecTds)

#Robust average of week-to-week SD for each statistical category
sdAverage <- data.frame(t(apply(sdVars, 2, function(x) wilcox.test(x, conf.int=TRUE, na.action="na.exclude")\$estimate)))``````

Here are the average week-to-week standard deviations for each statistical category:

• Passing yards : 82.1 yards
• Passing TDs: 0.8 TDs
• Passing INTs: 0.7 INTs
• Rushing yards: 11.3 yards
• Rushing TDs: 0.4 TDs
• Receptions: 11.1 receptions
• Receiving yards: 15.9 yards
• Receiving TDs: 0.4 TDs

Here are the density plots of the week-to-week standard deviations of each statistical category for the different players and seasons:

Weekly Variability Simulation

Now that we’ve calculated the historical week-to-week standard deviation for each statistical category, we can simulate players’ weekly performances for each statistical category using the sampling distribution of a) that player’s weekly mean and b) the historical week-to-week standard deviation for the relevant statistical category.  In other words, for simulating Peyton Manning’s passing yards in the example earlier, we will sample from the distribution with a mean of 300 passing yards and a standard deviation of 82.1 yards.  But the sampling has some constraints.  First, the samples (weekly performances in each game) must sum to equal Manning’s projected passing yards for the season.  Second, for some statistical categories (e.g., TDs), values can only be positive integers (e.g., you can’t score half a touchdown or negative touchdowns in a game).  Here’s the function for taking ‘n’ samples from a distribution whose mean is ‘sum’/’n’ and whose standard deviation is ‘sd’, and that only include positive integers that sum to equal ‘sum’:

``````simulateIntegers <- function(n, sum, sd, pos.only = TRUE){
if(sum == 0 & pos.only == TRUE){
vec <- rep(0, n)
} else{
vec <- rnorm(n, sum/n, sd)
if (abs(sum(vec)) < 0.01) vec <- vec + 1
vec <- round(vec / sum(vec) * sum)
deviation <- sum - sum(vec)
for (. in seq_len(abs(deviation))){
vec[i] <- vec[i <- sample(n, 1)] + sign(deviation)
}
if (pos.only) while (any(vec < 0)){
negs <- vec < 0
pos  <- vec > 0
vec[negs][i] <- vec[negs][i <- sample(sum(negs), 1)] + 1
vec[pos][i]  <- vec[pos ][i <- sample(sum(pos ), 1)] - 1
}
}
vec
}``````

Using this function, and plugging in the player’s season projection and the historical week-to-week standard deviation for each statistical category, we simulate all 16 games for 100 different seasons for each player and statistical category:

``````#Simulation
simulations <- 100
games <- 16

passYds <- list()
passTds <- list()
passInt <- list()
rushYds <- list()
rushTds <- list()
rec <- list()
recYds <- list()
recTds <- list()
twoPts <- list()
fumbles <- list()

for(i in 1:simulations){
passYds[[i]] <- t(sapply(projections\$passYdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage\$sdPassYds), error=function(e) rep(NA, games))))
passTds[[i]] <- t(sapply(projections\$passTdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage\$sdPassTds), error=function(e) rep(NA, games))))
passInt[[i]] <- t(sapply(projections\$passIntMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage\$sdPassInt), error=function(e) rep(NA, games))))
rushYds[[i]] <- t(sapply(projections\$rushYdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage\$sdRushYds), error=function(e) rep(NA, games))))
rushTds[[i]] <- t(sapply(projections\$rushTdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage\$sdRushTds), error=function(e) rep(NA, games))))
rec[[i]] <- t(sapply(projections\$recMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage\$sdRec), error=function(e) rep(NA, games))))
recYds[[i]] <- t(sapply(projections\$recYdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage\$sdRecYds), error=function(e) rep(NA, games))))
recTds[[i]] <- t(sapply(projections\$recTdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage\$sdRecTds), error=function(e) rep(NA, games))))
twoPts[[i]] <- t(sapply(projections\$twoPtsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sd(c(rep(0, games - 1), 1))), error=function(e) rep(NA, games))))
fumbles[[i]] <- t(sapply(projections\$fumblesMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sd(c(rep(0, games - 1), 1))), error=function(e) rep(NA, games))))
}``````

Then we calculate the weekly fantasy points for every player in all 100 seasons based on their simulated performances in each statistical category:

``````#Calculate fantasy points per week
passYdsPts <- list()
passTdsPts <- list()
passIntPts <- list()
rushYdsPts <- list()
rushTdsPts <- list()
recPts <- list()
recYdsPts <- list()
recTdsPts <- list()
twoPtsPts <- list()
fumblesPts <- list()
fantasyPts <- list()

for(i in 1:simulations){
passYdsPts[[i]] <- passYds[[i]] * passYdsMultiplier
passTdsPts[[i]] <- passTds[[i]] * passTdsMultiplier
passIntPts[[i]] <- passInt[[i]] * passIntMultiplier
rushYdsPts[[i]] <- rushYds[[i]] * rushYdsMultiplier
rushTdsPts[[i]] <- rushTds[[i]] * rushTdsMultiplier
recPts[[i]] <- rec[[i]] * recMultiplier
recYdsPts[[i]] <- recYds[[i]] * recYdsMultiplier
recTdsPts[[i]] <- recTds[[i]] * recTdsMultiplier
twoPtsPts[[i]] <- twoPts[[i]] * twoPtsMultiplier
fumblesPts[[i]] <- fumbles[[i]] * fumlMultiplier

fantasyPts[[i]] <- passYdsPts[[i]] + passTdsPts[[i]] + passIntPts[[i]] + rushYdsPts[[i]] + rushTdsPts[[i]] + recPts[[i]] + recYdsPts[[i]] + recTdsPts[[i]] + twoPtsPts[[i]] + fumblesPts[[i]]
}``````

Finally, we calculate the week-to-week standard deviation of fantasy points for each player in all 100 seasons, and the average week-to-week standard deviation across seasons:

``````#Calculate SD of fantasy points per week
sdWeeklyPts <- matrix(nrow=NROW(fantasyPts[]), ncol=simulations)

for(i in 1:simulations){
sdWeeklyPts[,i] <- apply(fantasyPts[[i]], 1, function(x) sd(x, na.rm=TRUE))
}

#Calculate robust average of weekly SD
projections\$weeklySD <- apply(sdWeeklyPts, 1, function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")\$estimate, error=function(e) median(x, na.rm=TRUE)))``````

Here are some players with high weekly variability in fantasy points according to our simulation:

• Cam Newton
• Robert Griffin III
• Colin Kaepernick
• Jamaal Charles
• Matt Forte
• Montee Ball
• Cordarrelle Patterson
• Percy Harvin
• Alshon Jeffery
• Tavon Austin

Conclusion

Simulating weekly variability can help you identify players who are more or less reliable from week to week.  For example, possession receivers (who rely less on long gains and touchdowns) are more reliable than deep threats, who tend to boom or bust.

The post Weekly Variability Simulation of Fantasy Football Projections appeared first on Fantasy Football Analytics.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...