Weekly Variability Simulation of Fantasy Football Projections

Posted on July 20, 2014 by Isaac Petersen in R bloggers | 0 Comments

[This article was first published on Fantasy Football Analytics » R | Fantasy Football Analytics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In this post, I show how to estimate players’ week-to-week variability in fantasy football points. In a prior post, I demonstrated how to calculate a player’s risk level, as defined by the variability of their projected points across sources. As a reader pointed out, another form of meaningful variability is week-to-week variability (in addition to variability across sources). Some fantasy statistics are more variable (TDs) than others (yards) from week to week. For players with a higher percentage of their projected points scored by TDs, their weekly points are likely to be more variable. Another example might be possession receivers vs deep threats. Possession receivers are likely more reliable from week-to-week than deep threats who are more boom-or-bust. Here, I use a simulation to estimate each player’s week-to-week variability in fantasy points.

The R Scripts

The R Script for the “Historical Weekly Variability” section is below:

https://github.com/dadrivr/FantasyFootballAnalyticsR/blob/master/R%20Scripts/Historical/Historical%20Weekly.R

The R Script for the “Weekly Variability Simulation” section is below:

https://github.com/dadrivr/FantasyFootballAnalyticsR/blob/master/R%20Scripts/Posts/Weekly%20Simulation.R

Historical Weekly Variability

In order to simulate players’ weekly fantasy points, we first must determine the distribution from which to sample for each player and statistical category (passing yards, rushing TDs, etc.). I chose to sample from a normal distribution, with each player’s weekly mean of the statistical category as the mean of his distribution. In other words, if Peyton Manning is projected to have 4800 passing yards this season, that equals an average of 300 yards per game (4800/16). Thus, for sampling Peyton Manning’s weekly passing yards, we can sample from a distribution with a mean of 300.

For a normal distribution, we have to specify a mean and standard deviation. What standard deviation should we use for Peyton Manning’s weekly passing yards? We could theoretically use Peyton Manning’s weekly variability from last season, but some players do not have statistics from last year. As a result, I chose to calculate the historical weekly variability for passing yards (and all other statistical categories) averaged across all players from the past three seasons. Then, we use the historical week-to-week standard deviation of players’ passing yards as the standard deviation of players’ sampling distributions for passing yards.

To do this, I scrape data from every week of the season (weeks 1-17) from Pro-Football-Reference for the past three seasons:

#Libraries library("XML") #Specify info to scrape years <- 2011:2013 weeks <- 17 #Scrape data qb <- list() rb <- list() rb1 <- list() rb2 <- list() rb3 <- list() wr <- list() wr1 <- list() wr2 <- list() wr3 <- list() wr4 <- list() wr5 <- list() wr6 <- list() pb <- txtProgressBar(min = 1, max = weeks, style = 3) for(i in 1:weeks){ setTxtProgressBar(pb, i) qb[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&game_type=R&league_id=&team_id=&opp_id=&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_location=&game_result=&handedness=&is_active=&is_hof=&c1stat=pass_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=pass_att", sep=""), stringsAsFactors = FALSE)$stats rb1[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&game_type=R&league_id=&team_id=&opp_id=&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_location=&game_result=&handedness=&is_active=&is_hof=&c1stat=rush_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rush_yds", sep=""), stringsAsFactors = FALSE)$stats rb2[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rush_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rush_yds&order_by_asc=&offset=100", sep=""), stringsAsFactors = FALSE)$stats rb3[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rush_att&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rush_yds&order_by_asc=&offset=200", sep=""), stringsAsFactors = FALSE)$stats wr1[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&game_type=R&league_id=&team_id=&opp_id=&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_location=&game_result=&handedness=&is_active=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds", sep=""), stringsAsFactors = FALSE)$stats wr2[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=100", sep=""), stringsAsFactors = FALSE)$stats wr3[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=200", sep=""), stringsAsFactors = FALSE)$stats wr4[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=300", sep=""), stringsAsFactors = FALSE)$stats wr5[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=400", sep=""), stringsAsFactors = FALSE)$stats wr6[[i]] <- readHTMLTable(paste("http://www.pro-football-reference.com/play-index/pgl_finder.cgi?request=1&match=game&year_min=", head(years, 1), "&year_max=", tail(years, 1), "&season_start=1&season_end=-1&age_min=0&age_max=99&league_id=&team_id=&opp_id=&game_type=R&game_num_min=0&game_num_max=99&week_num_min=", i, "&week_num_max=", i, "&game_day_of_week=&game_month=&game_location=&game_result=&is_active=&handedness=&is_hof=&c1stat=rec&c1comp=gt&c1val=1&c2stat=&c2comp=gt&c2val=&c3stat=&c3comp=gt&c3val=&c4stat=&c4comp=gt&c4val=&order_by=rec_yds&order_by_asc=&offset=500", sep=""), stringsAsFactors = FALSE)$stats }

After cleaning and merging the data, I then put the data in the necessary form for calculating players’ weekly variability. This involves transforming the data from long form to wide form so that each week has a separate column:

#Long to Wide weeklyDataWide <- reshape(weeklyDataLong, timevar = c("week"), idvar = c("name","year"), direction = "wide", sep="")

Then I calculate the week-to-week standard deviation of each statistical category for every player in each season. Finally, I calculate a robust average across all players and seasons to get a general week-to-week standard deviation for each statistical category:

#Calculate week-to-week SD for each statistical category across same player/year combinations sdPassYds <- apply(weeklyDataWide[,grep("passYds", names(weeklyDataWide))], 1, sd) sdPassTds <- apply(weeklyDataWide[,grep("passTds", names(weeklyDataWide))], 1, sd) sdPassInt <- apply(weeklyDataWide[,grep("passInt", names(weeklyDataWide))], 1, sd) sdRushYds <- apply(weeklyDataWide[,grep("rushYds", names(weeklyDataWide))], 1, sd) sdRushTds <- apply(weeklyDataWide[,grep("rushTds", names(weeklyDataWide))], 1, sd) sdRec <- apply(weeklyDataWide[,grep("rec", names(weeklyDataWide))], 1, sd) sdRecYds <- apply(weeklyDataWide[,grep("recYds", names(weeklyDataWide))], 1, sd) sdRecTds <- apply(weeklyDataWide[,grep("recTds", names(weeklyDataWide))], 1, sd) sdVars <- data.frame(sdPassYds, sdPassTds, sdPassInt, sdRushYds, sdRushTds, sdRec, sdRecYds, sdRecTds) #Robust average of week-to-week SD for each statistical category sdAverage <- data.frame(t(apply(sdVars, 2, function(x) wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate)))

Here are the average week-to-week standard deviations for each statistical category:

Passing yards : 82.1 yards
Passing TDs: 0.8 TDs
Passing INTs: 0.7 INTs
Rushing yards: 11.3 yards
Rushing TDs: 0.4 TDs
Receptions: 11.1 receptions
Receiving yards: 15.9 yards
Receiving TDs: 0.4 TDs

Here are the density plots of the week-to-week standard deviations of each statistical category for the different players and seasons:

Weekly Variability Simulation

Now that we’ve calculated the historical week-to-week standard deviation for each statistical category, we can simulate players’ weekly performances for each statistical category using the sampling distribution of a) that player’s weekly mean and b) the historical week-to-week standard deviation for the relevant statistical category. In other words, for simulating Peyton Manning’s passing yards in the example earlier, we will sample from the distribution with a mean of 300 passing yards and a standard deviation of 82.1 yards. But the sampling has some constraints. First, the samples (weekly performances in each game) must sum to equal Manning’s projected passing yards for the season. Second, for some statistical categories (e.g., TDs), values can only be positive integers (e.g., you can’t score half a touchdown or negative touchdowns in a game). Here’s the function for taking ‘n’ samples from a distribution whose mean is ‘sum’/'n’ and whose standard deviation is ‘sd’, and that only include positive integers that sum to equal ‘sum’:

simulateIntegers <- function(n, sum, sd, pos.only = TRUE){ if(sum == 0 & pos.only == TRUE){ vec <- rep(0, n) } else{ vec <- rnorm(n, sum/n, sd) if (abs(sum(vec)) < 0.01) vec <- vec + 1 vec <- round(vec / sum(vec) * sum) deviation <- sum - sum(vec) for (. in seq_len(abs(deviation))){ vec[i] <- vec[i <- sample(n, 1)] + sign(deviation) } if (pos.only) while (any(vec < 0)){ negs <- vec < 0 pos <- vec > 0 vec[negs][i] <- vec[negs][i <- sample(sum(negs), 1)] + 1 vec[pos][i] <- vec[pos ][i <- sample(sum(pos ), 1)] - 1 } } vec }

Using this function, and plugging in the player’s season projection and the historical week-to-week standard deviation for each statistical category, we simulate all 16 games for 100 different seasons for each player and statistical category:

#Simulation simulations <- 100 games <- 16 passYds <- list() passTds <- list() passInt <- list() rushYds <- list() rushTds <- list() rec <- list() recYds <- list() recTds <- list() twoPts <- list() fumbles <- list() for(i in 1:simulations){ passYds[[i]] <- t(sapply(projections$passYdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdPassYds), error=function(e) rep(NA, games)))) passTds[[i]] <- t(sapply(projections$passTdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdPassTds), error=function(e) rep(NA, games)))) passInt[[i]] <- t(sapply(projections$passIntMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdPassInt), error=function(e) rep(NA, games)))) rushYds[[i]] <- t(sapply(projections$rushYdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRushYds), error=function(e) rep(NA, games)))) rushTds[[i]] <- t(sapply(projections$rushTdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRushTds), error=function(e) rep(NA, games)))) rec[[i]] <- t(sapply(projections$recMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRec), error=function(e) rep(NA, games)))) recYds[[i]] <- t(sapply(projections$recYdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRecYds), error=function(e) rep(NA, games)))) recTds[[i]] <- t(sapply(projections$recTdsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sdAverage$sdRecTds), error=function(e) rep(NA, games)))) twoPts[[i]] <- t(sapply(projections$twoPtsMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sd(c(rep(0, games - 1), 1))), error=function(e) rep(NA, games)))) fumbles[[i]] <- t(sapply(projections$fumblesMedian, function(x) tryCatch(simulateIntegers(n=games, sum=x, sd=sd(c(rep(0, games - 1), 1))), error=function(e) rep(NA, games)))) }

Then we calculate the weekly fantasy points for every player in all 100 seasons based on their simulated performances in each statistical category:

#Calculate fantasy points per week passYdsPts <- list() passTdsPts <- list() passIntPts <- list() rushYdsPts <- list() rushTdsPts <- list() recPts <- list() recYdsPts <- list() recTdsPts <- list() twoPtsPts <- list() fumblesPts <- list() fantasyPts <- list() for(i in 1:simulations){ passYdsPts[[i]] <- passYds[[i]] * passYdsMultiplier passTdsPts[[i]] <- passTds[[i]] * passTdsMultiplier passIntPts[[i]] <- passInt[[i]] * passIntMultiplier rushYdsPts[[i]] <- rushYds[[i]] * rushYdsMultiplier rushTdsPts[[i]] <- rushTds[[i]] * rushTdsMultiplier recPts[[i]] <- rec[[i]] * recMultiplier recYdsPts[[i]] <- recYds[[i]] * recYdsMultiplier recTdsPts[[i]] <- recTds[[i]] * recTdsMultiplier twoPtsPts[[i]] <- twoPts[[i]] * twoPtsMultiplier fumblesPts[[i]] <- fumbles[[i]] * fumlMultiplier fantasyPts[[i]] <- passYdsPts[[i]] + passTdsPts[[i]] + passIntPts[[i]] + rushYdsPts[[i]] + rushTdsPts[[i]] + recPts[[i]] + recYdsPts[[i]] + recTdsPts[[i]] + twoPtsPts[[i]] + fumblesPts[[i]] }

Finally, we calculate the week-to-week standard deviation of fantasy points for each player in all 100 seasons, and the average week-to-week standard deviation across seasons:

#Calculate SD of fantasy points per week sdWeeklyPts <- matrix(nrow=NROW(fantasyPts[[1]]), ncol=simulations) for(i in 1:simulations){ sdWeeklyPts[,i] <- apply(fantasyPts[[i]], 1, function(x) sd(x, na.rm=TRUE)) } #Calculate robust average of weekly SD projections$weeklySD <- apply(sdWeeklyPts, 1, function(x) tryCatch(wilcox.test(x, conf.int=TRUE, na.action="na.exclude")$estimate, error=function(e) median(x, na.rm=TRUE)))

Here are some players with high weekly variability in fantasy points according to our simulation:

Cam Newton
Robert Griffin III
Colin Kaepernick
Jamaal Charles
Matt Forte
Montee Ball
Cordarrelle Patterson
Percy Harvin
Alshon Jeffery
Tavon Austin

Conclusion

Simulating weekly variability can help you identify players who are more or less reliable from week to week. For example, possession receivers (who rely less on long gains and touchdowns) are more reliable than deep threats, who tend to boom or bust.

The post Weekly Variability Simulation of Fantasy Football Projections appeared first on Fantasy Football Analytics.

To leave a comment for the author, please follow the link and comment on their blog: Fantasy Football Analytics » R | Fantasy Football Analytics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Weekly Variability Simulation of Fantasy Football Projections

The R Scripts

Historical Weekly Variability

Weekly Variability Simulation

Conclusion

Related

The R Scripts

Historical Weekly Variability

Weekly Variability Simulation

Conclusion

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)