Calculating Custom Fantasy Football Projections for Your League using R

[This article was first published on Fantasy Football Analytics in R and contributed to R-bloggers.]

In prior posts, I have shown how to download fantasy football projections from ESPN, CBS, and NFL.com. In this post, I will demonstrate how to take the projected stats from these sources and calculate each player's projected points for your custom league given your league settings. Calculating players’ projected points under your own scoring rules is important for picking the ideal team for your league.

The R Script

The R script for calculating custom fantasy football projections for your league is located at:

League settings

In the first portion of the script, we define the league settings, which you can modify to match your own league. Here are the settings for my fantasy league:
#Customize your league settings
passYdsMultiplier <- (1/25) #1 pt per 25 pass yds
passTdsMultiplier <- 4      #4 pts per pass td
passIntMultiplier <- -3     #-3 pts per INT
rushYdsMultiplier <- (1/10) #1 pt per 10 rush yds
rushTdsMultiplier <- 6      #6 pts per rush td
recYdsMultiplier <- (1/8)   #1 pt per 8 rec yds
recTdsMultiplier <- 6       #6 pts per rec td
twoPtsMultiplier <- 2       #2 pts per 2-point conversion (not included in ESPN or CBS projections)
fumlMultiplier <- -3        #-3 pts per fumble lost (not included in ESPN projections)
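
To make the settings concrete, here is a small worked example for a single stat line. The stat line is hypothetical (made-up numbers for illustration, not a real projection); only the multipliers come from the settings above.

```r
# League multipliers copied from the settings above
passYdsMultiplier <- (1/25)
passTdsMultiplier <- 4
passIntMultiplier <- -3
rushYdsMultiplier <- (1/10)
rushTdsMultiplier <- 6

# Hypothetical quarterback stat line (illustrative numbers only)
passYds <- 4000
passTds <- 30
passInt <- 12
rushYds <- 200
rushTds <- 2

# Each stat is multiplied by its league multiplier, then the pieces are summed
examplePts <- passYds*passYdsMultiplier + passTds*passTdsMultiplier +
  passInt*passIntMultiplier + rushYds*rushYdsMultiplier + rushTds*rushTdsMultiplier
print(examplePts)  # 160 + 120 - 36 + 20 + 12 = 276
```

Changing a single multiplier (for example, 6 points per passing touchdown instead of 4) shifts every quarterback's total, which is why the same projections can rank players differently from league to league.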

Calculating projected points for each source

We then take the projected stats for each of the categories above (e.g., passing yards, rushing yards) from each source and multiply them by the multiplier defined for your league above.  Here are the calculations for ESPN's projections:
projections$passYdsPts_espn <- projections$passYds_espn*passYdsMultiplier
projections$passTdsPts_espn <- projections$passTds_espn*passTdsMultiplier
projections$passIntPts_espn <- projections$passInt_espn*passIntMultiplier
projections$rushYdsPts_espn <- projections$rushYds_espn*rushYdsMultiplier
projections$rushTdsPts_espn <- projections$rushTds_espn*rushTdsMultiplier
projections$recYdsPts_espn <- projections$recYds_espn*recYdsMultiplier
projections$recTdsPts_espn <- projections$recTds_espn*recTdsMultiplier
#Two-point conversions are scored from the two-point stat (not from fumbles); stats a source does not project stay NA
projections$twoPtsPts_espn <- projections$twoPts_espn*twoPtsMultiplier
projections$fumblesPts_espn <- projections$fumbles_espn*fumlMultiplier
The projected points for a given source are the linear (additive) combination of these point categories.  For example, we add the projected points from each of the above categories for ESPN's projections (na.rm=TRUE skips categories that a source does not provide):
projections$projectedPts_espn <- rowSums(projections[,c("passYdsPts_espn","passTdsPts_espn","passIntPts_espn","rushYdsPts_espn","rushTdsPts_espn","recYdsPts_espn","recTdsPts_espn","twoPtsPts_espn","fumblesPts_espn")], na.rm=TRUE)
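
Because the same steps are repeated for every source, they can be wrapped in a small helper. The sketch below is one way to do that and is not part of the original script: the `multipliers` vector, the `calcSourcePts` function, and the toy data are all hypothetical names and made-up numbers, and it assumes the stat columns follow the `stat_source` naming used above.

```r
# League multipliers from the settings section, stored in a named vector
# (hypothetical container; the original script uses separate variables)
multipliers <- c(passYds=1/25, passTds=4, passInt=-3, rushYds=1/10, rushTds=6,
                 recYds=1/8, recTds=6, twoPts=2, fumbles=-3)

# Hypothetical helper: projected points for one source
calcSourcePts <- function(data, source, multipliers){
  # Build the column names for this source, e.g., "passYds_espn"
  statCols <- paste(names(multipliers), source, sep="_")
  # Keep only the categories that this source actually provides
  available <- statCols %in% names(data)
  stats <- as.matrix(data[, statCols[available], drop=FALSE])
  # Multiply each stat column by its multiplier, then sum across categories
  pts <- sweep(stats, 2, multipliers[available], "*")
  rowSums(pts, na.rm=TRUE)
}

# Toy data with made-up numbers for two players (no two-point or fumble columns)
toy <- data.frame(passYds_espn=c(4000, 0), passTds_espn=c(30, 0), passInt_espn=c(12, 0),
                  rushYds_espn=c(200, 1200), rushTds_espn=c(2, 10),
                  recYds_espn=c(0, 400), recTds_espn=c(0, 2))
toy$projectedPts_espn <- calcSourcePts(toy, "espn", multipliers)
print(toy$projectedPts_espn)  # 276 242
```

Calling the helper with "cbs" or "nfl" would apply the same scoring to those sources, provided their columns exist in the data frame.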

We repeat each of these steps for the other sources (CBS and NFL.com).  Once we have the projected points for each source, we have three options: 1) We can compute a simple average across the sites' projections. 2) We can compute a weighted average where we weight the sources we trust more heavily in our average. 3) We can compute a latent variable that represents the common variance among the sources.  I tend not to trust individual sources of projections because they tend not to reliably outperform the average, so I will compute an average and a latent variable (but see here if you want an example of a weighted average of fantasy projections).
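
For readers curious about the second option, here is a minimal sketch of a weighted average using base R's `weighted.mean`. The weights and the toy numbers are hypothetical; I am not recommending any particular weighting, and this is not the approach used in the rest of this post.

```r
# Toy projected points from three sources (made-up numbers)
toy <- data.frame(projectedPts_espn=c(300, 150, 80),
                  projectedPts_cbs=c(280, 170, NA),
                  projectedPts_nfl=c(320, 160, 100))

# Hypothetical weights: trust ESPN twice as much as each of the other sources
sourceWeights <- c(projectedPts_espn=2, projectedPts_cbs=1, projectedPts_nfl=1)

# Weighted average per player; a source with a missing value is dropped
# (along with its weight) for that player only
toy$projectedPtsWeighted <- apply(toy[, names(sourceWeights)], 1, function(x){
  weighted.mean(x, w=sourceWeights, na.rm=TRUE)
})
print(toy$projectedPtsWeighted)
```

With equal weights this reduces to the simple average, so the simple average can be viewed as the special case where no source is trusted more than another.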

Average across sources

To calculate an average of projections across sources, we first calculate an average of projected statistics for each of the categories across sources:
#Calculate average of categories
projections$passYds <- rowMeans(projections[,c("passYds_espn","passYds_cbs","passYds_nfl")], na.rm=TRUE)
projections$passTds <- rowMeans(projections[,c("passTds_espn","passTds_cbs","passTds_nfl")], na.rm=TRUE)
projections$passInt <- rowMeans(projections[,c("passInt_espn","passInt_cbs","passInt_nfl")], na.rm=TRUE)
projections$rushYds <- rowMeans(projections[,c("rushYds_espn","rushYds_cbs","rushYds_nfl")], na.rm=TRUE)
projections$rushTds <- rowMeans(projections[,c("rushTds_espn","rushTds_cbs","rushTds_nfl")], na.rm=TRUE)
projections$recYds <- rowMeans(projections[,c("recYds_espn","recYds_cbs","recYds_nfl")], na.rm=TRUE)
projections$recTds <- rowMeans(projections[,c("recTds_espn","recTds_cbs","recTds_nfl")], na.rm=TRUE)
projections$twoPts <- rowMeans(projections[,c("twoPts_espn","twoPts_cbs","twoPts_nfl")], na.rm=TRUE)
projections$fumbles <- rowMeans(projections[,c("fumbles_espn","fumbles_cbs","fumbles_nfl")], na.rm=TRUE)

Then we multiply the projected stats categories by the multiplier defined by our league settings:
#Calculate projected points for your league (avg of ESPN, CBS, and NFL projections)
projections$passYdsPts <- projections$passYds*passYdsMultiplier
projections$passTdsPts <- projections$passTds*passTdsMultiplier
projections$passIntPts <- projections$passInt*passIntMultiplier
projections$rushYdsPts <- projections$rushYds*rushYdsMultiplier
projections$rushTdsPts <- projections$rushTds*rushTdsMultiplier
projections$recYdsPts <- projections$recYds*recYdsMultiplier
projections$recTdsPts <- projections$recTds*recTdsMultiplier
projections$twoPtsPts <- projections$twoPts*twoPtsMultiplier
projections$fumblesPts <- projections$fumbles*fumlMultiplier

The average projected points across sources is the sum of the points across categories (that have been averaged across sources).  We store it as projectedPts because it is used in the steps below:
projections$projectedPts <- rowSums(projections[,c("passYdsPts","passTdsPts","passIntPts","rushYdsPts","rushTdsPts","recYdsPts","recTdsPts","twoPtsPts","fumblesPts")], na.rm=TRUE)
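
One point worth making explicit: because league scoring is a linear combination of stats, averaging the stats and then scoring them gives the same answer as scoring each source and then averaging the points, as long as no source is missing a value. The sketch below checks this on toy data (made-up numbers, two rushing categories only); with missing values the two routes can differ, because `na.rm=TRUE` drops different cells in each.

```r
# Toy projections for two categories from three sources (made-up numbers, no missing values)
toy <- data.frame(rushYds_espn=c(1200, 800), rushYds_cbs=c(1100, 900), rushYds_nfl=c(1300, 700),
                  rushTds_espn=c(10, 6), rushTds_cbs=c(8, 7), rushTds_nfl=c(12, 5))
rushYdsMultiplier <- (1/10)
rushTdsMultiplier <- 6

# Route 1: average the stats across sources, then convert to points
avgYds <- rowMeans(toy[,c("rushYds_espn","rushYds_cbs","rushYds_nfl")])
avgTds <- rowMeans(toy[,c("rushTds_espn","rushTds_cbs","rushTds_nfl")])
ptsFromAvgStats <- avgYds*rushYdsMultiplier + avgTds*rushTdsMultiplier

# Route 2: convert each source to points, then average the points
ptsBySource <- sapply(c("espn","cbs","nfl"), function(s){
  toy[[paste0("rushYds_", s)]]*rushYdsMultiplier + toy[[paste0("rushTds_", s)]]*rushTdsMultiplier
})
avgOfSourcePts <- rowMeans(ptsBySource)

# Both routes give 180 points for player 1 and 116 points for player 2
print(unname(ptsFromAvgStats))
print(unname(avgOfSourcePts))
```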

Latent variables

Latent variables are helpful for estimating an unobserved variable based on the common variance among various indicator variables.  Latent variables tend to have stronger psychometric properties than simple averages because they retain the common variance (thought to be "true" variance) and discard the unique variance (i.e., measurement error).  Examination of the correlations among the three sources shows that they are highly correlated (rs ≥ .89), suggesting that they are measuring the same thing and that they can be combined in a latent variable.
> round(cor(projections[,c("projectedPts_espn","projectedPts_cbs","projectedPts_nfl","projectedPts")], use="pairwise.complete.obs"),2)
                 projectedPts_espn projectedPts_cbs projectedPts_nfl projectedPts
projectedPts_espn             1.00             0.94             0.90         0.98
projectedPts_cbs              0.94             1.00             0.89         0.97
projectedPts_nfl              0.90             0.89             1.00         0.93
projectedPts                  0.98             0.97             0.93         1.00

To compute the latent variable representing the common variance among the projections from ESPN, CBS, and NFL.com, we use the factanal function to fit a factor analysis.  We want to keep 1 factor, and the factor scores represent each player's standardized value on the latent factor of projected points:
factor.analysis <- factanal(~projectedPts_espn + projectedPts_cbs + projectedPts_nfl, factors = 1, scores = "Bartlett", data=projections)
factor.scores <- factor.analysis$scores
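
To see what the factor scores look like without the real projections, here is a sketch on simulated data. Everything in it is an assumption for illustration: the "true" latent values, the noise levels, and the variable names `sim`, `fit`, and `simScores` are made up, and the simulation is not meant to mimic the actual sources.

```r
set.seed(1)
n <- 200

# Simulated "true" projected points plus independent noise for each source
trueLatent <- rnorm(n, mean=100, sd=40)
sim <- data.frame(projectedPts_espn = trueLatent + rnorm(n, sd=15),
                  projectedPts_cbs  = trueLatent + rnorm(n, sd=15),
                  projectedPts_nfl  = trueLatent + rnorm(n, sd=20))

# Same call as above, applied to the simulated data
fit <- factanal(~projectedPts_espn + projectedPts_cbs + projectedPts_nfl,
                factors=1, scores="Bartlett", data=sim)
simScores <- as.vector(fit$scores)

# The scores are centered near 0, so they are not yet on a fantasy-points scale
print(round(mean(simScores), 2))
# How closely the scores track the simulated latent variable (sign is arbitrary in factor analysis)
print(round(abs(cor(simScores, trueLatent)), 2))
```

The factor scores come back on a standardized scale rather than in fantasy points, which is what motivates the rescaling step.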

We want to know each player's value on the latent metric, but the factor scores are standardized with a mean of 0 and a standard deviation of 1.  As a result, we have to rescale the values so that they are a meaningful representation of the players' projected points.  To do that, we linearly rescale the factor scores to range from 0 to the maximum of the average projections:
rescaleRange <- function(variable, minOutput, maxOutput){
  minObserved <- min(variable, na.rm=TRUE)
  maxObserved <- max(variable, na.rm=TRUE)
  #Map minObserved to minOutput and maxObserved to maxOutput
  values <- (maxOutput-minOutput)/(maxObserved-minObserved)*(variable-maxObserved)+maxOutput
  return(values)
}
#Rescale the factor scores to range from 0 to the maximum of the average projections
projections$projectedPtsLatent <- as.vector(rescaleRange(variable=factor.scores, minOutput=0, maxOutput=max(projections$projectedPts, na.rm=TRUE)))
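
As a quick check of the rescaling formula, here is a worked example on a handful of toy scores. The input values and the maximum of 300 are made up for illustration; the function body follows the formula used above.

```r
# Linear rescaling: the smallest input maps to minOutput, the largest to maxOutput
rescaleRange <- function(variable, minOutput, maxOutput){
  minObserved <- min(variable, na.rm=TRUE)
  maxObserved <- max(variable, na.rm=TRUE)
  (maxOutput-minOutput)/(maxObserved-minObserved)*(variable-maxObserved)+maxOutput
}

# Toy standardized scores (made-up values)
toyScores <- c(-2, -1, 0, 1, 2)
rescaled <- rescaleRange(toyScores, minOutput=0, maxOutput=300)
print(rescaled)  # 0 75 150 225 300
```

Because the transformation is linear, it changes only the units: the ordering of players and the correlations with other variables are unchanged. One consequence of choosing a minimum of 0 is that the lowest-ranked player is always assigned 0 points.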

That's it! We now have projections for our league from ESPN, CBS, and NFL.com, in addition to the average of their projections and the latent combination of the three.  Here's a density plot showing the similarities among the distributions of projected points from the three different sources:
ggplot(densityData, aes(x=pointDensity, fill=sourceDensity)) + geom_density(alpha=.3) + xlab("Player's Projected Points") + ggtitle("Density Plot of Projected Points from 2012") + theme(legend.title=element_blank())
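
The plot call above expects a data frame named densityData with a pointDensity column (the points) and a sourceDensity column (the source label). How that data frame is built is not shown in this post, so the following is only a guess at one way to construct it: stack the per-source projections into long format. The toy numbers are made up, and the plot itself is drawn only if the ggplot2 package is installed.

```r
# Toy projected points (made-up numbers); in practice these columns come from the projections data frame
projections <- data.frame(projectedPts_espn=c(300, 150, 80),
                          projectedPts_cbs=c(280, 170, 90),
                          projectedPts_nfl=c(320, 160, 100))

# Assumed structure: one row per player per source (long format)
densityData <- data.frame(
  pointDensity = c(projections$projectedPts_espn, projections$projectedPts_cbs, projections$projectedPts_nfl),
  sourceDensity = rep(c("ESPN","CBS","NFL"), each=nrow(projections))
)
print(table(densityData$sourceDensity))

# Draw the density plot only when ggplot2 is available
if (requireNamespace("ggplot2", quietly=TRUE)) {
  library(ggplot2)
  densityPlot <- ggplot(densityData, aes(x=pointDensity, fill=sourceDensity)) +
    geom_density(alpha=.3) + xlab("Player's Projected Points") +
    theme(legend.title=element_blank())
}
```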

In my next post, I will compare the accuracy of last year's projections from ESPN, CBS, and NFL.com, in addition to the accuracy of our average and latent variables.
