
I recently submitted this blog to R-bloggers, which aggregates R-related blog posts. It’s a fantastic site and has been invaluable to me as I’ve learned R. One of my favorite kinds of articles is the hands-on, “hello world”-style weekend project that dips into a topic/technology, so here’s my first attempt at one in this style.

First, some background: I’ve been working with Greg on a project that analyzes the results of two-person contests. An important part of the problem is comparing different rating systems that adjust for the strength of the opponent (e.g., the Elo rating system, TrueSkill, Glicko, etc.). As I understand it, all of these systems work around the intractability of a fully Bayesian treatment of the problem and try to deal with things like trends in ability, the distribution of the unobserved component, etc.

We’re still collecting data from a pilot, but in the interim, I wanted to start getting my feet wet with some real competition data. Sports statistics provide a readily available source of competition data, so my plan was:

1) Pull some data on NFL games from the 2011 season to date.

2) Fit a simple model that produces a rank ordering of teams.

3) Pull data on ESPN’s Power Rankings of NFL teams (based on votes by their columnists), using the XML package.

4) Make a comparison plot, showing how the two ranks compare, using ggplot2.
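As a rough illustration of the scraping step, here is a minimal sketch of reading an HTML table with the XML package. The inline HTML string, the placeholder team names, and the column names `espn_rank` and `team` are all assumptions made for the example; the real ESPN page has its own markup and would be fetched by URL rather than built from a string.

```r
library(XML)

# Hypothetical HTML standing in for a downloaded rankings page.
# The layout and team names are placeholders, not ESPN's actual markup.
html <- '
<html><body>
<table>
  <tr><th>RANK</th><th>TEAM</th></tr>
  <tr><td>1</td><td>Team A</td></tr>
  <tr><td>2</td><td>Team B</td></tr>
  <tr><td>3</td><td>Team C</td></tr>
</table>
</body></html>'

# Parse the string as HTML (asText = TRUE means "this is content, not a file name")
doc <- htmlParse(html, asText = TRUE)

# readHTMLTable returns a list with one data frame per <table> element
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)
espn <- tables[[1]]

# Give the columns analysis-friendly names and make the rank numeric
names(espn) <- c("espn_rank", "team")
espn$espn_rank <- as.integer(espn$espn_rank)
espn
```

On a real page, the table of interest is usually not the first one, so inspecting `length(tables)` and the column names of each element is a sensible first step.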

For the model, I wanted something really simple (hoping no one from FootballOutsiders is reading this). In my model, the difference in scores between the two teams is simply the difference in their “abilities,” plus an error term:

where the alphas are team-and-venue (e.g., home or away) specific random effects. For our actual rating, we can order teams based on the sum of their estimated home and away effects, i.e.:
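Written out, the model and the rating described here might look like the following. The notation is an assumption chosen for illustration (team i at home, team j away, superscripts H and A for venue), not necessarily the symbols used in the original post.

```latex
% Score difference in a game with team i at home and team j away
\[
  \Delta S_{ij} = \alpha_i^{H} - \alpha_j^{A} + \epsilon_{ij},
  \qquad \epsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]

% Rating used to order teams: sum of estimated home and away effects
\[
  R_i = \hat{\alpha}_i^{H} + \hat{\alpha}_i^{A}
\]
```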

Estimating all 32 x 2 = 64 parameters separately, given how little data we actually have, would probably lead to poor results. Instead, I used the excellent lme4 package, which fits them as random effects. This behaves roughly like a Bayesian estimate that starts from a prior that the alpha parameters are normally distributed, so each team's estimate is shrunk toward the average.
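A minimal sketch of this kind of fit, using simulated games rather than real NFL results. The team names, the simulated abilities, the column names (`home`, `away`, `delta`), and the fixed intercept (which would absorb an average home-field advantage) are all assumptions for the example, and the formula is one plausible way to express the model in lme4, not necessarily the one used for the plot in this post.

```r
library(lme4)

set.seed(1)
teams <- paste("Team", LETTERS[1:8])

# Simulated schedule: every ordered (home, away) pair plays twice
games <- expand.grid(home = teams, away = teams, stringsAsFactors = FALSE)
games <- games[games$home != games$away, ]
games <- rbind(games, games)

# Simulated "true" abilities, used only to generate fake score differences
ability <- setNames(rnorm(length(teams), sd = 7), teams)
games$delta <- unname(ability[games$home] - ability[games$away]) +
  rnorm(nrow(games), sd = 10)

# Crossed random intercepts: one per home team and one per away team
fit <- lmer(delta ~ 1 + (1 | home) + (1 | away), data = games)

# Extract the estimated (shrunken) effects, in a fixed team order
re <- ranef(fit)
home_eff <- re$home[teams, "(Intercept)"]
away_eff <- re$away[teams, "(Intercept)"]

# lmer adds the away effect to delta, so a strong away team gets a
# negative effect; subtracting it gives home ability plus away ability.
rating <- data.frame(team = teams, score = home_eff - away_eff)
rating$rank <- rank(-rating$score)
rating[order(rating$rank), ]
```

Because the random effects are assumed to come from a common normal distribution, teams with few games or noisy results are pulled toward zero instead of being given extreme ratings.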

Putting the last thing first, here’s the result of 4), comparing my “homebrew” ranking to the ESPN ranking, as of Week 5 (before the October 9th games):

No real comment on my model, other than that it thinks (a) ESPN vastly overrates the Chargers and (b) the Ravens deserve a higher rank than ESPN gives them.
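For readers who want to build a similar comparison, here is a minimal ggplot2 sketch. The data frame, its column names, and the ranks themselves are hypothetical placeholders, not the results discussed in this post.

```r
library(ggplot2)

# Hypothetical ranks for placeholder teams; not actual results
ranks <- data.frame(
  team = paste("Team", LETTERS[1:6]),
  espn = c(1, 2, 3, 4, 5, 6),
  homebrew = c(2, 1, 5, 3, 4, 6)
)

p <- ggplot(ranks, aes(x = espn, y = homebrew, label = team)) +
  # Dashed diagonal: teams on this line are ranked the same by both systems
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_text(size = 3) +
  # Reverse both axes so that rank 1 sits at the top right
  scale_x_reverse() +
  scale_y_reverse() +
  labs(x = "ESPN Power Ranking", y = "Homebrew ranking")

print(p)
```

Teams that sit far from the dashed line are the ones where the two rankings disagree most.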

The code for all the steps is posted below, with explanatory comments:


To leave a comment for the author, please follow the link and comment on their blog: Online Labor.