Boston Elite Field 2015

April 19, 2015

(This article was first published on More or Less Numbers, and kindly contributed to R-bloggers)

Last year I posted that the chances of a non-African country winning the Boston Marathon seemed good because of the widening interval of winning times (recent years had produced some historically “slower” races and some historically “faster” ones), and this actually happened. Meb Keflezighi ran a remarkable race and was widely celebrated as he represented the US in a race that has recently been dominated by African countries. His time was obviously the fastest on the day, but others in the field had faster PRs. Because of the variation in winning times, my conclusion has been that certain runners representing non-African countries have an opportunity to contest the race well.

The number of participants from Africa in the elite field clearly increases the likelihood that the winner represents an African country. The runners in the elite field mostly fall into or below the confidence interval shown in the graph above, with the slight exception of Matt Tegenkamp, whose marathon PR of roughly 2:12 sits just above what the interval encompasses. Once again the elite field is dominated by African runners who are putting up some really impressive PRs.

And yet, despite the difference in PRs, there was a similar dynamic last year. Dennis Kimetto came to the race with a 2:03 PR, and Meb Keflezighi won the Boston Marathon having previously run a 2:09 PR. So we have another great story this year: incredible athletes, some of whom have run much faster than others in the past. And yet, who can tell what will happen on race day?

But why try? Why did Meb think he could beat someone who, in marathon terms, could go somewhere he could not? More broadly, why do we love these events? Why should Matt Tegenkamp attempt to rival someone who would be 2 miles ahead of him on each of their best days? Variance. Among these elite athletes there is the notion that on any given day, the guy next to you could be at his best or his worst. As spectators, we’re drawn to variance: we love the possibility that things will not turn out predictably, or that there is variation in what we assume to be true. Athletes place their hopes in this, that they could run their absolute best and others may not. Confidence intervals tell the story of variance, that statistically we can’t know for certain. I think that this year, yet again, we could see this same variance play out. The athlete who doesn’t have the fastest PR runs their best despite the odds. This is what makes a great race and what we could see again tomorrow.
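For anyone curious about the "2 miles ahead" figure, here is a rough back-of-the-envelope sketch in Python. It assumes both runners hold perfectly even PR pace and uses 2:03:00 and 2:12:00 as rounded stand-ins for the PRs mentioned above; the function names are made up for illustration.

```python
# Marathon distance: 42.195 km expressed in miles.
MARATHON_MILES = 26.2188


def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' time string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def gap_in_miles(faster_pr: str, slower_pr: str) -> float:
    """Distance the slower runner still has left when the faster one
    finishes, assuming both run at even PR pace (a simplification)."""
    fast = to_seconds(faster_pr)
    slow = to_seconds(slower_pr)
    slower_speed = MARATHON_MILES / slow  # miles per second
    return MARATHON_MILES - slower_speed * fast


# Assumed, rounded PRs: the post only gives "2:03" and "2:12 ish".
gap = gap_in_miles("2:03:00", "2:12:00")
print(f"{gap:.2f} miles")
```

With those rounded inputs the gap comes out to about 1.8 miles, so "2 miles" is a fair approximation of what a nine-minute difference in PRs looks like on the road.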
