Winning a Marathon (Part 2)

December 21, 2014

(This article was first published on More or Less Numbers, and kindly contributed to R-bloggers)

In a previous post I looked at a data set published by the AARRS that covers marathons around the world, specifically the winning times of every* race.  After spending a bit more time with it, there are a few more things we can take from the data that may be more helpful for personal use.

As mentioned before, the data includes ultra-marathons, trail runs, etc.  To strip those out and keep only road races, I’ve filtered the data to include only races with at least 200 participants (counted by gender, so at least 200 male participants or at least 200 female participants).  Some non-road races with 200+ participants remain in the data, but far fewer than before.  So is the data totally “cleaned” of these races?  No.  But I think this gets us closer to the finishing times people are running to win “normal” marathon road races.
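
The filter described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the post: the `RaceResult` record, its field names, and the sample rows are all made up, and the real data set may be laid out quite differently.

```python
from dataclasses import dataclass

MIN_FINISHERS = 200  # threshold mentioned in the post


@dataclass
class RaceResult:
    # Hypothetical record layout; the real data set may differ.
    race: str
    year: int
    gender: str        # "M" or "F"
    finishers: int     # participants of that gender in that race
    winning_time: str  # "H:MM:SS"


def keep_larger_races(results, min_finishers=MIN_FINISHERS):
    """Keep only results whose field (for that gender) has at least min_finishers."""
    return [r for r in results if r.finishers >= min_finishers]


# Invented rows, for illustration only.
sample = [
    RaceResult("City Marathon", 2013, "M", 1500, "2:21:40"),
    RaceResult("City Marathon", 2013, "F", 900, "2:48:05"),
    RaceResult("Ridge Trail 50K", 2013, "M", 85, "4:12:30"),
    RaceResult("Canyon Ultra", 2013, "M", 240, "5:01:10"),
]

filtered = keep_larger_races(sample)
for r in filtered:
    print(r.race, r.gender, r.finishers, r.winning_time)
```

Note that the invented "Canyon Ultra" row survives the filter because it has more than 200 participants, which is the same limitation described above: a size threshold reduces the number of non-road races but does not remove all of them.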

In this case the average winning time for male winners is about 2:35:00.  We can assume this would come down slightly with a few more of the ultra races stripped out.  You can see the race names by putting your cursor over a point (thanks!).  This is potentially helpful for finding a race where the winning time is within your own race time.  Over the past 10 years the times haven’t changed dramatically (in contrast to the graph that included all marathon and ultra distances).  More races were certainly held in the past few years than in earlier ones, but the newer races appear to be won in about the same times as the others.
Female winning times have also stayed consistent over the past 10 years for races with more than 200 finishers.

The average time for female winners is around 3:02:00 over the last 10 years.  Again, this is a much lower time than we would have seen had we included all races in the data set without any filtering.
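
For anyone who wants to compute averages like these themselves, here is a minimal sketch of averaging winning times by gender. It assumes times are stored as "H:MM:SS" strings; the helper names and the input rows are invented for illustration and are not taken from the data set.

```python
from collections import defaultdict
from statistics import mean


def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Convert seconds back to 'H:MM:SS', rounding to the nearest second."""
    seconds = int(round(seconds))
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


def average_winning_time(rows):
    """rows: iterable of (gender, 'H:MM:SS'). Returns {gender: 'H:MM:SS'}."""
    by_gender = defaultdict(list)
    for gender, hms in rows:
        by_gender[gender].append(to_seconds(hms))
    return {g: to_hms(mean(times)) for g, times in by_gender.items()}


# Invented values, not real results.
rows = [
    ("M", "2:30:00"),
    ("M", "2:40:00"),
    ("F", "3:00:00"),
    ("F", "3:04:00"),
]
averages = average_winning_time(rows)
print(averages)
```

Converting to seconds before averaging avoids the mistakes that come from trying to average clock-style strings directly.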

These graphs cover only races in the US.  In general, without personal knowledge of a race (terrain, temperature, organization, etc.), marathon difficulty is hard to measure objectively.  I don’t know of any “difficulty index” for marathons (let me know if you know of one), which is why the winning times of races are a good place to start when considering racing with the potential to win.
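
As a rough sketch of that starting point, the snippet below shortlists races whose typical past winning time is no faster than a runner's own target time. Everything in it is an assumption for illustration: the race names and times are invented, and using the median of past winning times as "typical" is just one reasonable choice, not something taken from the data set or the graphs above.

```python
from statistics import median


def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def winnable_races(history, target_time):
    """Return (race, margin_in_seconds) pairs where the median past winning
    time is no faster than target_time, largest margin first."""
    target = to_seconds(target_time)
    candidates = []
    for race, times in history.items():
        typical = median(to_seconds(t) for t in times)
        if typical >= target:
            candidates.append((race, typical - target))
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Invented races and winning times, for illustration only.
history = {
    "Lakeside Marathon": ["2:52:10", "2:49:30", "2:55:45"],
    "Capital City Marathon": ["2:18:20", "2:20:05", "2:17:50"],
    "Harvest Marathon": ["2:44:00", "2:47:30"],
}

shortlist = winnable_races(history, "2:45:00")
for race, margin in shortlist:
    print(race, f"typical winner is {margin:.0f}s slower than the target")
```

A shortlist like this says nothing about terrain, weather, or who shows up on the day, so it is only a first pass before looking at each race more closely.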
