I did the Two Castles Run today; it’s a 10km race between Warwick and Kenilworth castles. The organizers were very quick to put the results online and even went the extra mile of offering them as a CSV file. It was therefore very tempting to launch R and see what the distribution looked like (and how I fared compared to the rest of the runners).
I wrote a quick R script to read and parse the data:
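library(ggplot2)
# Read the results published by the organizers
results <- read.csv("2011TwoCastlesRun.csv")
# Convert chip times ("H:MM:SS") to decimal minutes: H*60 + MM + SS/60
results$Minutes <- sapply(as.character(results$ChipTime),
  FUN = function(s) sum(as.integer(strsplit(s, ":")[[1]]) * c(60, 1, 1/60)))
# Summary statistics for men only (print() so it also shows when the script is sourced)
print(summary(results$Minutes[results$M.F == "M"]))
# Density of finishing times, one curve per sex
p <- ggplot(results, aes(Minutes, colour = M.F)) + geom_density()
print(p)
# Look up my own result by bib number
print(results[results$Bib == 2474, ])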
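The conversion inside `sapply()` reads a chip time as hours, minutes and seconds and weights them by 60, 1 and 1/60. As a sketch, here is the same formula as a small standalone function. The name `chip_to_minutes` is hypothetical, and the handling of `MM:SS` times without an hour field is an assumption about how some results files format sub-hour times, not something the original file is known to contain.

```r
# Hypothetical helper: convert chip times such as "0:48:29" to decimal minutes.
# Assumption: times are "H:MM:SS", or "MM:SS" when the hour field is missing.
chip_to_minutes <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) {
    v <- as.numeric(p)
    if (length(v) == 2) v <- c(0, v)  # pad a missing hour field with 0
    sum(v * c(60, 1, 1 / 60))         # hours*60 + minutes + seconds/60
  }, numeric(1))
}

chip_to_minutes(c("0:48:29", "45:00", "1:02:30"))
# [1] 48.48333 45.00000 62.50000
```

Using `vapply()` with `numeric(1)` guarantees one number per input time, so a malformed entry fails loudly instead of silently producing a list.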
The distribution of the results (in minutes) looks like this:

As expected, men are faster on average than women, but it’s funny to see how similar the two curves are; they even share the same small bump just after the median. I wonder what causes those bumps.
My time today was 48’29 (48 minutes 29 seconds, or 48.4833 minutes), which places me in 740th position. How good is that? Well,
summary(results$Minutes)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  32.08   48.72   55.13   55.86   61.46   99.50
So I’m in the first quartile, since 48.48 is below 48.72!
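Reading the quartile off `summary()` can also be done directly: the position, the share of the field at or below a given time (via the empirical CDF returned by `ecdf()`), and which quartile the time falls in. This is a sketch with a hypothetical helper `place_in_field()` and a made-up vector of times, not the race data. `quantile()` is left at its default type 7, which is also what `summary()` reports.

```r
# Hypothetical helper: locate one finishing time within a field of times (minutes).
place_in_field <- function(times, my_time) {
  times <- times[!is.na(times)]
  q <- quantile(times, probs = c(0.25, 0.5, 0.75))  # default type 7, as in summary()
  list(
    position   = sum(times < my_time) + 1,    # strictly faster runners, plus one
    percentile = 100 * ecdf(times)(my_time),  # % of runners at or below my_time
    # left.open = TRUE counts quartile boundaries strictly below my_time; 1 = fastest quarter
    quartile   = findInterval(my_time, q, left.open = TRUE) + 1
  )
}

# Made-up times for illustration only (NOT the Two Castles results)
times <- c(35.2, 41.0, 46.5, 48.9, 52.3, 55.1, 58.7, 63.4)
str(place_in_field(times, 48 + 29/60))
# For these made-up times: position 4, percentile 37.5, quartile 2
```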
But wait, looking at men only:
summary(results$Minutes[results$M.F=="M"])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  32.08   46.68   51.85   52.84   57.62   99.50
I’m not any more, since 48.48 is above 46.68. Still closer to the first quartile than to the median (51.85) though!
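Comparing within a group just means computing the same statistics on a subset. A sketch using `tapply()` to get the quartiles and the share of runners at or below a given time for each value of `M.F`; the `runners` data frame here is invented for illustration and only mimics the `Minutes` and `M.F` columns of the results file.

```r
# Made-up data mimicking the Minutes and M.F columns (NOT the race results)
runners <- data.frame(
  Minutes = c(35.2, 41.0, 46.5, 48.9, 52.3, 55.1, 58.7, 63.4,
              44.0, 50.5, 57.2, 66.8),
  M.F     = rep(c("M", "F"), times = c(8, 4))
)
my_time <- 48 + 29/60

# Quartiles per group (same default quantile type as summary())
print(tapply(runners$Minutes, runners$M.F, quantile, probs = c(0.25, 0.5, 0.75)))

# Share of each group finishing at or below my_time
share_at_or_below <- tapply(runners$Minutes, runners$M.F, function(x) mean(x <= my_time))
print(share_at_or_below)
```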