I did the Two Castles Run today; it’s a 10km race between Warwick and Kenilworth castles. The organizers were very quick to put the results online and even went the extra mile of offering them as a CSV file. It was therefore very tempting to launch R and see what the distribution looked like (and how I fared compared to the rest of the runners).
After a quick R script to read and parse the data:
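    library(ggplot2)
    results <- read.csv("2011TwoCastlesRun.csv")
    # Convert each chip time (hours:minutes:seconds) to decimal minutes
    results$Minutes <- sapply(as.character(results$ChipTime),
        FUN = function(s) sum(as.integer(strsplit(s, ':')[[1]]) * c(60, 1, 1/60)))
    summary(results$Minutes[results$M.F == "M"])
    # Density of finishing times, one curve per gender
    p <- ggplot(results, aes(Minutes, colour = M.F)) + geom_density()
    print(p)
    # My own row, looked up by bib number
    print(results[results$Bib == 2474, ])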
the distribution of the results (in minutes) looks like this:
As expected, men are faster on average than women but it’s funny to see how similar the two curves are; they even have the same small bump after the median. I wonder what makes those bumps.
My time today was 48’29” (or 48.4833 minutes), which puts me in 740th position. How good is that? Well,
    summary(results$Minutes)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      32.08   48.72   55.13   55.86   61.46   99.50
So I’m in the first quartile!
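The quartiles only give a rough placement. To pin it down, one could compute the share of finishers with a faster time. Here is a minimal sketch, assuming chip times are formatted as hh:mm:ss (which the weights in my script suggest). The helper names `chip_to_minutes` and `share_faster` are made up for this illustration, and the toy vector is invented purely to exercise the functions:

```r
# Convert an "hh:mm:ss" chip time to decimal minutes.
chip_to_minutes <- function(s) {
  parts <- as.numeric(strsplit(s, ":", fixed = TRUE)[[1]])
  sum(parts * c(60, 1, 1 / 60))
}

# Fraction of finishers strictly faster than a given time.
share_faster <- function(times, mine) {
  mean(times < mine)
}

my_time <- chip_to_minutes("00:48:29")  # 48 + 29/60 minutes

# Made-up times, only to show the call. With the race data this would be
# share_faster(results$Minutes, my_time).
toy_times <- c(40, 45, 50, 55, 60)
toy_share <- share_faster(toy_times, my_time)
print(toy_share)
```

With the real data frame, a value below 0.25 would confirm the first-quartile placement.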
But wait, looking at men only:
    summary(results$Minutes[results$M.F=="M"])
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      32.08   46.68   51.85   52.84   57.62   99.50
I’m not any more. Still closer to the first quartile than to the median, though!
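As a quick sanity check on those comparisons, here is a small sketch that uses only the numbers printed in the two summaries. The variable names are mine; nothing here reads the CSV:

```r
# My time and the summary values quoted in the post, in minutes.
my_time    <- 48 + 29 / 60   # 48'29"
q1_all     <- 48.72          # 1st quartile, all runners
q1_men     <- 46.68          # 1st quartile, men only
median_men <- 51.85          # median, men only

in_q1_all <- my_time < q1_all   # inside the first quartile overall?
in_q1_men <- my_time < q1_men   # inside the first quartile among men?

# Distance to the men's first quartile versus distance to their median.
closer_to_q1 <- abs(my_time - q1_men) < abs(my_time - median_men)

print(c(in_q1_all = in_q1_all, in_q1_men = in_q1_men, closer_to_q1 = closer_to_q1))
```

The gap to the men's first quartile is about 1.8 minutes, against roughly 3.4 minutes to their median.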