
Inspired by a post by an R-blogger, I became curious about the runs organised by my athletics club. So I started R and analysed the LAC Degerloch Volkslauf 2010, a 10 km race near Stuttgart-Hoffeld. In the following, I present this statistical examination. The data can be found at: data.

First, I converted the data file into a CSV file and adapted an R script for reading and converting the data:

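library(ggplot2)
# Read the semicolon-separated results file
results <- read.csv("Volkslauf10km.csv", sep = ";")
# Convert a time string with hours, minutes and seconds into minutes
FUN <- function(s) {
  sum(as.integer(strsplit(s, ":")[[1]]) * c(60, 1, 1/60))
}
results$Minuten <- sapply(as.character(results$Ergebnis), FUN)
# Age classes (AK) containing "W" mark women; everyone else is labelled as men
results$Geschlecht <- "Männer"
results$Geschlecht[grep("W", results$AK)] <- "Frauen"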
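
To make the time conversion easier to follow, here is a small worked example. It is only a sketch: the name `to_minutes` is my own, and I assume that every value in `Ergebnis` has the three fields hours, minutes and seconds, which is what the multipliers `c(60, 1, 1/60)` suggest. A value with only two fields would not be converted correctly.

```r
# Same logic as FUN above, under the assumption of an "H:MM:SS" input.
to_minutes <- function(s) {
  parts <- as.integer(strsplit(s, ":", fixed = TRUE)[[1]])
  # hours -> 60 minutes, minutes -> 1 minute, seconds -> 1/60 minute
  sum(parts * c(60, 1, 1 / 60))
}

to_minutes("0:45:30")  # 0 * 60 + 45 + 30 / 60 = 45.5
to_minutes("1:02:15")  # 1 * 60 + 2 + 15 / 60 = 62.25
```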


Next, I split the runners into men and women and plotted birth year against finishing time.

ggplot(results, aes(Jhg, Minuten)) + theme_bw() +
  geom_point() + facet_wrap(~ Geschlecht) +
  geom_smooth() + xlab("Jahrgang") +  # birth year
  ylab("Minuten")


From this plot, one can suppose that men are faster on average than women and that age may have an influence. So we additionally assign the runners to decades.
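
One possible way to assign runners to decades is to round the birth year down to the nearest multiple of ten. The following is a minimal sketch on made-up toy data; the column name `Jahrzehnt` is my own choice and not necessarily what was used for the plots.

```r
# Toy data standing in for the race results; the birth years are made up.
toy <- data.frame(Jhg = c(1958, 1964, 1971, 1979, 1983, 1990))

# Round each birth year down to the start of its decade.
toy$Jahrzehnt <- floor(toy$Jhg / 10) * 10
toy$Jahrzehnt
```

On the real data, the same line could be applied to `results$Jhg`.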

ggplot(results, aes(Minuten)) + theme_bw() +
  geom_histogram(binwidth = 2) +
  ylab("Anzahl") + xlab("Minuten")  # count, minutes


It seems that men are on average faster than women. Another interesting point is that runners born in the 1970s are underrepresented compared with those born in the 1980s and 1960s. Maybe this is due to career or family reasons.
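
The representation of each decade can also be checked with a simple count. This is a sketch on made-up toy data, again with a hypothetical `Jahrzehnt` column derived from the birth year; the numbers are not the real race results.

```r
# Toy birth years; the values are invented for illustration only.
toy <- data.frame(Jhg = c(1958, 1964, 1966, 1971, 1983, 1985, 1988))
toy$Jahrzehnt <- floor(toy$Jhg / 10) * 10

# Number of runners per decade of birth.
counts <- table(toy$Jahrzehnt)
counts
```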

In the following, I compare the performance of the different decades.

ggplot(results, aes(Minuten)) + theme_bw() +
  geom_histogram(binwidth = 2)
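
A numeric summary can complement the histograms when comparing decades. The sketch below computes the median finishing time per decade with base R's `aggregate`; the data are made up and the `Jahrzehnt` column is a hypothetical name, so this only illustrates the idea.

```r
# Toy data: decade of birth and finishing time in minutes (invented values).
toy <- data.frame(
  Jahrzehnt = c(1960, 1960, 1970, 1970, 1980, 1980),
  Minuten   = c(52, 58, 49, 55, 47, 51)
)

# Median finishing time for each decade.
med <- aggregate(Minuten ~ Jahrzehnt, data = toy, FUN = median)
med
```

I use the median here rather than the mean because it is less sensitive to a few very slow or very fast finishers.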