Pledging My Time III

September 8, 2018
By

(This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers)

I’ve previously crunched times for local Half and Full Marathons here on quantixed. Last weekend was the Kenilworth Half Marathon (2018) over a new course. I thought I’d have a look at the distributions of times and paces of the runners. The times are available here. If the Time and Category for finishers are saved as a csv, the script below works to generate the following plots.






Aggregated stats for the race are here. The beeswarm plot nicely shows the distribution of runners times and paces per category. There’s a bimodality to some of the age groups which is interesting. You can see from the average times that people get slower as they get older, as expected.

There was a roughly 2:1 split of M:F runners with a similar proportion in all categories. The ratio is similar for DNSers. The winning times were Andrew Savery of Leamington C A & C in MV35 with 01:12:51 and Polly Keen of Nuneaton Harriers in F sen with 01:23:46.

Congrats to everyone who ran and thanks to the organisers and all the supporters out on the course.

require(ggplot2)
require(ggbeeswarm)
file_name <- file.choose()
df1 <- read.csv(file_name, header = TRUE, stringsAsFactors = FALSE)
# aggregate M and F to a new category called Gender
df1$Gender <- ifelse(startsWith(df1$Category,"F"),"F","M")
# format Date column to POSIXct
df1$Time <- as.POSIXct(strptime(df1$Time, format = "%H:%M:%S"))
orig_var <- as.POSIXct("00:00:00", format = "%H:%M:%S")
p1 <- ggplot( data = df1, aes(x = Category,y = Time, color = Category)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  scale_y_datetime(date_labels = "%H:%M:%S", limits = c(orig_var,NA))
p1
# instead of finishing time, let's look at pace (min/km)
df1$Pace <- as.numeric(difftime(df1$Time, orig_var) / 21.1) * 3600
df1$Pace <- as.POSIXct(df1$Pace, origin = orig_var, format = "%H:%M:%S")
p2 <- ggplot( data = df1, aes(x = Category,y = Pace, color = Category)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  scale_y_datetime(date_labels = "%M:%S", limits = c(orig_var,NA))
p2
# calculate speeds rather than pace
df1$Speed <- 21.1 / as.numeric(difftime(df1$Time, orig_var))
p3 <- ggplot( data = df1, aes(x = Category, y = Speed, color = Category)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  ylim(0,NA) + ylab("Speed (km/h)")
p3
# now make the same plots but by Gender rather than Category
p4 <- ggplot( data = df1, aes(x = Gender,y = Time, color = Gender)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  scale_y_datetime(date_labels = "%H:%M:%S", limits = c(orig_var,NA))
p4
p5 <- ggplot( data = df1, aes(x = Gender,y = Pace, color = Gender)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  scale_y_datetime(date_labels = "%M:%S", limits = c(orig_var,NA))
p5
p6 <- ggplot( data = df1, aes(x = Gender, y = Speed, color = Gender)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  ylim(0,NA) + ylab("Speed (km/h)")
p6
ggsave("raceTimeByCat.png", plot = p1)
ggsave("racePaceByCat.png", plot = p2)
ggsave("raceSpeedByCat.png", plot = p3)
ggsave("raceTimeByGen.png", plot = p4)
ggsave("racePaceByGen.png", plot = p5)
ggsave("raceSpeedByGen.png", plot = p6)


Edit 2018-09-12T18:52:43Z I wasn’t happy with the plots and added a few more lines to look at gender as well as category. And show speed as well as pace and finishing time.

The post title is taken from “Pledging My Time” a track from Blonde on Blonde by Bob Dylan

To leave a comment for the author, please follow the link and comment on their blog: Rstats – quantixed.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)