Pledging My Time III

Posted on September 8, 2018 by quantixed in R bloggers | 0 Comments

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve previously crunched times for local Half and Full Marathons here on quantixed. Last weekend was the Kenilworth Half Marathon (2018) over a new course. I thought I’d have a look at the distributions of times and paces of the runners. The times are available here. If the Time and Category for finishers are saved as a csv, the script below works to generate the following plots.

Aggregated stats for the race are here. The beeswarm plot nicely shows the distribution of runners times and paces per category. There’s a bimodality to some of the age groups which is interesting. You can see from the average times that people get slower as they get older, as expected.

There was a roughly 2:1 split of M:F runners with a similar proportion in all categories. The ratio is similar for DNSers. The winning times were Andrew Savery of Leamington C A & C in MV35 with 01:12:51 and Polly Keen of Nuneaton Harriers in F sen with 01:23:46.

Congrats to everyone who ran and thanks to the organisers and all the supporters out on the course.


require(ggplot2)
require(ggbeeswarm)
file_name <- file.choose()
df1 <- read.csv(file_name, header = TRUE, stringsAsFactors = FALSE)
# aggregate M and F to a new category called Gender
df1$Gender <- ifelse(startsWith(df1$Category,"F"),"F","M")
# format Date column to POSIXct
df1$Time <- as.POSIXct(strptime(df1$Time, format = "%H:%M:%S"))
orig_var <- as.POSIXct("00:00:00", format = "%H:%M:%S")
p1 <- ggplot( data = df1, aes(x = Category,y = Time, color = Category)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  scale_y_datetime(date_labels = "%H:%M:%S", limits = c(orig_var,NA))
p1
# instead of finishing time, let's look at pace (min/km)
df1$Pace <- as.numeric(difftime(df1$Time, orig_var) / 21.1) * 3600
df1$Pace <- as.POSIXct(df1$Pace, origin = orig_var, format = "%H:%M:%S")
p2 <- ggplot( data = df1, aes(x = Category,y = Pace, color = Category)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  scale_y_datetime(date_labels = "%M:%S", limits = c(orig_var,NA))
p2
# calculate speeds rather than pace
df1$Speed <- 21.1 / as.numeric(difftime(df1$Time, orig_var))
p3 <- ggplot( data = df1, aes(x = Category, y = Speed, color = Category)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  ylim(0,NA) + ylab("Speed (km/h)")
p3
# now make the same plots but by Gender rather than Category
p4 <- ggplot( data = df1, aes(x = Gender,y = Time, color = Gender)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  scale_y_datetime(date_labels = "%H:%M:%S", limits = c(orig_var,NA))
p4
p5 <- ggplot( data = df1, aes(x = Gender,y = Pace, color = Gender)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  scale_y_datetime(date_labels = "%M:%S", limits = c(orig_var,NA))
p5
p6 <- ggplot( data = df1, aes(x = Gender, y = Speed, color = Gender)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun.y = mean, geom = "point", size=2, aes(group = 1)) +
  ylim(0,NA) + ylab("Speed (km/h)")
p6
ggsave("raceTimeByCat.png", plot = p1)
ggsave("racePaceByCat.png", plot = p2)
ggsave("raceSpeedByCat.png", plot = p3)
ggsave("raceTimeByGen.png", plot = p4)
ggsave("racePaceByGen.png", plot = p5)
ggsave("raceSpeedByGen.png", plot = p6)

Edit 2018-09-12T18:52:43Z I wasn’t happy with the plots and added a few more lines to look at gender as well as category. And show speed as well as pace and finishing time.

—

The post title is taken from “Pledging My Time” a track from Blonde on Blonde by Bob Dylan

To leave a comment for the author, please follow the link and comment on their blog: Rstats – quantixed.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Pledging My Time III

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)