2012 Olympics Swimming – 100m Butterfly Men Finals prediction

[This article was first published on Actuarially (Matt Malin), and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

2012 Olympics Swimming – 100m Butterfly Men Finals prediction

Author: Matt Malin

Inspired by mages’ blog with predictions for 100m running times, I’ve decided to perform some basic modelling (loess and linear modelling) on previous Olympic results for the 100m Butterfly Men’s medal winning results.

Code setup

library(XML)
library(ggplot2)

swimming_path <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=SWI&enum=200"

swimming_data <- readHTMLTable(
  readLines(swimming_path), 
  which = 3, 
  stringsAsFactors = FALSE)

# due to some potential errors in passing header = TRUE:
names(swimming_data) <- swimming_data[1, ]
swimming_data <- swimming_data[-1, ]

swimming_data[["Result"]] <- as.numeric(swimming_data[["Result"]])
swimming_data[["Year"]]   <- as.numeric(swimming_data[["Year"]])
swimming_data             <- na.omit(swimming_data)

loess_prediction <- function(
  medal_type = "GOLD", 
  prediction_year = 2012) 
{
  medal_type <- toupper(medal_type)
 
 swimming_loess <- loess(
    Result ~ Year, 
    subset(swimming_data, Medal == medal_type),
    control = loess.control(surface = "direct"))
  
  swimming_prediction <- predict(
    swimming_loess, 
    data.frame(Year = prediction_year), 
    se = FALSE)

  return(swimming_prediction)
}

log_lm_prediction <- function(
  medal_type = "GOLD", 
  prediction_year = 2012) 
{
  medal_type <- toupper(medal_type)
  swimming_log_lm <- lm(
    log(Result) ~ Year, 
    subset(swimming_data, Medal == medal_type))
  
  swimming_prediction <- exp(predict(
    swimming_log_lm, 
    data.frame(Year = prediction_year), 
    se = FALSE))

  return(swimming_prediction)
}

swimming_data <- rbind(
  data.frame(
    swimming_data[c("Year", "Medal", "Result")], 
    type = "actual"),
  data.frame(
    Year = rep(2012, 3),
    Medal = c("GOLD", "SILVER", "BRONZE"),
    Result = c(
      loess_prediction("gold"), 
      loess_prediction("silver"),
      loess_prediction("bronze")),
    type = rep("loess_prediction", 3)))

medal_colours <- c(
  GOLD   = rgb(201, 137, 16, maxColorValue = 255),
  SILVER = rgb(168, 168, 168, maxColorValue = 255),
  BRONZE = rgb(150, 90, 56, maxColorValue = 255))
        
swimming_plot <- ggplot(
  swimming_data,
  aes(
    x = Year, 
    y = Result, 
    colour = Medal, 
    group = Medal)) + 
  scale_x_continuous(limits = c(1968, 2012)) +
  geom_point() + 
  stat_smooth(
    aes(fill = Medal), 
    alpha = 0.25, 
    data = subset(swimming_data, type = "actual"), 
    fullrange = FALSE, 
    method = loess)
    
swimming_plot <- swimming_plot + 
  scale_fill_manual(values = medal_colours) + 
  scale_colour_manual(values = medal_colours) + theme_bw()

Predictions

I now use the functions loess_prediction and log_lm_prediction to estimate the times for the medal winning times.

Loess predictions

The gold prediction for 2012 is 49.7 seconds, for silver is 49.5 seconds, and for bronze is 50.2 seconds.

Linear modelling (of log results)

I’ve shown the code here for the calls to the linear modelling approach:

swimming_log_lm_gold   <- log_lm_prediction("gold")
swimming_log_lm_silver <- log_lm_prediction("silver")
swimming_log_lm_bronze <- log_lm_prediction("bronze")

This gives the following times as predictions:

swimming_log_lm_gold
##     1 
## 50.23 
swimming_log_lm_silver
##     1 
## 50.09 
swimming_log_lm_bronze
##     1 
## 50.46 

Loess prediction plot

The following is a plot of actual and predicted times, along with loess error setting as defaults from geom_smooth:

plot of chunk plot

Notes

Note that because of the small difference between the silver and gold medal results at the 2008 olympics, the trend of improvement in silver exceeds that in the gold, so the prediction is that the silver time will be faster than the gold!

Also note that this takes into account no information about performance of athletes involved or changes in rules, such as being unable to use the swimsuits that were present in the last Olympics and largely attributed to improving performance, purely modelling from a few data points as an interesting exercise!

Final Summary

To summarise, the final predicted results using these methods are:

library(pander)
predictions <- data.frame(
  Medal = c("Gold", "Silver", "Bronze"),
  Loess_prediction = c(
    loess_prediction("gold"),
    loess_prediction("silver"),
    loess_prediction("bronze")),
  Log_Linear_prediction = c(
    log_lm_prediction("gold"),
    log_lm_prediction("silver"),
    log_lm_prediction("bronze")))
pandoc.table(predictions)
Medal Loess_prediction Log_Linear_prediction
Gold 49.69 50.23
Silver 49.52 50.09
Bronze 50.20 50.46

Obviously the predictions here are very crudely performed, especially given that it produces a faster time for a silver medal than for gold, but it’ll still be interesting to see what actually happens, and if it’ll be Michael Phelps yet again!

To leave a comment for the author, please follow the link and comment on their blog: Actuarially (Matt Malin).

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)