**Actuarially (Matt Malin)**, and kindly contributed to R-bloggers)

# 2012 Olympics Swimming – 100m Butterfly Men Finals prediction

*Author: Matt Malin*

Inspired by mages’ blog with predictions for 100m running times, I’ve decided to perform some basic modelling (loess and linear modelling) on previous Olympic results for the 100m Butterfly Men’s medal winning results.

## Code setup

```
library(XML)
library(ggplot2)
swimming_path <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=SWI&enum=200"
swimming_data <- readHTMLTable(
readLines(swimming_path),
which = 3,
stringsAsFactors = FALSE)
# due to some potential errors in passing header = TRUE:
names(swimming_data) <- swimming_data[1, ]
swimming_data <- swimming_data[-1, ]
swimming_data[["Result"]] <- as.numeric(swimming_data[["Result"]])
swimming_data[["Year"]] <- as.numeric(swimming_data[["Year"]])
swimming_data <- na.omit(swimming_data)
loess_prediction <- function(
medal_type = "GOLD",
prediction_year = 2012)
{
medal_type <- toupper(medal_type)
swimming_loess <- loess(
Result ~ Year,
subset(swimming_data, Medal == medal_type),
control = loess.control(surface = "direct"))
swimming_prediction <- predict(
swimming_loess,
data.frame(Year = prediction_year),
se = FALSE)
return(swimming_prediction)
}
log_lm_prediction <- function(
medal_type = "GOLD",
prediction_year = 2012)
{
medal_type <- toupper(medal_type)
swimming_log_lm <- lm(
log(Result) ~ Year,
subset(swimming_data, Medal == medal_type))
swimming_prediction <- exp(predict(
swimming_log_lm,
data.frame(Year = prediction_year),
se = FALSE))
return(swimming_prediction)
}
swimming_data <- rbind(
data.frame(
swimming_data[c("Year", "Medal", "Result")],
type = "actual"),
data.frame(
Year = rep(2012, 3),
Medal = c("GOLD", "SILVER", "BRONZE"),
Result = c(
loess_prediction("gold"),
loess_prediction("silver"),
loess_prediction("bronze")),
type = rep("loess_prediction", 3)))
medal_colours <- c(
GOLD = rgb(201, 137, 16, maxColorValue = 255),
SILVER = rgb(168, 168, 168, maxColorValue = 255),
BRONZE = rgb(150, 90, 56, maxColorValue = 255))
swimming_plot <- ggplot(
swimming_data,
aes(
x = Year,
y = Result,
colour = Medal,
group = Medal)) +
scale_x_continuous(limits = c(1968, 2012)) +
geom_point() +
stat_smooth(
aes(fill = Medal),
alpha = 0.25,
data = subset(swimming_data, type = "actual"),
fullrange = FALSE,
method = loess)
swimming_plot <- swimming_plot +
scale_fill_manual(values = medal_colours) +
scale_colour_manual(values = medal_colours) + theme_bw()
```

## Predictions

I now use the functions loess_prediction and log_lm_prediction to estimate the times for the medal winning times.

### Loess predictions

The gold prediction for 2012 is `49.7`

seconds, for silver is `49.5`

seconds, and for bronze is `50.2`

seconds.

### Linear modelling (of log results)

I’ve shown the code here for the calls to the linear modelling approach:

```
swimming_log_lm_gold <- log_lm_prediction("gold")
swimming_log_lm_silver <- log_lm_prediction("silver")
swimming_log_lm_bronze <- log_lm_prediction("bronze")
```

This gives the following times as predictions:

`swimming_log_lm_gold`

```
## 1
## 50.23
```

`swimming_log_lm_silver`

```
## 1
## 50.09
```

`swimming_log_lm_bronze`

```
## 1
## 50.46
```

### Loess prediction plot

The following is a plot of actual and predicted times, along with loess error setting as defaults from geom_smooth:

## Notes

Note that because of the small difference between the silver and gold medal results at the 2008 olympics, the trend of improvement in silver exceeds that in the gold, so the prediction is that the silver time will be faster than the gold!

Also note that this takes into account no information about performance of athletes involved or changes in rules, such as being unable to use the swimsuits that were present in the last Olympics and largely attributed to improving performance, purely modelling from a few data points as an interesting exercise!

## Final Summary

To summarise, the final predicted results using these methods are:

```
library(pander)
predictions <- data.frame(
Medal = c("Gold", "Silver", "Bronze"),
Loess_prediction = c(
loess_prediction("gold"),
loess_prediction("silver"),
loess_prediction("bronze")),
Log_Linear_prediction = c(
log_lm_prediction("gold"),
log_lm_prediction("silver"),
log_lm_prediction("bronze")))
pandoc.table(predictions)
```

Medal | Loess_prediction | Log_Linear_prediction |
---|---|---|

Gold | 49.69 | 50.23 |

Silver | 49.52 | 50.09 |

Bronze | 50.20 | 50.46 |

Obviously the predictions here are very crudely performed, especially given that it produces a faster time for a silver medal than for gold, but it’ll still be interesting to see what actually happens, and if it’ll be Michael Phelps yet again!

**leave a comment**for the author, please follow the link and comment on their blog:

**Actuarially (Matt Malin)**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...