Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

On Sunday the Tokyo Olympics men sprint 100m final will take place. Francesc Montané reminded me in his analysis that 9 years ago I used a simple regression model to predict the winning time for the 100m men sprint final of the 2012 Olympics in London. My model predicted a winning time of 9.68s, yet Usain Bolt finished in 9.63s. For this Sunday my prediction is 9.72s, with a 50% credible interval of [9.61s, 9.85s].

Since the 2012 Olympics things changed, Usain Bolt has retired and new shoes with advance spike technology have created a little bit of a controversy. I don’t think a log-linear regression is still sensible. We may see faster times, but there will be a limit. Hence, an S-shape curve might be a good starting point. There are many to choose from. I picked the following form:

\begin{aligned} f(x) = & L + 1 – \frac{x}{\left(1 + |x|^{k}\right)^{1/k}} \\ \end{aligned}

With $$L=9$$ and $$k=0.9$$ it looks like this:

## Data

Here is the historical data again:

library(data.table)
text="
Year, Event, Athlete, Medal, Country, Time
1896, 100m Men,        Tom Burke,  GOLD,     USA,  12.00
1900, 100m Men,     Frank Jarvis,  GOLD,     USA,  11.00
1904, 100m Men,      Archie Hahn,  GOLD,     USA,  11.00
1906, 100m Men,      Archie Hahn,  GOLD,     USA,  11.20
1908, 100m Men,    Reggie Walker,  GOLD,     SAF,  10.80
1912, 100m Men,      Ralph Craig,  GOLD,     USA,  10.80
1920, 100m Men,  Charles Paddock,  GOLD,     USA,  10.80
1924, 100m Men,  Harold Abrahams,  GOLD,     GBR,  10.60
1928, 100m Men,   Percy Williams,  GOLD,     CAN,  10.80
1932, 100m Men,      Eddie Tolan,  GOLD,     USA,  10.30
1936, 100m Men,      Jesse Owens,  GOLD,     USA,  10.30
1948, 100m Men, Harrison Dillard,  GOLD,     USA,  10.30
1952, 100m Men,   Lindy Remigino,  GOLD,     USA,  10.40
1956, 100m Men,     Bobby Morrow,  GOLD,     USA,  10.50
1960, 100m Men,       Armin Hary,  GOLD,     GER,  10.20
1964, 100m Men,        Bob Hayes,  GOLD,     USA,  10.00
1968, 100m Men,        Jim Hines,  GOLD,     USA,   9.95
1972, 100m Men,    Valery Borzov,  GOLD,     URS,  10.14
1976, 100m Men,  Hasely Crawford,  GOLD,     TRI,  10.06
1980, 100m Men,      Allan Wells,  GOLD,     GBR,  10.25
1984, 100m Men,       Carl Lewis,  GOLD,     USA,   9.99
1988, 100m Men,       Carl Lewis,  GOLD,     USA,   9.92
1992, 100m Men, Linford Christie,  GOLD,     GBR,   9.96
1996, 100m Men,   Donovan Bailey,  GOLD,     CAN,   9.84
2000, 100m Men,   Maurice Greene,  GOLD,     USA,   9.87
2004, 100m Men,    Justin Gatlin,  GOLD,     USA,   9.85
2008, 100m Men,       Usain Bolt,  GOLD,     JAM,   9.69
2012, 100m Men,       Usain Bolt,  GOLD,     JAM,   9.63
2016, 100m Men,       Usain Bolt,  GOLD,     JAM,   9.81
")
library(ggplot2)
ggplot(golddata, aes(x=Year, y=Time)) +
geom_line() + geom_point() +
labs(title="Winning times of Olympic gold medalist 100m sprint men")

## Model

First I prepare the data, I remove the first measurement from 1896 that looks too much like an outlier. To fit the non-linear function it is helpful to centre and scale the ‘Year’ metric. However, instead of doing this right away, I will include variables based on the output of scale in my model.

golddata1900 <- golddata[Year>=1900]
attr(scale(golddata1900$Year), "scaled:center") ## [1] 1958.786 attr(scale(golddata1900$Year), "scaled:scale")
## [1] 36.69833

To fit the non-linear function I use the brms package. For my Bayesian regression model I assume a Gaussian data distribution and Normal priors for the parameters.

library(brms)
library(parallel)
nCores <- detectCores()
options(mc.cores = nCores)
mdl <- brm(
bf(Time ~ L + 1 - ((Year-C)/S)/(1+fabs((Year-C)/S)^k)^(1/k),
C ~ 1, S ~ 1, L ~ 1, k ~ 1, nl = TRUE),
data = golddata1900, family = gaussian(),
prior = c(
prior(normal(1959, 5), nlpar = "C"),
prior(normal(37, 1), nlpar = "S"),
prior(normal(9, 0.2), nlpar = "L", lb=0),
prior(normal(1, 0.2), nlpar = "k", lb=0)
),
seed = 1234, iter = 2000,
chains = 4, cores = nCores,
)
mdl
##  Family: gaussian
##   Links: mu = identity; sigma = identity
## Formula: Time ~ L + 1 - ((Year - C)/S)/(1 + fabs((Year - C)/S)^k)^(1/k)
##          C ~ 1
##          S ~ 1
##          L ~ 1
##          k ~ 1
##    Data: golddata1900 (Number of observations: 28)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##
## Population-Level Effects:
##             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## C_Intercept  1956.62      4.17  1947.95  1964.53 1.00     2403     2066
## S_Intercept    37.12      1.00    35.18    39.10 1.00     2770     2547
## L_Intercept     9.31      0.05     9.21     9.40 1.00     2483     2388
## k_Intercept     0.90      0.09     0.74     1.10 1.00     3803     2643
##
## Family Specific Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.18      0.03     0.13     0.24 1.00     3269     2478
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

The model runs without problems and the estimated parameters are not too far away from my prior assumptions. Nice!

The parameter $$L$$ represents the fastest a human could possibly run. The 95% credible interval is between 9.21s and 9.40s.

plot(brms::conditional_effects(mdl, method = "predict"), points=TRUE)

## Prediction

The model looks plausible, so what are the predictions for this Sunday, and the Olympics in Paris and Los Angeles?

p <- brms::posterior_predict(
mdl, newdata=data.frame(Year=c(2021, 2024, 2028)))
colnames(p) <- c(2021, 2024, 2028)
rbind(
mean=apply(p, 2, mean),
apply(p, 2, quantile, probs=c(0.25, 0.5, 0.75))
)
##          2021     2024     2028
## mean 9.724736 9.707647 9.695373
## 25%  9.606649 9.585469 9.575641
## 50%  9.723891 9.710383 9.697171
## 75%  9.845782 9.828603 9.819280

A winning time of close to 9.72s for this Sunday might be achievable. But the fastest time this year of the American Trayvon Bromell, who is favourite to take Bolt’s 100m title in Tokyo, was 9.77s.

Interestingly, Francesc’s ETS model without Usain Bolt’s data is predicting a winning time of 9.73s.

## Session Info

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Big Sur 11.5.1
##
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] ggplot2_3.3.5     data.table_1.14.0
##
## loaded via a namespace (and not attached):
##   [1] minqa_1.2.4          colorspace_2.0-2     ellipsis_0.3.2
##   [4] ggridges_0.5.3       brms_2.15.0          rsconnect_0.8.18
##   [7] markdown_1.1         base64enc_0.1-3      rstudioapi_0.13
##  [10] farver_2.1.0         rstan_2.21.2         DT_0.18
##  [13] fansi_0.5.0          mvtnorm_1.1-2        codetools_0.2-18
##  [16] bridgesampling_1.1-2 splines_4.1.0        knitr_1.33
##  [19] shinythemes_1.2.0    bayesplot_1.8.1      projpred_2.0.2
##  [22] jsonlite_1.7.2       nloptr_1.2.2.2       shiny_1.6.0
##  [25] compiler_4.1.0       backports_1.2.1      assertthat_0.2.1
##  [28] Matrix_1.3-4         fastmap_1.1.0        cli_3.0.1
##  [31] later_1.2.0          htmltools_0.5.1.1    prettyunits_1.1.1
##  [34] tools_4.1.0          igraph_1.2.6         coda_0.19-4
##  [37] gtable_0.3.0         glue_1.4.2           reshape2_1.4.4
##  [40] dplyr_1.0.7          V8_3.4.2             Rcpp_1.0.7
##  [43] jquerylib_0.1.4      vctrs_0.3.8          nlme_3.1-152
##  [46] blogdown_1.4         crosstalk_1.1.1      xfun_0.24
##  [49] stringr_1.4.0        ps_1.6.0             lme4_1.1-27.1
##  [52] mime_0.11            miniUI_0.1.1.1       lifecycle_1.0.0
##  [55] gtools_3.9.2         MASS_7.3-54          zoo_1.8-9
##  [58] scales_1.1.1         colourpicker_1.1.0   promises_1.2.0.1
##  [61] Brobdingnag_1.2-6    parallel_4.1.0       inline_0.3.19
##  [64] shinystan_2.5.0      gamm4_0.2-6          yaml_2.2.1
##  [67] curl_4.3.2           gridExtra_2.3        loo_2.4.1
##  [73] highr_0.9            dygraphs_1.1.1.6     boot_1.3-28
##  [76] pkgbuild_1.2.0       rlang_0.4.11         pkgconfig_2.0.3
##  [79] matrixStats_0.59.0   evaluate_0.14        lattice_0.20-44
##  [82] purrr_0.3.4          rstantools_2.1.1     htmlwidgets_1.5.3
##  [85] labeling_0.4.2       tidyselect_1.1.1     processx_3.5.2
##  [88] plyr_1.8.6           magrittr_2.0.1       bookdown_0.22
##  [91] R6_2.5.0             generics_0.1.0       DBI_1.1.1
##  [94] pillar_1.6.1         withr_2.4.2          mgcv_1.8-36
##  [97] xts_0.12.1           abind_1.4-5          tibble_3.1.3
## [100] crayon_1.4.1         utf8_1.2.2           rmarkdown_2.9
## [103] grid_4.1.0           callr_3.7.0          threejs_0.3.3
## [106] digest_0.6.27        xtable_1.8-4         httpuv_1.6.1
## [109] RcppParallel_5.1.4   stats4_4.1.0         munsell_0.5.0
## [112] bslib_0.2.5.1        shinyjs_2.0.0