Omega Results and the 2021 Olympic Trials

[This article was first published on Welcome to Swimming + Data Science on Swimming + Data Science, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Omega Timing is the official timekeeper for the Olympic Games, including the US Olympic Trials. They don’t time very many other events, which is why SwimmeR hasn’t supported Omega-style results. Until now, that is. Omega results can now be read into R with SwimmeR versions >= 0.10.2, presently available as development versions from GitHub. We’ll read some Omega results in, and then run a quick set of tests on athlete reaction times.

devtools::install_github("gpilgrim2670/SwimmeR", build_vignettes = TRUE)
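If you’re not sure which version you have installed, a quick check can save some head-scratching before running the rest of this post. Here’s a minimal sketch in base R. The has_omega_support() function is a hypothetical helper of my own, not part of SwimmeR, and 0.10.2 is simply the minimum version mentioned above.

```r
# Hypothetical helper (not part of SwimmeR): TRUE if SwimmeR is installed
# and is at least min_version, FALSE otherwise
has_omega_support <- function(min_version = "0.10.2") {
  if (!requireNamespace("SwimmeR", quietly = TRUE)) {
    return(FALSE) # SwimmeR is not installed at all
  }
  # package_version objects compare correctly against version strings
  utils::packageVersion("SwimmeR") >= min_version
}

if (!has_omega_support()) {
  message("SwimmeR >= 0.10.2 not found; install the development version from GitHub first")
}
```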

The 2020 US Trials are being held in 2021, in two parts. Wave I was held June 4th to 7th, and Wave II is currently being held June 13th – 20th. Omega has published the complete Wave I results, but to avoid any potential broken links down the road I’m also hosting them on GitHub.

Let’s get set up and take a look.

library(SwimmeR)
library(dplyr)
library(stringr)
library(ggplot2)
library(flextable)

flextable_style <- function(x) {
  x %>%
    flextable() %>%
    bold(part = "header") %>% # bolds header
    bg(bg = "#D3D3D3", part = "header") %>%  # puts gray background behind the header row
    autofit()
}



US Trials Wave I – Getting Omega Results

The process of reading in Omega results with SwimmeR is exactly the same as reading in Hy-Tek or S.A.M.M.S. results: read_results() followed by swim_parse(). Here’s the entire set of results from Wave I.

file <-
  "https://github.com/gpilgrim2670/Pilgrim_Data/raw/master/Omega/Omega_OT_Wave1_FullResults_2021.pdf"

Wave_I <- file %>%
  read_results() %>%
  swim_parse(splits = TRUE)

Here are the top three finishers in the Women’s 100 Fly Final. The usual information is present: Place, Name, Team, Finals_Time (Omega results don’t include prelims times) and various Split columns. There’s also a Reaction_Time column, which will be the focus of a short demonstration later on.

Wave_I %>%
  filter(Event == "6 JUN 2021 - 7:37 PM Women's 100m Butterfly Final") %>%
  head(3) %>%
  select(where( ~ !all(is.na(.)))) %>% # remove splits columns that aren't relevant to this race (Split_150 etc.)
  select(-DQ,
         -Exhibition,
         "Reaction" = "Reaction_Time",
         "Finals" = "Finals_Time") %>%
  flextable_style()
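The select(where(~ !all(is.na(.)))) step in that code drops every column in which all values are NA, which is how the unused split columns for a 100m race disappear. Here’s the same step on a tiny tibble; the names and values are invented purely for illustration and don’t come from the Trials results.

```r
library(dplyr)

# Made-up stand-in for parsed results: a 100m race only produces 50m and 100m
# splits, so the longer split columns come back entirely NA
toy_results <- tibble(
  Name      = c("Swimmer A", "Swimmer B"),
  Split_50  = c("27.00", "28.00"),
  Split_100 = c("30.00", "31.00"),
  Split_150 = c(NA, NA),
  Split_200 = c(NA, NA)
)

# where() applies the predicate to every column; keep those with at least one non-NA value
trimmed <- toy_results %>%
  select(where(~ !all(is.na(.))))

names(trimmed)
# returns c("Name", "Split_50", "Split_100")
```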



US Trials Wave II

Wave II of the US Trials is where the actual Olympic team is being selected. It’s still underway as of this writing, so there isn’t yet a single document containing all the results. However, individual result documents are being posted for each event as it’s completed. Here’s the Women’s 100 Breaststroke final, featuring Lilly King.

file <-
  "https://github.com/gpilgrim2670/Pilgrim_Data/raw/master/Omega/Omega_OT_Wave2_W100Br_Finals_2021.pdf"

W100Br <- file %>%
  read_results() %>%
  swim_parse(splits = TRUE)

W100Br %>%
  select(-DQ,
         -Exhibition,
         "Reaction" = "Reaction_Time", 
         "Finals" = "Finals_Time") %>%
  flextable_style()



Australian Trials

The Australian Trials are also underway. Like the US Trials, they can be read into R using SwimmeR versions >= 0.10.2. For the very curious, these are Hy-Tek results, not Omega. We at Swimming + Data Science have scraped entire Hy-Tek live results pages before, and the same general principles can be applied to collect all of the Australian Trials results. Here’s just the Men’s 100 Fly Final.

file <-
  "http://liveresults.swimming.org.au/SAL/2021TRIALS/210612F015.htm"

M100Bk <- file %>%
  read_results() %>%
  swim_parse(splits = TRUE)

M100Bk %>%
  select(-DQ,
         -Exhibition,
         -Points,
         "Prelims" = "Prelims_Time",
         "Finals" = "Finals_Time") %>%
  flextable_style()



US Trials Wave I Reaction Time Demo

Let’s see if there’s a difference between the reaction times of sprinters, mid-distance swimmers and distance swimmers in the US Trials Wave I results. We’ll define anyone who swims the 800m or 1500m as a distance swimmer, anyone else who swims a 50m or 100m event as a sprinter, and everyone else as mid-distance.

For this analysis we’ll need the Lane, Name, Team, Reaction_Time and Event columns. The other columns won’t be needed, so I’ll remove them.

We can pull distances out of the event names. Note, however, from the 100 Fly results above that the event names contain more information than we’re perhaps used to seeing: a date and start time come before the event itself. Let’s clean that up.

Wave_I_Clean <- Wave_I %>%
  select(Lane, Name, Team, Reaction_Time, Event) %>% # select only columns of interest
  mutate(Event = str_remove(Event, ".*(?=(Men)|(Women))")) %>% # remove everything in event names before Men or Women
  mutate(Reaction_Time = as.numeric(Reaction_Time)) # change type of Reaction_Time column
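The regular expression in that str_remove() call does the heavy lifting: .* matches any run of characters, and the lookahead (?=(Men)|(Women)) requires that run to be followed by "Men" or "Women" without removing those words. Here’s a small check on two event strings. The first is copied from the Wave I results; the second is a made-up string in the same format. Base R’s sub() with perl = TRUE gives the same result, if you’d rather avoid the stringr dependency.

```r
library(stringr)

events <- c(
  "6 JUN 2021 - 7:37 PM Women's 100m Butterfly Final", # as it appears in the Wave I results
  "5 JUN 2021 - 6:30 PM Men's 800m Freestyle Final"    # made-up string in the same format
)

# Greedy ".*" followed by a lookahead for "Men" or "Women": everything before the
# (last) match is removed, while "Men"/"Women" themselves are kept
cleaned <- str_remove(events, ".*(?=(Men)|(Women))")

# Same pattern in base R; perl = TRUE is needed for the lookahead
base_cleaned <- sub(".*(?=(Men)|(Women))", "", events, perl = TRUE)

cleaned
# returns c("Women's 100m Butterfly Final", "Men's 800m Freestyle Final")
```

Because .* is greedy, the cut happens at the last "Men" or "Women" in the string. That’s fine for these event names, but it would remove too much if either word ever appeared later in a name. Matching is case-sensitive, so the lowercase "men" inside "Women" never triggers a cut on its own.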

Now we can classify swimmers by type.

Wave_I_Clean <- Wave_I_Clean %>%
  group_by(Name) %>% # determine type by athlete, across all of their swims
  mutate(Type = case_when(
    # encode athlete types based on events swum; distance takes precedence over sprint
    any(str_detect(Event, "(1500m)|(800m)"), na.rm = TRUE) ~ "Distance",
    any(str_detect(Event, "(100m)|(50m)"), na.rm = TRUE) ~ "Sprint",
    TRUE ~ "Mid"
  )) %>%
  ungroup() %>% # drop the grouping so later steps work on the whole data set
  mutate(Type = factor(Type, levels = c("Sprint", "Mid", "Distance"))) # set level order for ggplot later
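The order of the case_when() conditions matters: they’re checked top to bottom, so an athlete who swims both a 100m and a 1500m event ends up as "Distance", not "Sprint". Here’s a minimal sketch of the same rules applied to three hypothetical athletes; the names and event lists are made up for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical athletes and events, used only to show the classification rules
toy <- tibble(
  Name  = c("A", "A", "B", "B", "C", "C"),
  Event = c("Women's 100m Freestyle", "Women's 1500m Freestyle",
            "Men's 50m Freestyle", "Men's 200m Freestyle",
            "Men's 200m Butterfly", "Men's 400m Individual Medley")
)

toy_types <- toy %>%
  group_by(Name) %>%
  mutate(Type = case_when(
    any(str_detect(Event, "(1500m)|(800m)"), na.rm = TRUE) ~ "Distance", # checked first
    any(str_detect(Event, "(100m)|(50m)"), na.rm = TRUE) ~ "Sprint",
    TRUE ~ "Mid"                                                         # everything else
  )) %>%
  ungroup() %>%
  distinct(Name, Type) # one row per athlete

toy_types
# A is "Distance" (the 1500m outranks the 100m), B is "Sprint", C is "Mid"
```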

Let’s look at the distribution of reaction times by swimmer type.

Wave_I_Clean %>%
  ggplot(aes(x = Type, y = Reaction_Time, fill = Type)) +
  geom_violin() +
  theme_bw() +
  labs(y = "Reaction Time (s)",
       title = "Reaction Times by Swimmer Type")

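There is a noticeable shift towards slower reaction times for distance swimmers compared to sprint and mid-distance swimmers, but is it statistically significant? We can use a one-way ANOVA to test whether the mean reaction times of the three groups differ by more than random variation would explain. The test produces a p value, which we compare against a chosen significance level.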
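It can help to see what the ANOVA actually computes. For a one-way ANOVA with k groups and n observations, the F statistic is the between-group mean square divided by the within-group mean square, and the p value comes from an F distribution with k - 1 and n - k degrees of freedom. Here’s a sketch that does this by hand on simulated reaction times (random numbers, not Trials data) and checks it against aov(). It also shows where the exact p value lives in the summary object, since print() displays very small ones as <2e-16.

```r
set.seed(1)

# Simulated reaction times (seconds) for three groups: random numbers, not Trials data
sim <- data.frame(
  Type = factor(rep(c("Sprint", "Mid", "Distance"), each = 30),
                levels = c("Sprint", "Mid", "Distance")),
  Reaction_Time = c(rnorm(30, 0.65, 0.05), rnorm(30, 0.67, 0.05), rnorm(30, 0.72, 0.05))
)

groups <- split(sim$Reaction_Time, sim$Type) # one numeric vector per group
k <- length(groups)                          # number of groups
n <- nrow(sim)                               # total number of swims
grand_mean <- mean(sim$Reaction_Time)

# Between-group sum of squares: how far each group mean sits from the grand mean
ss_between <- sum(lengths(groups) * (sapply(groups, mean) - grand_mean)^2)
# Within-group sum of squares: spread of swims around their own group mean
ss_within <- sum(sapply(groups, function(x) sum((x - mean(x))^2)))

# F = (SS_between / (k - 1)) / (SS_within / (n - k)), compared to an F(k - 1, n - k) distribution
f_stat <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_value <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# The same numbers from aov(); [[1]] is the ANOVA table inside the summary object
aov_table <- summary(aov(Reaction_Time ~ Type, data = sim))[[1]]
aov_table[["F value"]][1]
aov_table[["Pr(>F)"]][1] # exact p value, even when print() shows it as <2e-16
```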

reaction_anova <- aov(Reaction_Time ~ Type, data = Wave_I_Clean) # calculate anova
reaction_anova_summary <- summary(reaction_anova) # save summary anova object
reaction_anova_summary # view anova results
##               Df Sum Sq Mean Sq F value Pr(>F)    
## Type           2  0.479 0.23930   74.65 <2e-16 ***
## Residuals   1270  4.071 0.00321                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p value is very low, at 2.2336931e-31 (summary() prints it as <2e-16). We can conclude that there are significant differences between the groups at a significance level of 0.001, and well beyond. That means that if the three groups actually had identical reaction times, the probability of seeing differences at least this large from random variation alone would be less than 0.1%. The ANOVA test doesn’t tell us which group(s) differ, though. For that we can use a Tukey HSD (honestly significant difference) test.

reaction_Tukey <- TukeyHSD(reaction_anova) # calculate Tukey HSD
reaction_Tukey # view results
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Reaction_Time ~ Type, data = Wave_I_Clean)
## 
## $Type
##                       diff        lwr        upr p adj
## Mid-Sprint      0.02606628 0.01762142 0.03451114     0
## Distance-Sprint 0.07474784 0.05854508 0.09095060     0
## Distance-Mid    0.04868156 0.03158292 0.06578019     0

The adjusted p values all print as 0. We can see what they actually are by pulling them out of the reaction_Tukey model object.

reaction_Tukey$Type[,"p adj"] # view actual adjusted p values
##      Mid-Sprint Distance-Sprint    Distance-Mid 
##    1.634137e-12    0.000000e+00    1.058689e-10

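All are very low (the Distance-Sprint value is below the numerical precision of the calculation, so it is reported as exactly 0), so every pairwise difference is significant at the 0.001 level. In these Wave I results, sprinters really do have faster reaction times than mid-distance swimmers, who are in turn faster than distance swimmers.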
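The diff column in the Tukey output is the difference between group means, with the first group named in each row minus the second, so "Mid-Sprint" is the mean mid-distance reaction time minus the mean sprint reaction time. A quick way to convince yourself is to rebuild that column from the group means; this sketch again uses simulated numbers rather than the Trials data.

```r
set.seed(42)

# Simulated reaction times: random numbers, not Trials data
sim <- data.frame(
  Type = factor(rep(c("Sprint", "Mid", "Distance"), each = 40),
                levels = c("Sprint", "Mid", "Distance")),
  Reaction_Time = c(rnorm(40, 0.64, 0.05), rnorm(40, 0.67, 0.05), rnorm(40, 0.71, 0.05))
)

tukey <- TukeyHSD(aov(Reaction_Time ~ Type, data = sim))

# Group means as a named vector (names follow the factor levels)
group_means <- sapply(split(sim$Reaction_Time, sim$Type), mean)

# Each Tukey row "A-B" is mean(A) - mean(B)
manual_diff <- c(
  "Mid-Sprint"      = group_means[["Mid"]] - group_means[["Sprint"]],
  "Distance-Sprint" = group_means[["Distance"]] - group_means[["Sprint"]],
  "Distance-Mid"    = group_means[["Distance"]] - group_means[["Mid"]]
)

tukey$Type[, "diff"] # from TukeyHSD()
manual_diff          # rebuilt by hand; should match
```

The lwr and upr columns then widen each difference into a family-wise confidence interval, which is why a pair whose interval excludes zero also has a small adjusted p value.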


Reaction Times By Lane

Just for giggles, let’s also look at reaction times by lane. When I was swimming there was always a rumor going around that swimmers in the outside lane nearest the starting device had an advantage, because the light and sound from the device would reach them before reaching athletes further away. It never made much sense, since faster swimmers were deliberately seeded into the inner lanes and they usually won. Nowadays each block is equipped with an LED light bar and a sounding device, so conditions should be equal across lanes (if they ever weren’t).

Wave_I_Clean %>%
  filter(Lane != "0") %>% 
  ggplot(aes(x = Lane, y = Reaction_Time, fill = Lane)) +
  geom_violin() +
  theme_bw() +
  labs(y = "Reaction Time (s)",
       title = "Reaction Times by Lane")


That looks about even to me. Let’s see what the testing has to say.

reaction_anova <- aov(Reaction_Time ~ Lane, data = Wave_I_Clean) # calculate anova
reaction_anova_summary <- summary(reaction_anova) # save summary anova object
reaction_anova_summary # view anova results
##               Df Sum Sq  Mean Sq F value Pr(>F)
## Lane           8  0.025 0.003144   0.878  0.534
## Residuals   1264  4.525 0.003580

Here the p value is 0.5341483, which is larger than any significance level we’d reasonably use. There is no significant difference in reaction time by lane.
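One detail worth knowing about that lane test: Lane is stored as text in these results, so aov() treats each lane as its own group, which is where the 8 degrees of freedom in the table come from (nine distinct lane values, minus one). Converting Lane to a number would instead fit a single straight-line trend across lanes with 1 degree of freedom, which asks a different question. Here’s a sketch of the difference on simulated data (random numbers, not Trials results).

```r
set.seed(3)

# Simulated data: 9 lanes with 20 made-up reaction times each (not Trials data)
lanes <- data.frame(
  Lane = rep(as.character(1:9), each = 20), # stored as text, like Lane in the results
  Reaction_Time = rnorm(180, mean = 0.68, sd = 0.06)
)
lanes$Lane_num <- as.numeric(lanes$Lane) # the same lanes as numbers

# Character predictor: each lane is its own group, so 9 - 1 = 8 degrees of freedom
by_group <- summary(aov(Reaction_Time ~ Lane, data = lanes))[[1]]
# Numeric predictor: one linear trend across lanes, so 1 degree of freedom
by_trend <- summary(aov(Reaction_Time ~ Lane_num, data = lanes))[[1]]

by_group[["Df"]] # 8 for Lane, 171 for the residuals
by_trend[["Df"]] # 1 for Lane_num, 178 for the residuals
```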



In Closing

I hope you’re enjoying the various Olympic Trials meets, all the more so now that SwimmeR makes it easy to import their results into R. Join us next time here at Swimming + Data Science, where we’ll take a look at something else swimming-centric.

To leave a comment for the author, please follow the link and comment on their blog: Welcome to Swimming + Data Science on Swimming + Data Science.
