High School Swimming State-Off Tournament Texas (2) vs. Florida (3)

Posted on September 3, 2020 by Swimming + Data Science in R bloggers | 0 Comments

[This article was first published on Swimming + Data Science, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Welcome to round two of the State-Off. Today we have Texas (2) taking on Florida (3) for the right to compete in the State-Off championships! Don’t forget to update your version of SwimmeR to 0.4.1, because we’ll be using some newly released functions. We’ll also take a look at how to plot swimming times while maintaining their format.

library(SwimmeR)
library(dplyr)
library(stringr)
library(flextable)
library(ggplot2)

Since I like to use the same style flextable to display a lot of results, rather than retyping the associated flextable calls let’s just define a function now, which we can reuse every time.

flextable_style <- function(x) {
  x %>%
    flextable() %>%
    bold(part = "header") %>% # bolds header
    bg(bg = "#D3D3D3", part = "header") %>% # puts gray background behind the header row
    autofit()
}

Getting Results

As I mentioned in the SwimmeR 0.4.1 post last week these posts are no longer going to focus on using SwimmeR to read in results. That’s been covered extensively in previous State-Off posts. Instead I’ve already read in the results for each state and I’m hosting them on github.

We’ll just pull results for Texas and Florida, and then stick them together with bind_rows.

Texas_Link <-
  "https://raw.githubusercontent.com/gpilgrim2670/Pilgrim_Data/master/TX_States_2020.csv"
Texas_Results <- read.csv(url(Texas_Link)) %>% 
  mutate(State = "TX",
         Grade = as.character(Grade))
Florida_Link <-
  "https://raw.githubusercontent.com/gpilgrim2670/Pilgrim_Data/master/FL_States_2020.csv"
Florida_Results <- read.csv(url(Florida_Link)) %>% 
  mutate(State = "FL")
Results <- Texas_Results %>%
  bind_rows(Florida_Results) %>%
  mutate(Gender = case_when(
    str_detect(Event, "Girls") == TRUE ~ "Girls",
    str_detect(Event, "Boys") == TRUE ~ "Boys"
  ))

Scoring the Meet

Here’s one of the new functions for SwimmeR 0.4.1. If you followed along with previous State-Off posts then you remember we developed the results_score function together during the first round. Now we get to use it in all of its released glory. results_score takes several arguments, results, which are the results to score, events, which are the events of interest, meet_type, which is either "timed_finals" or "prelims_finals". The last three, lanes, scoring_heats, and point_values are fairly self-explanatory, as the number of lanes, scoring heats, and the point values for each place respectively.

For the State_Off we’ve been using a "timed_finals" meet type, with 8 lanes, 2 scoring heats and point values per NFHS.

Results_Final <- results_score(
  results = Results,
  events = unique(Results$Event),
  meet_type = "timed_finals",
  lanes = 8,
  scoring_heats = 2,
  point_values = c(20, 17, 16, 15, 14, 13, 12, 11, 9, 7, 6, 5, 4, 3, 2, 1)
)

Scores <- Results_Final %>%
  group_by(State, Gender) %>%
  summarise(Score = sum(Points))

Scores %>%
  arrange(Gender, desc(Score)) %>%
  ungroup() %>%
  flextable_style()

State	Gender	Score
TX	Boys	1395.5
FL	Boys	929.5
FL	Girls	1261.0
TX	Girls	1064.0

We have our first split result of of the State-Off, with the Texas boys and Florida girls each winning. This is a combined affair though, so let’s see which state will advance…

Scores %>%
  group_by(State) %>%
  summarise(Score = sum(Score)) %>%
  arrange(desc(Score)) %>%
  ungroup() %>%
  flextable_style()

State	Score
TX	2459.5
FL	2190.5

And it’s Texas, living up to it’s higher seed!

Swimmers of the Meet

Swimmer of the Meet criteria is the same as it’s been for the entire State-Off. First we’ll look for athletes who won two events, thereby scoring a the maximum possible forty points. We’ll also grab the All-American cuts to use as a tiebreaker, in case multiple athletes win two events. It’s possible this week we’ll have our first multiple Swimmer of the Meet winner – the suspense!

Cuts_Link <-
  "https://raw.githubusercontent.com/gpilgrim2670/Pilgrim_Data/master/State_Cuts.csv"
Cuts <- read.csv(url(Cuts_Link))

Cuts <- Cuts %>% # clean up Cuts
  filter(Stroke %!in% c("MR", "FR", "11 Dives")) %>% # %!in% is now included in SwimmeR
  rename(Gender = Sex) %>%
  mutate(
    Event = case_when((Distance == 200 & #match events
                         Stroke == 'Free') ~ "200 Yard Freestyle",
                      (Distance == 200 &
                         Stroke == 'IM') ~ "200 Yard IM",
                      (Distance == 50 &
                         Stroke == 'Free') ~ "50 Yard Freestyle",
                      (Distance == 100 &
                         Stroke == 'Fly') ~ "100 Yard Butterfly",
                      (Distance == 100 &
                         Stroke == 'Free') ~ "100 Yard Freestyle",
                      (Distance == 500 &
                         Stroke == 'Free') ~ "500 Yard Freestyle",
                      (Distance == 100 &
                         Stroke == 'Back') ~ "100 Yard Backstroke",
                      (Distance == 100 &
                         Stroke == 'Breast') ~ "100 Yard Breaststroke",
                      TRUE ~ paste(Distance, "Yard", Stroke, sep = " ")
    ),
    
    Event = case_when(
      Gender == "M" ~ paste("Boys", Event, sep = " "),
      Gender == "F" ~ paste("Girls", Event, sep = " ")
    )
  )

Ind_Swimming_Results <- Results_Final %>%
  filter(str_detect(Event, "Diving|Relay") == FALSE) %>% # join Ind_Swimming_Results and Cuts
  left_join(Cuts %>% filter((Gender == "M" &
                               Year == 2020) |
                              (Gender == "F" &
                                 Year == 2019)) %>%
              select(AAC_Cut, AA_Cut, Event),
            by = 'Event')

Swimmer_Of_Meet <- Ind_Swimming_Results %>%
  mutate(
    AA_Diff = (Finals_Time_sec - sec_format(AA_Cut)) / sec_format(AA_Cut),
    Name = str_to_title(Name)
  ) %>%
  group_by(Name) %>%
  filter(n() == 2) %>% # get swimmers that competed in two events
  summarise(
    Avg_Place = sum(Place) / 2,
    AA_Diff_Avg = round(mean(AA_Diff, na.rm = TRUE), 3),
    Gender = unique(Gender),
    State = unique(State)
  ) %>%
  arrange(Avg_Place, AA_Diff_Avg) %>%
  group_split(Gender) # split out a dataframe for boys (1) and girls (2)

Boys

Swimmer_Of_Meet[[1]] %>%
  slice_head(n = 5) %>%
  select(-Gender) %>%
  ungroup() %>%
  flextable_style()

Name	Avg_Place	AA_Diff_Avg	State
Zuchowski, Joshua	1.5	-0.025	FL
Vincent Ribeiro	1.5	-0.023	TX
Matthew Tannenb	1.5	-0.019	TX
David Oderinde	2.0	-0.015	TX
Dalton Lowe	2.5	-0.016	TX

Joshua Zuchowski is the boys swimmer of the meet. He’s a new face for this particular award – in Florida’s first round meet he finished third behind two guys from Illinois. Also no boy won two events.

Results_Final %>%
  filter(Name == "Zuchowski, Joshua") %>%
  select(Place, Name, School, Finals_Time, Event) %>%
  arrange(desc(Event)) %>%
  ungroup() %>%
  flextable_style()

Place	Name	School	Finals_Time	Event
2	Zuchowski, Joshua	King’s Academy	1:47.44	Boys 200 Yard IM
1	Zuchowski, Joshua	King’s Academy	47.85	Boys 100 Yard Backstroke

Girls

Swimmer_Of_Meet[[2]] %>%
  slice_head(n = 5) %>%
  select(-Gender) %>%
  ungroup() %>%
  flextable() %>%
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>%
  autofit()

Name	Avg_Place	AA_Diff_Avg	State
Lillie Nordmann	1.0	-0.046	TX
Weyant, Emma	1.0	-0.033	FL
Cronk, Micayla	1.5	-0.040	FL
Emma Sticklen	1.5	-0.032	TX
Kit Kat Zenick	1.5	-0.029	TX

Lillie Nordmann continues her winning ways, being named girls swimmer of the meet for the second meet in a row. While she hasn’t written in to answer my questions about swimming outdoors in cold weather she did tell SwimSwam that she’s postponing her collegiate career because of the ongoing pandemic. I don’t blame her. As these results attest, she’s a champ, and frankly this way she’ll be able to focus more on trying to make the Olympic team. Also, it’s warmer in Texas than it is in Palo Alto (or nearby San Jose). Just saying. Actually I’m not just saying – I have data on the temperatures in San Jose and Houston, from the National Weather Service. Let me show you what I’m talking about, in both Fahrenheit and Celsius (the State-Off has a surprisingly international audience).

San_Jose_Link <- "https://raw.githubusercontent.com/gpilgrim2670/Pilgrim_Data/master/San_Jose_Temps.csv"
San_Jose <- read.csv(url(San_Jose_Link)) %>% 
  mutate(City = "San Jose")
Houston_Link <- "https://raw.githubusercontent.com/gpilgrim2670/Pilgrim_Data/master/Houston_Temps.csv"
Houston <- read.csv(url(Houston_Link)) %>% 
  mutate(City = "Houston")

Temps <- bind_rows(San_Jose, Houston)

names(Temps) <- c("Month", "Mean_Max", "Mean_Min", "Mean_Avg", "City")

Temps$Month <- factor(Temps$Month, ordered = TRUE, 
                                levels = unique(Temps$Month))

Temps %>%
  ggplot() +
  geom_line(aes(
    x = Month,
    y = Mean_Avg,
    group = City,
    color = City
  )) +
  scale_y_continuous(sec.axis = sec_axis( ~ (. - 32) * (5/9), name = "Monthly Mean Average Temperature (C)")) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Month", y = "Monthly Mean Average Temperature (F)")

Those are the monthly average temperatures. Plotting Mean_Min is even colder, and closer to what it feels like in those early mornings. If I was going to swim outdoors I’d much rather do it in Texas versus shivering in NorCal. I am a delicate flower.

Results_Final %>%
  filter(Name == "Lillie Nordmann") %>%
  select(Place, Name, School, Finals_Time, Event) %>%
  arrange(desc(Event)) %>%
  ungroup() %>%
  flextable_style()

Place	Name	School	Finals_Time	Event
1	Lillie Nordmann	COTW	1:43.62	Girls 200 Yard Freestyle
1	Lillie Nordmann	COTW	52.00	Girls 100 Yard Butterfly

The DQ Column

A new feature in SwimmeR 0.4.1 is that results now include a column called DQ. When DQ == 1 a swimmer or relay has been disqualified. Recording DQs is of course important to the overall mission of SwimmeR (making swimming results available in usable formats etc. etc.), but it’s also of personal interest to me as an official.

Results %>%
  group_by(State) %>%
  summarise(DQs = sum(DQ, na.rm = TRUE) / n()) %>%
  flextable_style()

State	DQs
FL	0.01050119
TX	0.02785030

Results %>% 
  mutate(Event = str_remove(Event, "Girls |Boys ")) %>% 
  group_by(Event) %>% 
  summarise(DQs = sum(DQ, na.rm = TRUE)) %>%
  filter(DQs > 0) %>% 
  arrange(desc(DQs)) %>% 
  head(5) %>% 
  flextable_style()

Event	DQs
200 Yard Medley Relay	14
200 Yard Freestyle Relay	10
100 Yard Breaststroke	9
400 Yard Freestyle Relay	8
200 Yard IM	3

Not surprisingly most DQs are seen in relays. False starts can and do happen as teams push for every advantage. The line between a fast start and a false start is exceedingly fine. No one is trying to false start, it just happens sometimes. Breaststroke on the other hand is a disciple where intentional cheating can and sometimes does prosper. Recall Cameron van der Burgh admitting to taking extra dolphin kicks after winning gold in Rio, or Kosuke Kitajima having “no response” to Aaron Peirsol’s accusation (backed up by video) that he took an illegal dolphin kick en route to gold in Beijing. Breaststrokers also seem to struggle with the modern “brush” turn, and ring up a lot of one-hand touches. The Texas results (5A, 6A) actually say what infraction was committed, and it’s the usual mix of kicking violations and one hand touches. Florida doesn’t share reasons for DQs, but just from my own experience – kicking and one hand touches. Must be in the water. Or not, because butterflyers don’t seem to suffer from the same issues. Could be the the butterfly recovery is more difficult to stop short, or to do in an unbalanced fashion such that only one hand makes contact with the wall? I don’t know, it’s an genuine mystery, but it is nice that my personal observations from officiating are borne out in the data.

Plotting Times With SwimmeR

We might be interested in seeing a distribution of times for a particular event. There’s a problem though. Swimming times are traditionally reported as minutes:seconds.hundredths. In R that means they’re stored as character strings, which can be confirmed by calling str(Results$Finals_Time). We’d like to be able to look at times from fastest to slowest though – that is we’d like them to have an order. One option would be to convert the character strings in Results$Finals_Time to an ordered factor, but that would be awful. We’d likely need to manually define the ordering – no fun at all. SwimmeR instead offers a pair of functions to make this easier. The function sec_format will take a time written as minutes:seconds.hundredths, like the ones in Results$Finals_Time and covert it to a numeric value – the total duration in seconds. Ordering numerics is easy, R handles it automatically. While that’s very nice, we as swimming aficionados would still like to look at times in the standard swimming format. This is where mmss_format comes in – it simply coverts times back to the standard minutes:seconds.hundredths for our viewing pleasure. Below is an example of converting times to numeric with sec_format for plotting, and then converting the axis labels to swimming format with mmss_format for viewing.

Results %>%
  filter(Gender == 'Girls',
         str_detect(Event, "Relay") == FALSE) %>% 
  left_join(Cuts %>% filter(Year == 2019, Gender == "F"), by = c("Event")) %>% # add in All-American cuts
  filter(Event == "Girls 200 Yard Freestyle") %>% # pick event
  mutate(Finals_Time = sec_format(Finals_Time)) %>% # change time format to seconds
  ggplot() +
  geom_histogram(aes(x = Finals_Time, fill = State), binwidth = 0.5) +
geom_vline(aes(xintercept = sec_format(AA_Cut))) + # line for All-American cut
  geom_vline(aes(xintercept = sec_format(AAC_Cut))) + # line for All-American consideration cut
  scale_y_continuous(breaks = seq(0, 10, 2)) +
  scale_x_continuous(
    breaks = seq(100, 150, 5),
    labels = scales::trans_format("identity", mmss_format) # labels in swimming format
  ) +
    geom_label(
    label="All-American",
    x = 107,
    y = 7,
    color = "black"
  ) +
      geom_label(
    label="All-American Cons.",
    x = 110.5,
    y = 10,
    color = "black"
  ) +
  theme_bw() +
  labs(y = "No. of Swimmers",
       x = "Finals Time")

Here we see classic “bell” curves for each state, with a few very fast or very slow (comparatively speaking) swimmers, and a bunch of athletes clumped right in the middle.

In Closing

Texas wins the overall, despite a winning effort from the Florida girls. Next week at Swimming + Data Science we’ll have number one seed California (1) vs. number five Pennsylvania (5). We’ll also dig further into new SwimmeR 0.4.1 functionality. Hope to see you then!

To leave a comment for the author, please follow the link and comment on their blog: Swimming + Data Science.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

High School Swimming State-Off Tournament Texas (2) vs. Florida (3)

Getting Results

Scoring the Meet

Swimmers of the Meet

Boys

Girls

The DQ Column

Plotting Times With SwimmeR

In Closing

Related

Getting Results

Scoring the Meet

Swimmers of the Meet

Boys

Girls

The DQ Column

Plotting Times With SwimmeR

In Closing

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)