High School Swimming State-Off Tournament Florida (3) vs. Illinois (6)

[This article was first published on Swimming + Data Science, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This week in the High School Swimming State-Off Tournament we have the third seeded Sunshine State, Florida (3) taking on the sixth seeded Illinois (6) aka the Prairie State.

library(SwimmeR)
library(dplyr)
library(stringr)
library(purrr)
library(flextable)

Florida Results

Florida has a nice results repository, with Hy-Tek real-time results. This will then be similar to last week when we first scraped a Hy-Tek real-time results page. This time though we’re going to scrape four, one for each of Florida’s four divisions.

base_4A <- "https://www.fhsaa.org/sites/default/files/orig_uploads/sports/swimming-diving/archives/2019-20/state/4A/191115F0"

base_3A <- "https://www.fhsaa.org/sites/default/files/orig_uploads/sports/swimming-diving/archives/2019-20/state/3A/191116F0"

base_2A <- "https://www.fhsaa.org/sites/default/files/orig_uploads/sports/swimming-diving/archives/2019-20/state/191108F0"

base_1A <- "https://www.fhsaa.org/sites/default/files/orig_uploads/sports/swimming-diving/archives/2019-20/state/1A/191109F0"

real_time_links <- function(base, event_numbers) { # function to make list of results links
  event_numbers <-
    sprintf("%02d", as.numeric(event_numbers)) # adds leading zeros where needed on event numbers
  links <-
    map(base, paste0, event_numbers, ".htm") # combines link base with event numbers
  links <- unlist(links, recursive = FALSE)
  return(links)
}

FL_Links <-
  real_time_links(base = c(base_4A, base_3A, base_2A, base_1A),
                  event_numbers = 1:24)

FL_Results <-
  map(FL_Links, read_results, node = "pre") %>% # map SwimmeR::read_results over the list of links
  map(swim_parse) %>%
  bind_rows() %>% # bind together results from each link
  select(Name, School, Finals_Time, Event) %>% # only the columns we need
  mutate(State = "FL") # add column for state since we'll be combining results with IL

Illinois Results

Illinois splits their results repositories into boys and girls. Final results from the Illinois girls state meet are just a scan of a printout, or as it’s sometime called .norm format. Not a good look Illinois, not a good look at all. Illinois will be penalized for this egregious data-foul by being made to use their preliminary results on the girls side. Frankly it’s the only way to include them in the meet at all, short of transcribing that miserable scan. Even I’m not that interested.

Both Illinois boys and Illinois girls (prelim) results are available as .pdf files, which SwimmeR::read_results can handle. There are some inconsistencies in spacing for class designators (“Sr”, “Jr” etc.) as well as a few leading spaces. There’s also an issue with a region name, which follos school names, but is placed in parentheses. We can clean these up with the typo and replacement arguments of swim_parse.

IL_Boys_Link <- "https://www.ihsa.org/data/swb/StateResults.pdf"
IL_Girls_Link <-
  "https://www.ihsa.org/Portals/0/prelim%20results.pdf"

IL_Results <-
  map(c(IL_Girls_Link, IL_Boys_Link), read_results) %>% # map SwimmeR::read_results over the list of links
  map(
    swim_parse,
    avoid = c("IHSA", "NFHS", "POOL", "NATIONAL"),
    typo = c(
      "Sr\\s{2,}",
      # fix issue with some class designation strings
      "Jr\\s{2,}",
      "So\\s{2,}",
      "Fr\\s{2,}",
      "\\s{2,}\\(",
      # region designation fix
      "\\s\\d{1,2}\\s{2,}" # fix leading spaces
    ),
    replacement = c("Sr ",
                    "Jr ",
                    "So ",
                    "Fr ",
                    " \\)",
                    " ")
  ) %>%
  bind_rows() %>% # bind together results from each link
  mutate(Ind = case_when(str_detect(Event, "Relay") == FALSE ~ 1,
                         TRUE ~ 0)) %>%
  mutate(
    Grade = case_when(
      is.na(Grade) == TRUE &
        Ind == 1 ~ str_extract(School, "^Fr|^So|^Jr|^Sr")
    ),
    School = case_when(is.na(Grade) == FALSE ~ str_remove(School, Grade),
                       TRUE ~ School)
  ) %>%
  select(Name, School, Finals_Time, Event) %>% # only the columns we need
  mutate(State = "IL")

Joining Up Results

Having collected results from Florida and Illinois we just need to join them up, add a column for gender and remove events outside the standard 12 event program.

Results <- bind_rows(FL_Results, IL_Results) %>%
  mutate(Gender = case_when(
    str_detect(Event, "Girls") == TRUE ~ "Girls",
    # coding gender
    str_detect(Event, "Boys") == TRUE ~ "Boys"
  )) %>%
  filter(str_detect(Event, "Swim-off") == FALSE,
         # removing extra events
         str_detect(Event, "AWD") == FALSE)

Analysis

Last week to make a point about reusable code I copied my code from the week prior. Copy and pasting code is all well and good, but even better is to functionalize code. So this week we’re going to start working on function for scoring each event type (diving, relay and individual swim). The eventual goal, over the next few weeks, is to develop this code into a robust function for inclusion in a future release of the SwimmeR package.

Below is our starting point. It’s basically just the code from previous weeks wrapped in curly brackets. Inside the curly brackets there are still cases, determined by if, to break up the input results into diving, relays, and individual swims, and then to score them exactly as before. The difference is that this function does not assume there actually are any diving, relay or individual swims. It has else conditions following each if that insure the function does not fault if a particular event type isn’t represented.


Scoring Function

results_score <-
  function(results, events, point_values, max_place) {
    point_values <- c(point_values, 0)
    names(point_values) <- 1:length(point_values)
    
    results <- results %>%
      filter(str_detect(Event, paste(events, collapse = "|")) == TRUE)
    
    if (any(str_detect(results$Event, "Diving"))) {
      # break out diving results
      diving_results <- results %>%
        filter(str_detect(Event, "Diving")) %>%
        mutate(Finals_Time = as.numeric(Finals_Time)) %>%
        group_by(Event, Name) %>%
        slice(1) %>% # first instance of every diver
        ungroup() %>%
        group_by(Event) %>%
        mutate(
          Place = rank(desc(Finals_Time), ties.method = "min"),
          # highest score gets rank 1
          Finals_Time = as.character(Finals_Time)
        ) %>%
        filter(Place <= max_place)
      
    } else {
      diving_results  <-
        results[0, ] # if there are no diving results create a blank dataframe called diving_results, with the same columns as results
    }
    
    if (any(str_detect(results$Event, "Relay"))) {
      # break out relay results
      relay_results <- results %>%
        filter(str_detect(Event, "Relay") == TRUE) %>% # only want relays
        group_by(Event, School) %>%
        slice(1) %>% # select first occurrence of team in each event
        ungroup() %>%
        mutate(Finals_Time_sec = sec_format(Finals_Time)) %>% # convert time to seconds
        group_by(Event) %>%
        mutate(Place = rank(Finals_Time_sec, ties.method = "min")) %>% # places, low number wins
        filter(Place <= max_place)
      
    } else {
      relay_results  <-
        results[0, ] # if there are no relay results create a blank dataframe called relay_results, with the same columns as results
    }
    
    if (any(str_detect(results$Event, "Diving|Relay") == FALSE)) {
      # break out non diving/relay events
      ind_results <- results %>%
        filter(str_detect(Event, "Diving|Relay") == FALSE) %>%
        group_by(Event, Name) %>%
        slice(1) %>% # first instance of every swimmer
        ungroup() %>%
        group_by(Event) %>%
        mutate(Finals_Time_sec = sec_format(Finals_Time)) %>% # time as seconds
        mutate(Place = rank(Finals_Time_sec, ties.method = "min")) %>% # places, low number wins
        filter(Place <= max_place)
      
    } else {
      ind_results <-
        results[0, ] # if there are no non relay/diving results create a blank dataframe called ind_results, with the same columns as results
    }
    
    results <-
      bind_rows(ind_results, diving_results, relay_results) # bind together all the dataframes.  This is why we needed blank dataframes in the case of no relay/diving results
    
    results <- results %>% # deal with ties
      group_by(Event) %>%
      mutate(New_Place = rank(Place, ties.method = "first"),
             Points = point_values[New_Place]) %>%
      group_by(Place, Event) %>%
      summarize(Points = mean(Points)) %>%
      inner_join(results) %>%
      mutate(Points = case_when(str_detect(Event, "Relay") == TRUE ~ Points * 2,
                                TRUE ~ Points)) %>%
      ungroup()
    
  }

In future installments of the Swim-Off we’ll add peices of code to this function to sanitize inputs, provide error messages, and eventually to score other meet formats.
Because results_score can score arbitrary events, we don’t actually have to score an entire meet. We can select only a few events. Say we were interested in short axis strokes, butterfly and breaststroke. Those events can be passed to results_score in place of the full suite of events.

Results_Short_Axis <-
  results_score(
    results = Results,
    events = c("Butterfly", "Breaststroke"),
    point_values = c(20, 17, 16, 15, 14, 13, 12, 11, 9, 7, 6, 5, 4, 3, 2, 1),
    max_place = 16
  )

Results_Short_Axis %>%
  filter(Place <= 2) %>%
  select(Place, Name, State, Event, Points) %>%
  arrange(Event, Place) %>%
  flextable() %>%
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>%
  autofit()

Place

Name

State

Event

Points

1

Iida, Max

IL

Boys 100 Yard Breaststroke

20

2

Alderson, Justin

IL

Boys 100 Yard Breaststroke

17

1

Filipovic, Aleksej

IL

Boys 100 Yard Butterfly

20

2

Jones, William

FL

Boys 100 Yard Butterfly

17

1

Gridley, Kaelyn

IL

Girls 100 Yard Breaststroke

20

2

Schwab, Carly

FL

Girls 100 Yard Breaststroke

17

1

Peoples, Olivia

FL

Girls 100 Yard Butterfly

20

2

Stone, McKenna

IL

Girls 100 Yard Butterfly

17


Scored Results

we are interested in scoring a full meet though, so let’s pass all the events from Florida vs. Illinois to results_score and get the scores for each event.

Results_Final <-
  results_score(
    results = Results,
    events = unique(Results$Event),
    point_values = c(20, 17, 16, 15, 14, 13, 12, 11, 9, 7, 6, 5, 4, 3, 2, 1),
    max_place = 16
  )

Scores <- Results_Final %>%
  group_by(State, Gender) %>%
  summarise(Score = sum(Points))

Florida wins both the boys and girls meets, but it’s close, especially on the boys side. On the girls side Illinois was likely hampered by not having their finals swims available. Serves them right for their lousy data practices.

Scores %>%
  arrange(Gender, desc(Score)) %>%
  ungroup() %>%
  flextable() %>%
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>%
  autofit()

State

Gender

Score

FL

Boys

1286

IL

Boys

1039

FL

Girls

1410

IL

Girls

915

Scores %>%
  group_by(State) %>%
  summarise(Score = sum(Score)) %>%
  arrange(desc(Score)) %>%
  ungroup() %>%
  flextable() %>%
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>%
  autofit()

State

Score

FL

2696

IL

1954

Let’s take a closer look, and see how many events each state won, by gender.

Results_Final %>%
  filter(Place == 1) %>%
  select(Event, State, Gender) %>%
  group_by(Gender, State) %>%
  summarise(Total = n()) %>%
  arrange(Gender, desc(Total)) %>%
  flextable() %>%
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>%
  autofit()

Gender

State

Total

Boys

IL

11

Boys

FL

1

Girls

FL

7

Girls

IL

5

The Illinois boys won 11 of 12 events – that’s impressive! Let’s see that listed out:

Results_Final <-
  Results_Final[order(match(Results_Final$Event, Results$Event)),] # to order events correctly

Results_Final %>%
  filter(Gender == "Boys",
         Place == 1) %>%
  select(Name, School, State, Finals_Time, Event) %>%
  flextable() %>%
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>%
  autofit()

Name

School

State

Finals_Time

Event

Oak Park (O.P.-River Forest)

IL

1:31.55

Boys 200 Yard Medley Relay

Maurer, Luke

Wilmette (Loyola Academy)

IL

1:36.35

Boys 200 Yard Freestyle

Iida, Max

Glenview (Glenbrook South)

IL

1:47.40

Boys 200 Yard IM

Boyle, Connor

Naperville (Neuqua Valley)

IL

20.04

Boys 50 Yard Freestyle

Williams, Jack

Flossmoor (Homewood-F.)

IL

544.2

Boys 1 mtr Diving

Filipovic, Aleksej

St. Charles (North)

IL

48.25

Boys 100 Yard Butterfly

Boyle, Connor

Naperville (Neuqua Valley)

IL

43.82

Boys 100 Yard Freestyle

Andrew, Everet

Wilmette (Loyola Academy)

IL

4:24.50

Boys 500 Yard Freestyle

LaGrange (Lyons)

IL

1:22.70

Boys 200 Yard Freestyle Relay

Zuchowski, Joshua

King’s Academy

FL

47.85

Boys 100 Yard Backstroke

Iida, Max

Glenview (Glenbrook South)

IL

54.62

Boys 100 Yard Breaststroke

Wilmette (Loyola Academy)

IL

3:02.80

Boys 400 Yard Freestyle Relay


Swimmers of the Meet

Just to remind everyone of the Swimmer of the Meet criteria first we’ll look for athletes who won two events, thereby scoring a the maximum possible forty points. We’ll also grab the All-American cuts to use as a tiebreaker, in case multiple athletes win two events.

Cuts_Link <-
  "https://raw.githubusercontent.com/gpilgrim2670/Pilgrim_Data/master/State_Cuts.csv"
Cuts <- read.csv(url(Cuts_Link))

'%!in%' <- function(x, y)
  ! ('%in%'(x, y)) # "not in" function

Cuts <- Cuts %>% # clean up Cuts
  filter(Stroke %!in% c("MR", "FR", "11 Dives")) %>%
  rename(Gender = Sex) %>%
  mutate(
    Event = case_when((Distance == 200 & #match events
                         Stroke == 'Free') ~ "200 Yard Freestyle",
                      (Distance == 200 &
                         Stroke == 'IM') ~ "200 Yard IM",
                      (Distance == 50 &
                         Stroke == 'Free') ~ "50 Yard Freestyle",
                      (Distance == 100 &
                         Stroke == 'Fly') ~ "100 Yard Butterfly",
                      (Distance == 100 &
                         Stroke == 'Free') ~ "100 Yard Freestyle",
                      (Distance == 500 &
                         Stroke == 'Free') ~ "500 Yard Freestyle",
                      (Distance == 100 &
                         Stroke == 'Back') ~ "100 Yard Backstroke",
                      (Distance == 100 &
                         Stroke == 'Breast') ~ "100 Yard Breaststroke",
                      TRUE ~ paste(Distance, "Yard", Stroke, sep = " ")
    ),
    
    Event = case_when(
      Gender == "M" ~ paste("Boys", Event, sep = " "),
      Gender == "F" ~ paste("Girls", Event, sep = " ")
    )
  )

Ind_Swimming_Results <- Results_Final %>%
  filter(str_detect(Event, "Diving|Relay") == FALSE) %>% # join Ind_Swimming_Results and Cuts
  left_join(Cuts %>% filter((Gender == "M" &
                               Year == 2020) |
                              (Gender == "F" &
                                 Year == 2019)) %>%
              select(AAC_Cut, AA_Cut, Event),
            by = 'Event')

Swimmer_Of_Meet <- Ind_Swimming_Results %>%
  mutate(
    AA_Diff = (Finals_Time_sec - sec_format(AA_Cut)) / sec_format(AA_Cut),
    Name = str_to_title(Name)
  ) %>%
  group_by(Name) %>%
  filter(n() == 2) %>% # get swimmers that competed in two events
  summarise(
    Avg_Place = sum(Place) / 2,
    AA_Diff_Avg = round(mean(AA_Diff, na.rm = TRUE), 3),
    Gender = unique(Gender),
    State = unique(State)
  ) %>%
  arrange(Avg_Place, AA_Diff_Avg) %>%
  group_split(Gender) # split out a dataframe for boys (1) and girls (2)


Boys

As we’ve already seen, Illinois boys won 11 of 12 events, with Connor Boyle and Max Iida both winning two apiece. That means we’ll be going to the All-American standard tiebreaker.

Swimmer_Of_Meet[[1]] %>% 
  slice_head(n = 5) %>% 
  select(-Gender) %>% 
  ungroup() %>%
  flextable() %>% 
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>% 
  autofit()

Name

Avg_Place

AA_Diff_Avg

State

Boyle, Connor

1.0

-0.025

IL

Iida, Max

1.0

-0.021

IL

Zuchowski, Joshua

1.5

-0.025

FL

Maurer, Luke

1.5

-0.022

IL

Tirheimer, Logan

3.0

-0.013

FL


It’s Connor Boyle by a nose, just squeaking past Max Iida. Interestingly Florida’s lone event winner, Joshua Zuchowski, had the same All-American score as Connor, but only won one event, finishing second to Max in the 200 IM. He’s young though, so maybe next year.

Results_Final %>%
  filter(Name == "Boyle, Connor") %>%
  select(Place, Name, School, Finals_Time, Event) %>%
  arrange(desc(Event)) %>%
  ungroup() %>%
  flextable() %>%
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>%
  autofit()

Place

Name

School

Finals_Time

Event

1

Boyle, Connor

Naperville (Neuqua Valley)

20.04

Boys 50 Yard Freestyle

1

Boyle, Connor

Naperville (Neuqua Valley)

43.82

Boys 100 Yard Freestyle


Girls

Swimmer_Of_Meet[[2]] %>%
  slice_head(n = 5) %>%
  select(-Gender) %>%
  ungroup() %>%
  flextable() %>% 
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>% 
  autofit()

Name

Avg_Place

AA_Diff_Avg

State

Cronk, Micayla

1.0

-0.040

FL

Weyant, Emma

1.0

-0.033

FL

Cooper, Grace

1.5

-0.027

IL

Stege, Rachel

2.0

-0.020

IL

Novelline, Carly

2.5

-0.016

IL


The girls meet was overall much stronger, with the top three girls swimmers further under the All-American cuts than any of the boys. Micayla Cronk was especially excellent, not only winning two events, but doing so with extremely impressive times. Emma Weyant also won two events, but was pipped by Micayla on the All-American tie-breaker.

Results_Final %>%
  filter(Name == "Cronk, Micayla") %>%
  select(Place, Name, School, Finals_Time, Event) %>%
  arrange(desc(Event)) %>% 
  ungroup() %>%
  flextable() %>% 
  bold(part = "header") %>%
  bg(bg = "#D3D3D3", part = "header") %>% 
  autofit()

Place

Name

School

Finals_Time

Event

1

Cronk, Micayla

Flagler

1:44.39

Girls 200 Yard Freestyle

1

Cronk, Micayla

Flagler

48.20

Girls 100 Yard Freestyle


In Closing

Thanks for joining us for another installment of the High School Swimming State-Off here at Swimming + Data Science. Come back next week for Texas (2) vs. Ohio (7), where we’ll be making further improvements to our new results_score function.

To leave a comment for the author, please follow the link and comment on their blog: Swimming + Data Science.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)