
SwimmeR version 0.7.2 – Now Better than Ever

[This article was first published on Welcome to Swimming + Data Science on Swimming + Data Science, and kindly contributed to R-bloggers.]
SwimmeR version 0.7.2 is now available from CRAN. This new version contains some new features, plus a few changes to make it more user-friendly. Let me show you what I’ve been working on.

    library(SwimmeR)
    library(dplyr)
    library(stringr)
    library(flextable)
    library(rbenchmark)
    
    flextable_style <- function(x) {
      x %>%
        flextable() %>%
        bold(part = "header") %>% # bold header
        bg(bg = "#D3D3D3", part = "header") %>% # puts gray background behind the header row
        align_nottext_col(align = "center", header = TRUE, footer = TRUE) %>% # center alignment
        autofit()
    }

    New Features

    • SwimmeR can now parse S.A.M.M.S. style results. S.A.M.M.S., which stands for Swimclub And Meet Management System, was (ahem) a swim club and meet management system that predated Hy-Tek’s Meet and Team Manager. It seems to have been most popular in California, where USA Swimming clubs and high schools still use it today.

    S.A.M.M.S. meets look like this:

    Parsing them is a simple matter for you SwimmeR users: it’s exactly the same as parsing Hy-Tek style results. Same read_results, same swim_parse. The only differences concern relay_swimmers and splits. The S.A.M.M.S. results that I’ve seen don’t include relay swimmers, so SwimmeR doesn’t collect them. Splits are also rare in S.A.M.M.S. results and are not currently collected by SwimmeR, although they may be in a future release.

    df <-
      swim_parse(
        read_results(
          "http://www.pacswim.org/userfiles/meets/documents/1629/1119bac.htm"
        )
      )
    
    df %>%
      head(5) %>%
      flextable_style()

    Place  Name                Age  Team                   Finals_Time  DQ  Event
    1      LADOMIRAK, ALEGRIA  8    PC PALO ALTO STANFORD  16.11        0   EVENT 73 FEMALE 8&UN 25 FREE
    2      DIEHN, EVA          8    PC BULL DOG SWIM CLUB  16.36        0   EVENT 73 FEMALE 8&UN 25 FREE
    3      HILL, NAOMI         8    PC PALO ALTO STANFORD  16.50        0   EVENT 73 FEMALE 8&UN 25 FREE
    4      HOUTZER, AMELIA     8    PC PALO ALTO STANFORD  16.88        0   EVENT 73 FEMALE 8&UN 25 FREE
    5      CHANG, KAYLA        8    PC BURLINGAME AQUATIC  17.55        0   EVENT 73 FEMALE 8&UN 25 FREE

    On a personal level, working with these S.A.M.M.S. results was very encouraging, because they have all kinds of weird bugs and cut corners that make me feel better about SwimmeR. For example, some S.A.M.M.S. results list a place order for finals swims as “F1”, “F2”, etc. But S.A.M.M.S. can’t handle more than two characters in that field, so if someone comes in 10th they just get “F”.

    Just “F”
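
To make the truncation concrete, here is a minimal base R sketch of reading those place codes. The helper `parse_finals_place` is hypothetical and is not part of SwimmeR; it only illustrates that a bare “F” carries no recoverable place.

```r
# Hypothetical helper, not part of SwimmeR: turn S.A.M.M.S.-style finals
# place codes ("F1", "F2", ...) into integers.
parse_finals_place <- function(code) {
  digits <- sub("^F", "", code) # drop the leading "F"
  as.integer(digits)            # an empty string becomes NA
}

codes <- c("F1", "F2", "F9", "F") # "F" is what a 10th-place swim gets
places <- parse_finals_place(codes)
places

# Positions whose place was truncated away and must be recovered another way
which(is.na(places))
```

A truncated code comes back as `NA`, so those rows can at least be flagged rather than silently misread.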

    S.A.M.M.S. also doesn’t know what to make of diving, and records diving results like swimming results, so a score of “347.56” is written as “3:47.56” (swim_parse corrects this). S.A.M.M.S. also orders diving results backwards, with the lowest (i.e. fastest) score/time listed first.

    Maybe divers don’t mind being upside down?
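
The diving correction can be sketched in a couple of lines of base R. This is only an illustration of the idea, not the code swim_parse actually uses; `fix_diving_score` is a hypothetical helper, and apart from “3:47.56” the scores are made up.

```r
# Hypothetical helper, not part of SwimmeR: strip the colon that S.A.M.M.S.
# inserts when it formats a diving score as if it were a swim time.
fix_diving_score <- function(x) {
  as.numeric(gsub(":", "", x, fixed = TRUE))
}

# Listed lowest first, the way S.A.M.M.S. would order them
raw_scores <- c("2:98.10", "3:47.56", "4:12.35")
scores <- fix_diving_score(raw_scores)
scores

# Diving is scored highest-wins, so re-rank in decreasing order
sort(scores, decreasing = TRUE)
```

This should only be applied to diving events; applying it to a real swim time would turn “3:47.56” into a nonsense number.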

    S.A.M.M.S. was a commercial product. SwimmeR might have its issues sometimes, but at least it’s free!

    • Under-the-hood changes to speed up swim_parse. We can test this with benchmark from the rbenchmark package because I’ve left the old swim_parse function in SwimmeR, renamed swim_parse_old. It’s not exported, though, so to access it you’ll need to call it as SwimmeR:::swim_parse_old.

    benchmark("new" = {
      swim_parse(
        read_results(
          "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
        )
      )
    },
    "old" = {
      SwimmeR:::swim_parse_old(
        read_results(
          "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
        )
      )
    },
    replications = 5) %>% 
      flextable_style()

    test  replications  elapsed  relative  user.self  sys.self
    new   5             33.62    1.000     30.67      0.06
    old   5             76.72    2.282     74.16      0.09

    As you can see from the relative column above, the new version of swim_parse is a little over twice as fast as the old version (on my computer at least). You’re all very welcome.
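
For anyone curious where the relative column comes from, it can be reproduced from the elapsed times reported above. This sketch assumes relative is each elapsed time divided by the smallest elapsed time, which matches the numbers shown.

```r
# Elapsed times (seconds for 5 replications) copied from the benchmark above
bench <- data.frame(
  test = c("new", "old"),
  elapsed = c(33.62, 76.72)
)

# Assumed definition: elapsed time relative to the fastest entry
bench$relative <- round(bench$elapsed / min(bench$elapsed), 3)
bench
```

The old parser takes about 2.28 times as long as the new one on this meet, which is where "a little over twice as fast" comes from.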

    • Kinder and gentler all around. There have been several changes to make swim_parse more user-friendly. First is decreased reliance on the typo and replacement arguments. They’re still present, and still work, but they’re hopefully now much less necessary.

    By way of example, in this meet there’s a young man named “DU Fayet DE LA Tour, Vin”, as seen here:

    swim_parse_old struggles with this, and gets his name wrong.

    df_old <-
      SwimmeR:::swim_parse_old(
        read_results(
          "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
        )
      )
    
    df_old %>%
      filter(str_detect(Name, "DU Fayet DE LA Tour")) %>%
      select(-Points, -DQ, -Exhibition) %>%
      flextable_style()

    Place  Name                 Age  Team    Prelims_Time  Finals_Time  Event
    13     DU Fayet DE LA Tour  14   NBA-PC  1:02.09       1:00.08      Boys 13-14 100 Yard Freestyle
    12     DU Fayet DE LA Tour  14   NBA-PC  1:20.75       1:09.03      Boys 13-14 100 Yard Backstroke
    14     DU Fayet DE LA Tour  14   NBA-PC  1:16.16       1:10.01      Boys 13-14 100 Yard Butterfly
    16     DU Fayet DE LA Tour  14   NBA-PC  2:50.00       2:35.25      Boys 13-14 200 Yard IM

    We can fix the problem in a hacky, non-intuitive kind of way using typo and replacement, plus some post-parse changes. It works, but it’s not terribly easy.

    df_old_tr <-
      SwimmeR:::swim_parse_old(
        read_results(
          "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
        ),
        typo = ", Vin ",
        replacement = " Vin  "
      ) %>%
      mutate(Name = str_replace(Name, " Vin", ", Vin"))
    
    df_old_tr %>%
      filter(str_detect(Name, "DU Fayet DE LA Tour")) %>%
      select(-Points, -DQ, -Exhibition) %>%
      flextable_style()

    Place  Name                      Age  Team    Prelims_Time  Finals_Time  Event
    13     DU Fayet DE LA Tour, Vin  14   NBA-PC  1:02.09       1:00.08      Boys 13-14 100 Yard Freestyle
    12     DU Fayet DE LA Tour, Vin  14   NBA-PC  1:20.75       1:09.03      Boys 13-14 100 Yard Backstroke
    14     DU Fayet DE LA Tour, Vin  14   NBA-PC  1:16.16       1:10.01      Boys 13-14 100 Yard Butterfly
    16     DU Fayet DE LA Tour, Vin  14   NBA-PC  2:50.00       2:35.25      Boys 13-14 200 Yard IM
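
To see why that workaround is so roundabout, here is a rough base R sketch of the two steps. It assumes typo and replacement behave like a literal find-and-replace on the raw text before parsing; the raw line layout below is invented for illustration and is not copied from the results file.

```r
# Illustrative raw results line (layout assumed, not taken from the file)
raw_line <- "13 DU Fayet DE LA Tour, Vin  14 NBA-PC  1:02.09  1:00.08"

# Step 1: before parsing, swap the comma out so the name is not cut short.
# The replacement has the same number of characters as the typo.
typo <- ", Vin "
replacement <- " Vin  "
patched <- gsub(typo, replacement, raw_line, fixed = TRUE)
patched

# Step 2: after parsing, the name comes back without its comma,
# so it has to be put back by hand.
parsed_name <- "DU Fayet DE LA Tour Vin"
restored <- sub(" Vin", ", Vin", parsed_name, fixed = TRUE)
restored
```

Two edits, one before the parse and one after, just to get a single name right. That is the friction the new version is meant to remove.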

    Compare that to the much simpler approach available in SwimmeR version 0.7.2: no need for typo and replacement, and no need for post-parse fixes to Vin’s name.

    df_new <-
      swim_parse(
        read_results(
          "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
        )
      )
    
    df_new %>%
      filter(str_detect(Name, "DU Fayet DE LA Tour")) %>%
      select(-Points, -DQ, -Exhibition) %>%
      flextable_style()

    Place  Name                      Age  Team    Prelims_Time  Finals_Time  Event
    13     DU Fayet DE LA Tour, Vin  14   NBA-PC  1:02.09       1:00.08      Boys 13-14 100 Yard Freestyle
    12     DU Fayet DE LA Tour, Vin  14   NBA-PC  1:20.75       1:09.03      Boys 13-14 100 Yard Backstroke
    14     DU Fayet DE LA Tour, Vin  14   NBA-PC  1:16.16       1:10.01      Boys 13-14 100 Yard Butterfly
    16     DU Fayet DE LA Tour, Vin  14   NBA-PC  2:50.00       2:35.25      Boys 13-14 200 Yard IM

    This is not a promise that there will be no need for typo and replacement. Sometimes there really are typos that need replacing. Things should be easier now though.

    Second, event names were also an issue in older versions of SwimmeR. If swim_parse didn’t find any event names it liked, it would throw an error and return nothing. Now, in SwimmeR version 0.7.2, the event name definitions are much broader, and failing to find any event names will not result in an error.

    These results, from the 2019 Australian Nationals, couldn’t be read by previous versions of SwimmeR because the events are named with “Metre” rather than “Meter”. Now though, with SwimmeR version 0.7.2, we can see the Campbell sisters doing their thing.

    df_aus <-
      swim_parse(
        read_results(
          "https://www.swimming.org.au/sites/default/files/assets/documents/full%20results_0.pdf"
        )
      )
    
    df_aus %>%
      head(2) %>%
      flextable_style()

    Place  Name              Age  Team         Prelims_Time  Finals_Time  Points  DQ  Exhibition  Event
    1      CAMPBELL, CATE    27   KNOX PYMBLE  24.33         24.05        953     0   0           Women 50 LC Metre Freestyle
    2      CAMPBELL, BRONTE  25   KNOX PYMBLE  24.60         24.17        939     0   0           Women 50 LC Metre Freestyle
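
The “Metre” versus “Meter” problem is easy to reproduce with a pair of regular expressions. The patterns below are purely illustrative and are not the ones SwimmeR uses internally; they just show how a narrow event-name definition misses a valid spelling.

```r
# Event names taken from the results tables in this post
events <- c(
  "Women 50 LC Metre Freestyle",
  "Boys 13-14 100 Yard Freestyle"
)

# Illustrative patterns only, not SwimmeR's actual definitions
strict_pattern <- "Meter|Yard"      # American spelling only
broad_pattern <- "Met(er|re)|Yard"  # accepts both spellings

grepl(strict_pattern, events) # misses the Australian event
grepl(broad_pattern, events)  # matches both
```

With the strict pattern the Australian meet contains no recognizable events at all, which is the situation that used to end in an error.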

    • Modifications to swim_parse to begin to handle older style Hy-Tek results, like these from 2002. Issues with inconsistent treatment of splits within the results themselves remain, so let the user beware. These older results are still an active area of development.

    df_2002 <-
      swim_parse(
        read_results(
          "https://cdn.swimswam.com/wp-content/uploads/2018/08/2002-Division-I-NCAA-Championships-Men-results1.pdf"
        )
      )
    
    df_2002 %>%
      filter(str_detect(Event, "100 Yard BUTTERFLY")) %>%
      head(3) %>%
      flextable_style()

    Place  Name              Age  Team      Prelims_Time  Finals_Time  Points  DQ  Exhibition  Event
    1      CROCKER, IAN      SO   TEXAS     45.70         45.44        NA      0   0           Event 9 MEN’s 100 Yard BUTTERFLY
    2      MARSHALL, PETER   SO   STANFORD  46.39         46.48        NA      0   0           Event 9 MEN’s 100 Yard BUTTERFLY
    3      SCHOEMAN, ROLAND  SR   ARIZONA   46.57         46.50        NA      0   0           Event 9 MEN’s 100 Yard BUTTERFLY

    • Bug fixes, always bug fixes.

    In Closing

    Please do download the newest version of SwimmeR from wherever you get your packages. You’re also welcome to submit bug reports or feature requests on the SwimmeR project GitHub page.
