SwimmeR version 0.7.2 – Now Better than Ever

[This article was first published on Swimming + Data Science, and kindly contributed to R-bloggers.]

SwimmeR version 0.7.2 is now available from CRAN. This new version contains some new features, plus a few changes to make it more user-friendly. Let me show you what I’ve been working on.

library(SwimmeR)
library(dplyr)
library(stringr)
library(flextable)
library(rbenchmark)

flextable_style <- function(x) {
  x %>%
    flextable() %>%
    bold(part = "header") %>% # bold header
    bg(bg = "#D3D3D3", part = "header") %>% # puts gray background behind the header row
    align_nottext_col(align = "center", header = TRUE, footer = TRUE) %>% # center alignment
    autofit()
}

New Features

  • SwimmeR can now parse S.A.M.M.S. style results. S.A.M.M.S., which stands for Swimclub And Meet Management System, was, ahem, a swim club and meet management system that predated Hy-Tek’s Meet and Team Manager. It seems to have been most popular in California, where it’s still used by USA Swimming clubs and high schools to the present day.

S.A.M.M.S. meets look like this:

Parsing them is a simple matter for you SwimmeR users: it’s exactly the same as parsing Hy-Tek style results. Same read_results, same swim_parse. The only differences concern relay_swimmers and splits. The S.A.M.M.S. results that I’ve seen don’t include relay swimmers, so of course SwimmeR doesn’t collect them. Splits are also rare in S.A.M.M.S. results and are not currently collected by SwimmeR, although they may be in a future release.

df <-
  swim_parse(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1629/1119bac.htm"
    )
  )

df %>%
  head(5) %>%
  flextable_style()

Place  Name                Age  Team                   Finals_Time  DQ  Event
1      LADOMIRAK, ALEGRIA  8    PC PALO ALTO STANFORD  16.11        0   EVENT 73 FEMALE 8&UN 25 FREE
2      DIEHN, EVA          8    PC BULL DOG SWIM CLUB  16.36        0   EVENT 73 FEMALE 8&UN 25 FREE
3      HILL, NAOMI         8    PC PALO ALTO STANFORD  16.50        0   EVENT 73 FEMALE 8&UN 25 FREE
4      HOUTZER, AMELIA     8    PC PALO ALTO STANFORD  16.88        0   EVENT 73 FEMALE 8&UN 25 FREE
5      CHANG, KAYLA        8    PC BURLINGAME AQUATIC  17.55        0   EVENT 73 FEMALE 8&UN 25 FREE

On a personal level, working with these S.A.M.M.S. results was very encouraging, because they have all kinds of weird bugs and cut corners that make me feel better about SwimmeR. For example, some S.A.M.M.S. results list a place order for finals swims as “F1”, “F2”, etc. But S.A.M.M.S. can’t handle more than two characters in that field, so if someone comes in 10th they just get “F”.

Just “F”
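To make the place-code quirk concrete, here is a minimal sketch of how such codes could be turned into numbers. The helper name parse_finals_place is hypothetical and this is not how swim_parse handles the field internally; it only illustrates that a bare “F” carries no recoverable place.

```r
# Hypothetical helper, not part of SwimmeR: convert S.A.M.M.S.-style finals
# place codes ("F1", "F2", ...) into integers.
parse_finals_place <- function(place) {
  digits <- sub("^F", "", place)  # drop the leading "F"
  digits[digits == ""] <- NA      # a bare "F" has lost its place number
  as.integer(digits)
}

places <- parse_finals_place(c("F1", "F2", "F9", "F"))
places
```

A truncated “F” comes back as NA, so the actual place would have to be inferred some other way, for example from row order within the event.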

S.A.M.M.S. also doesn’t know what to make of diving, and records diving results like swimming results, so “347.56” is written as “3:47.56” (swim_parse corrects this). S.A.M.M.S. also orders diving results backwards with the lowest (i.e. fastest) score/time listed first.

Maybe divers don’t mind being upside down?

S.A.M.M.S. was a commercial product. SwimmeR might have its issues sometimes, but at least it’s free!
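The diving quirk described above can be sketched in a few lines of base R. This is only an illustration under my own assumptions, not the code swim_parse uses: fix_diving_score is a hypothetical helper, and it assumes you already know which rows are diving events.

```r
# Hypothetical helper, not SwimmeR's implementation: undo the swim-time
# formatting that S.A.M.M.S. applies to diving scores.
fix_diving_score <- function(result, is_diving) {
  # Remove the colon only for diving rows; leave real swim times alone
  ifelse(is_diving, sub(":", "", result, fixed = TRUE), result)
}

results <- c("3:47.56", "1:02.09")
is_diving <- c(TRUE, FALSE)
fixed_results <- fix_diving_score(results, is_diving)
fixed_results
```

Once the scores are plain numbers again they can be sorted highest first, which is the order a diving result should be in.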

  • Under-the-hood changes to speed up swim_parse. We can test this with benchmark from the rbenchmark package because I’ve left the old swim_parse function in SwimmeR, renamed swim_parse_old. It’s not exported though, so to actually access it you’ll need to call it as SwimmeR:::swim_parse_old.

benchmark("new" = {
  swim_parse(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
    )
  )
},
"old" = {
  SwimmeR:::swim_parse_old(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
    )
  )
},
replications = 5) %>% 
  flextable_style()

test  replications  elapsed  relative  user.self  sys.self
new   5             33.62    1.000     30.67      0.06
old   5             76.72    2.282     74.16      0.09

As you can see from the relative column above, the new version of swim_parse is a little over twice as fast as the old version (on my computer at least). You’re all very welcome.
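For anyone wondering where the relative column comes from, it can be reproduced by hand. As I understand it, each elapsed time is divided by the fastest elapsed time; the sketch below just redoes that arithmetic with the numbers from the benchmark output above.

```r
# Elapsed times in seconds, copied from the benchmark output above
elapsed <- c(new = 33.62, old = 76.72)

# Divide every elapsed time by the fastest one
relative <- round(elapsed / min(elapsed), 3)
relative
```

The old parser's value of 2.282 is the "little over twice as fast" figure.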

  • Kinder and gentler all around. There have been several changes to make swim_parse more user-friendly. First is decreased reliance on the typo and replacement arguments. They’re still present, and still work, but they’re hopefully now much less necessary.

By way of example in this meet there’s a young man named “DU Fayet DE LA Tour, Vin”, as seen here:

swim_parse_old struggles with this, and gets his name wrong.

df_old <-
  SwimmeR:::swim_parse_old(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
    )
  )

df_old %>%
  filter(str_detect(Name, "DU Fayet DE LA Tour")) %>% # str_detect already returns TRUE/FALSE
  select(-Points, -DQ, -Exhibition) %>%
  flextable_style()

Place  Name                 Age  Team    Prelims_Time  Finals_Time  Event
13     DU Fayet DE LA Tour  14   NBA-PC  1:02.09       1:00.08      Boys 13-14 100 Yard Freestyle
12     DU Fayet DE LA Tour  14   NBA-PC  1:20.75       1:09.03      Boys 13-14 100 Yard Backstroke
14     DU Fayet DE LA Tour  14   NBA-PC  1:16.16       1:10.01      Boys 13-14 100 Yard Butterfly
16     DU Fayet DE LA Tour  14   NBA-PC  2:50.00       2:35.25      Boys 13-14 200 Yard IM

We can fix the problem in a hacky, non-intuitive kind of way using typo and replacement, plus some changes after the parse. It works, but it’s not terribly easy.

df_old_tr <-
  SwimmeR:::swim_parse_old(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
    ),
    typo = ", Vin ",
    replacement = " Vin  "
  ) %>%
  mutate(Name = str_replace(Name, " Vin", ", Vin"))

df_old_tr %>%
  filter(str_detect(Name, "DU Fayet DE LA Tour")) %>% # str_detect already returns TRUE/FALSE
  select(-Points, -DQ, -Exhibition) %>%
  flextable_style()

Place  Name                      Age  Team    Prelims_Time  Finals_Time  Event
13     DU Fayet DE LA Tour, Vin  14   NBA-PC  1:02.09       1:00.08      Boys 13-14 100 Yard Freestyle
12     DU Fayet DE LA Tour, Vin  14   NBA-PC  1:20.75       1:09.03      Boys 13-14 100 Yard Backstroke
14     DU Fayet DE LA Tour, Vin  14   NBA-PC  1:16.16       1:10.01      Boys 13-14 100 Yard Butterfly
16     DU Fayet DE LA Tour, Vin  14   NBA-PC  2:50.00       2:35.25      Boys 13-14 200 Yard IM

Compare that to the much simpler approach available in SwimmeR version 0.7.2: no need for typo and replacement, and no need for after-parse fixes to Vin’s name.

df_new <-
  swim_parse(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
    )
  )

df_new %>%
  filter(str_detect(Name, "DU Fayet DE LA Tour")) %>% # str_detect already returns TRUE/FALSE
  select(-Points, -DQ, -Exhibition) %>%
  flextable_style()

Place  Name                      Age  Team    Prelims_Time  Finals_Time  Event
13     DU Fayet DE LA Tour, Vin  14   NBA-PC  1:02.09       1:00.08      Boys 13-14 100 Yard Freestyle
12     DU Fayet DE LA Tour, Vin  14   NBA-PC  1:20.75       1:09.03      Boys 13-14 100 Yard Backstroke
14     DU Fayet DE LA Tour, Vin  14   NBA-PC  1:16.16       1:10.01      Boys 13-14 100 Yard Butterfly
16     DU Fayet DE LA Tour, Vin  14   NBA-PC  2:50.00       2:35.25      Boys 13-14 200 Yard IM

This is not a promise that there will be no need for typo and replacement. Sometimes there really are typos that need replacing. Things should be easier now though.
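If the typo and replacement idea is new to you, the gist is plain string substitution on the raw text before it gets parsed. Here is a minimal sketch of that idea in base R. The fix_typos helper is hypothetical and is not SwimmeR's internal code, and the raw_line string is an illustrative line assembled from the table values above rather than the exact layout of the results file.

```r
# Hypothetical helper, not SwimmeR's implementation: apply each
# typo/replacement pair, in order, to raw lines of results text.
fix_typos <- function(lines, typo, replacement) {
  stopifnot(length(typo) == length(replacement))
  for (i in seq_along(typo)) {
    # fixed = TRUE treats the typo as a literal string, not a regex
    lines <- gsub(typo[i], replacement[i], lines, fixed = TRUE)
  }
  lines
}

# Illustrative line only, built from values shown in the table above
raw_line <- "13 DU Fayet DE LA Tour, Vin 14 NBA-PC 1:02.09 1:00.08"

cleaned <- fix_typos(raw_line, typo = ", Vin ", replacement = " Vin  ")
cleaned
```

Because the fix happens before parsing, the comma has to be put back afterwards, which is exactly the extra mutate step in the old workflow.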

Second, event names were also an issue in older versions of SwimmeR. If swim_parse didn’t find any event names it liked, it would throw an error and return nothing. Now, in SwimmeR version 0.7.2, the event name definitions are much broader, and failing to find any event names will not result in an error.

These results, from the 2019 Australian Nationals, couldn’t be read by previous versions of SwimmeR because the events are named with “Metre” rather than “Meter”. Now though, with SwimmeR version 0.7.2, we can see the Campbell sisters doing their thing.

df_aus <-
  swim_parse(
    read_results(
      "https://www.swimming.org.au/sites/default/files/assets/documents/full%20results_0.pdf"
    )
  )

df_aus %>%
  head(2) %>%
  flextable_style()

Place  Name              Age  Team         Prelims_Time  Finals_Time  Points  DQ  Exhibition  Event
1      CAMPBELL, CATE    27   KNOX PYMBLE  24.33         24.05        953     0   0           Women 50 LC Metre Freestyle
2      CAMPBELL, BRONTE  25   KNOX PYMBLE  24.60         24.17        939     0   0           Women 50 LC Metre Freestyle
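The “Metre” versus “Meter” problem is a nice example of why broader definitions help. The sketch below shows one way a pattern could accept both spellings. It is an assumption-laden illustration, not the definitions SwimmeR actually uses: is_event_name is a hypothetical helper, and the pattern only looks for a distance unit.

```r
# Hypothetical helper, not SwimmeR's actual event definitions: flag lines
# that mention a distance unit, accepting both "Meter" and "Metre".
is_event_name <- function(x) {
  grepl("\\b(Yard|Met(er|re))s?\\b", x, ignore.case = TRUE, perl = TRUE)
}

candidates <- c(
  "Women 50 LC Metre Freestyle",
  "Women 50 LC Meter Freestyle",
  "Boys 13-14 100 Yard Freestyle",
  "CAMPBELL, CATE"
)
event_flags <- is_event_name(candidates)
event_flags
```

A unit-based pattern like this would still miss S.A.M.M.S. style names such as “EVENT 73 FEMALE 8&UN 25 FREE”, which have no unit at all, so real event detection needs more than one rule.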

  • Modifications to swim_parse to begin to handle older style Hy-Tek results, like these from 2002. Issues with inconsistent treatment of splits within the results themselves remain, so let the user beware. These older results are still an active area of development.

df_2002 <-
  swim_parse(
    read_results(
      "https://cdn.swimswam.com/wp-content/uploads/2018/08/2002-Division-I-NCAA-Championships-Men-results1.pdf"
    )
  )

df_2002 %>%
  filter(str_detect(Event, "100 Yard BUTTERFLY")) %>%
  head(3) %>%
  flextable_style()

Place  Name              Age  Team      Prelims_Time  Finals_Time  Points  DQ  Exhibition  Event
1      CROCKER, IAN      SO   TEXAS     45.70         45.44        NA      0   0           Event 9 MEN’s 100 Yard BUTTERFLY
2      MARSHALL, PETER   SO   STANFORD  46.39         46.48        NA      0   0           Event 9 MEN’s 100 Yard BUTTERFLY
3      SCHOEMAN, ROLAND  SR   ARIZONA   46.57         46.50        NA      0   0           Event 9 MEN’s 100 Yard BUTTERFLY

  • Bug fixes, always bug fixes.

In Closing

Please do download the newest version of SwimmeR from wherever you get your packages. You’re also welcome to submit bug reports or feature requests on the SwimmeR project GitHub page.
