SwimmeR version 0.7.2 – Now Better than Ever

Welcome to Swimming + Data Science on Swimming + Data Science

1 year ago

[This article was first published on Welcome to Swimming + Data Science on Swimming + Data Science, and kindly contributed to R-bloggers].

SwimmeR version 0.7.2 is now available from CRAN. This new version contains some new features, plus a few changes to make it more user-friendly. Let me show you what I’ve been working on.

library(SwimmeR)
library(dplyr)
library(stringr)
library(flextable)
library(rbenchmark)

flextable_style <- function(x) {
  x %>%
    flextable() %>%
    bold(part = "header") %>% # bold header
    bg(bg = "#D3D3D3", part = "header") %>% # puts gray background behind the header row
    align_nottext_col(align = "center", header = TRUE, footer = TRUE) %>% # center alignment
    autofit()
}

New Features

SwimmeR can now parse S.A.M.M.S. style results. S.A.M.M.S., which stands for Swimclub And Meet Management System, was (ahem) a swim club and meet management system that predated Hy-Tek’s Meet and Team Manager. It seems to have been most popular in California, where USA Swimming clubs and high schools still use it today.

[Image: sample S.A.M.M.S. style meet results]

Parsing S.A.M.M.S. results is a simple matter for you SwimmeR users: it’s exactly the same as parsing Hy-Tek style results. Same read_results, same swim_parse. The only differences concern relay_swimmers and splits. The S.A.M.M.S. results that I’ve seen don’t include relay swimmers, so of course SwimmeR doesn’t collect them. Splits are also rarely seen in S.A.M.M.S. results and are not currently collected by SwimmeR, although they may be in a future release.

df <-
  swim_parse(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1629/1119bac.htm"
    )
  )

df %>%
  head(5) %>%
  flextable_style()

Place	Name	Age	Team	Finals_Time	DQ	Event
1	LADOMIRAK, ALEGRIA	8	PC PALO ALTO STANFORD	16.11	0	EVENT 73 FEMALE 8&UN 25 FREE
2	DIEHN, EVA	8	PC BULL DOG SWIM CLUB	16.36	0	EVENT 73 FEMALE 8&UN 25 FREE
3	HILL, NAOMI	8	PC PALO ALTO STANFORD	16.50	0	EVENT 73 FEMALE 8&UN 25 FREE
4	HOUTZER, AMELIA	8	PC PALO ALTO STANFORD	16.88	0	EVENT 73 FEMALE 8&UN 25 FREE
5	CHANG, KAYLA	8	PC BURLINGAME AQUATIC	17.55	0	EVENT 73 FEMALE 8&UN 25 FREE

On a personal level, working with these S.A.M.M.S. results was very encouraging, because they have all kinds of weird bugs and cut corners that make me feel better about SwimmeR. For example, some S.A.M.M.S. results list a place order for finals swims as “F1”, “F2”, etc. But S.A.M.M.S. can’t handle more than two characters in that field, so if someone comes in 10th they just get “F”.

Just “F”

S.A.M.M.S. also doesn’t know what to make of diving, and records diving results like swimming results, so “347.56” is written as “3:47.56” (swim_parse corrects this). S.A.M.M.S. also orders diving results backwards with the lowest (i.e. fastest) score/time listed first.

Maybe divers don’t mind being upside down?
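To make the diving fix concrete, here is a minimal sketch of the idea in base R. It is not how swim_parse does it internally. The helper name fix_diving_score, the two example rows, and the rule of spotting diving events by name are all assumptions for illustration; only the “3:47.56” value comes from the description above.

```r
# Made-up rows for illustration; only the "3:47.56" value comes from the text
results <- data.frame(
  Event = c("Girls 1 Meter Diving", "Girls 100 Yard Freestyle"),
  Finals_Time = c("3:47.56", "1:00.08"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the colon so a clock-style string reads as a score
fix_diving_score <- function(score) {
  gsub(":", "", score, fixed = TRUE)
}

# Only touch rows whose event name mentions diving (an assumed rule)
is_diving <- grepl("diving", results$Event, ignore.case = TRUE)
results$Finals_Time[is_diving] <-
  fix_diving_score(results$Finals_Time[is_diving])

results
```

The swimming time is left alone because the colon is meaningful there; only the diving row loses it.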

S.A.M.M.S. was a commercial product. SwimmeR might have its issues sometimes, but at least it’s free!

Under-the-hood changes to speed up swim_parse. We can test this with benchmark from the rbenchmark package, because I’ve left the old swim_parse function in SwimmeR, renamed swim_parse_old. It’s not exported though, so to actually access it you’ll need to call it as SwimmeR:::swim_parse_old.

benchmark("new" = {
  swim_parse(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
    )
  )
},
"old" = {
  SwimmeR:::swim_parse_old(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
    )
  )
},
replications = 5) %>% 
  flextable_style()

test	replications	elapsed	relative	user.self	sys.self
new	5	33.62	1.000	30.67	0.06
old	5	76.72	2.282	74.16	0.09

As you can see from the relative column above, the new version of swim_parse is a little over twice as fast as the old version (on my computer at least). You’re all very welcome.
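For anyone curious where the relative column comes from, the short sketch below reproduces it by hand from the elapsed times in the table. It assumes the default rbenchmark behaviour of dividing each elapsed time by the smallest one; the variable names are mine.

```r
# Elapsed times (seconds) copied from the benchmark table above
elapsed <- c(new = 33.62, old = 76.72)

# Each test's elapsed time divided by the fastest test's elapsed time
relative <- round(elapsed / min(elapsed), 3)

relative
```

The value for old works out to 2.282, matching the table, which is where “a little over twice as fast” comes from. Note that both tests also include the time taken by read_results to fetch the file, so the speed-up in swim_parse alone may be somewhat larger.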

Kinder and gentler all around. There have been several changes to make swim_parse more user-friendly. First is decreased reliance on the typo and replacement arguments. They’re still present, and still work, but they’re hopefully now much less necessary.

By way of example, in this meet there’s a young man named “DU Fayet DE LA Tour, Vin”.

swim_parse_old struggles with this, and gets his name wrong.

df_old <-
  SwimmeR:::swim_parse_old(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
    )
  )

df_old %>%
  filter(str_detect(Name, "DU Fayet DE LA Tour")) %>%
  select(-Points, -DQ, -Exhibition) %>%
  flextable_style()

Place	Name	Age	Team	Prelims_Time	Finals_Time	Event
13	DU Fayet DE LA Tour	14	NBA-PC	1:02.09	1:00.08	Boys 13-14 100 Yard Freestyle
12	DU Fayet DE LA Tour	14	NBA-PC	1:20.75	1:09.03	Boys 13-14 100 Yard Backstroke
14	DU Fayet DE LA Tour	14	NBA-PC	1:16.16	1:10.01	Boys 13-14 100 Yard Butterfly
16	DU Fayet DE LA Tour	14	NBA-PC	2:50.00	2:35.25	Boys 13-14 200 Yard IM

We can fix the problem in a hacky, non-intuitive kind of way using typo and replacement, plus some after-the-parse changes. It works, but it’s not terribly easy.

df_old_tr <-
  SwimmeR:::swim_parse_old(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
    ),
    typo = ", Vin ",
    replacement = " Vin  "
  ) %>%
  mutate(Name = str_replace(Name, " Vin", ", Vin"))

df_old_tr %>%
  filter(str_detect(Name, "DU Fayet DE LA Tour")) %>%
  select(-Points, -DQ, -Exhibition) %>%
  flextable_style()

Place	Name	Age	Team	Prelims_Time	Finals_Time	Event
13	DU Fayet DE LA Tour, Vin	14	NBA-PC	1:02.09	1:00.08	Boys 13-14 100 Yard Freestyle
12	DU Fayet DE LA Tour, Vin	14	NBA-PC	1:20.75	1:09.03	Boys 13-14 100 Yard Backstroke
14	DU Fayet DE LA Tour, Vin	14	NBA-PC	1:16.16	1:10.01	Boys 13-14 100 Yard Butterfly
16	DU Fayet DE LA Tour, Vin	14	NBA-PC	2:50.00	2:35.25	Boys 13-14 200 Yard IM

Compare that to the much simpler approach available in SwimmeR version 0.7.2: no need for typo and replacement, and no need for after-parse fixes to Vin’s name.

df_new <-
  swim_parse(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
    )
  )

df_new %>%
  filter(str_detect(Name, "DU Fayet DE LA Tour")) %>%
  select(-Points, -DQ, -Exhibition) %>%
  flextable_style()

Place	Name	Age	Team	Prelims_Time	Finals_Time	Event
13	DU Fayet DE LA Tour, Vin	14	NBA-PC	1:02.09	1:00.08	Boys 13-14 100 Yard Freestyle
12	DU Fayet DE LA Tour, Vin	14	NBA-PC	1:20.75	1:09.03	Boys 13-14 100 Yard Backstroke
14	DU Fayet DE LA Tour, Vin	14	NBA-PC	1:16.16	1:10.01	Boys 13-14 100 Yard Butterfly
16	DU Fayet DE LA Tour, Vin	14	NBA-PC	2:50.00	2:35.25	Boys 13-14 200 Yard IM

This is not a promise that there will be no need for typo and replacement. Sometimes there really are typos that need replacing. Things should be easier now though.
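If you’ve never used typo and replacement, it may help to think of them as paired find-and-replace strings applied to the raw text before parsing. The sketch below shows that idea in base R. It is an illustration only, not SwimmeR’s internal code: the helper apply_typo_fixes is hypothetical and the raw line is a simplified, made-up rendering of one of Vin’s results.

```r
# Hypothetical helper: apply each typo/replacement pair, in order, as a
# literal (non-regex) substitution on every line of raw text
apply_typo_fixes <- function(lines, typo, replacement) {
  stopifnot(length(typo) == length(replacement))
  for (i in seq_along(typo)) {
    lines <- gsub(typo[i], replacement[i], lines, fixed = TRUE)
  }
  lines
}

# Simplified, made-up version of one raw results line
raw_line <- "13 DU Fayet DE LA Tour, Vin 14 NBA-PC 1:02.09 1:00.08"

# Same typo/replacement pair used with swim_parse_old above
fixed_line <- apply_typo_fixes(raw_line, typo = ", Vin ", replacement = " Vin  ")

fixed_line
```

The comma is removed and an extra space is added after the name, which is why the earlier example needed a mutate step afterwards to put the comma back.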

Second, event names were also an issue in older versions of SwimmeR. If swim_parse didn’t find any event names it liked, it would throw an error and return nothing. Now, in SwimmeR version 0.7.2, the event name definitions are much broader, and failing to find any event names will not result in an error.

These results, from the 2019 Australian Nationals, won’t read in previous versions of SwimmeR because the events are named with “Metre” rather than “Meter”. Now though, with SwimmeR version 0.7.2, we can see the Campbell sisters doing their thing.

df_aus <-
  swim_parse(
    read_results(
      "https://www.swimming.org.au/sites/default/files/assets/documents/full%20results_0.pdf"
    )
  )

df_aus %>%
  head(2) %>%
  flextable_style()

Place	Name	Age	Team	Prelims_Time	Finals_Time	Points	DQ	Exhibition	Event
1	CAMPBELL, CATE	27	KNOX PYMBLE	24.33	24.05	953	0	0	Women 50 LC Metre Freestyle
2	CAMPBELL, BRONTE	25	KNOX PYMBLE	24.60	24.17	939	0	0	Women 50 LC Metre Freestyle
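As a rough picture of what a broader event name definition means, here is a sketch of a pattern that accepts both spellings. The regular expression and the variable names are assumptions for illustration and are not the definitions SwimmeR actually uses; the third test line is a simplified rendering of a results row.

```r
# Assumed pattern: a distance, an optional course code, then a unit word
# spelled either "Meter" or "Metre" (or "Yard")
event_pattern <- "[0-9]+ (LC |SC )?(Meter|Metre|Yard) "

candidate_lines <- c(
  "Women 50 LC Metre Freestyle",                  # Australian spelling
  "Boys 13-14 100 Yard Freestyle",                # event name from earlier
  "1 CAMPBELL, CATE 27 KNOX PYMBLE 24.33 24.05"   # a result row, not an event
)

# TRUE where a line looks like an event name
is_event <- grepl(event_pattern, candidate_lines, ignore.case = TRUE)

is_event
```

The first two lines are flagged as event names and the results row is not. A narrower pattern listing only “Meter” would miss the first line, which is the kind of failure older versions ran into.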

Modifications to swim_parse to begin to handle older style Hy-Tek results, like these from 2002. Issues with inconsistent treatment of splits within the results themselves remain, so let the user beware. These older results are still an active area of development.

df_2002 <-
  swim_parse(
    read_results(
      "https://cdn.swimswam.com/wp-content/uploads/2018/08/2002-Division-I-NCAA-Championships-Men-results1.pdf"
    )
  )

df_2002 %>%
  filter(str_detect(Event, "100 Yard BUTTERFLY")) %>%
  head(3) %>%
  flextable_style()

Place	Name	Age	Team	Prelims_Time	Finals_Time	Points	DQ	Exhibition	Event
1	CROCKER, IAN	SO	TEXAS	45.70	45.44	NA	0	0	Event 9 MEN’s 100 Yard BUTTERFLY
2	MARSHALL, PETER	SO	STANFORD	46.39	46.48	NA	0	0	Event 9 MEN’s 100 Yard BUTTERFLY
3	SCHOEMAN, ROLAND	SR	ARIZONA	46.57	46.50	NA	0	0	Event 9 MEN’s 100 Yard BUTTERFLY

Bug fixes, always bug fixes.

In Closing

Please do download the newest version of SwimmeR from wherever you get your packages. You’re also welcome to submit bug reports or feature requests on the SwimmeR project GitHub page.
