Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
SwimmeR version 0.7.2 is now available from CRAN. This new version contains some new features, plus a few changes to make it more user-friendly. Let me show you what I’ve been working on.
library(SwimmeR)
library(dplyr)
library(stringr)
library(flextable)
library(rbenchmark)
# helper that renders a data frame as a styled flextable
flextable_style <- function(x) {
  x %>%
    flextable() %>%
    bold(part = "header") %>% # bold header
    bg(bg = "#D3D3D3", part = "header") %>% # puts gray background behind the header row
    align_nottext_col(align = "center", header = TRUE, footer = TRUE) %>% # center alignment
    autofit()
}
New Features
- SwimmeR can now parse S.A.M.M.S. style results. S.A.M.M.S., which stands for Swimclub And Meet Management System, was, ahem, a swim club and meet management system that predated Hy-Tek’s Meet and Team Manager. It seems to have been most popular in California, where USA Swimming clubs and high schools still use it today.
S.A.M.M.S. meets look like this:
Parsing them is a simple matter for you SwimmeR users – it’s exactly the same as parsing Hy-Tek style results. Same read_results, same swim_parse. The only differences involve relay_swimmers and splits. The S.A.M.M.S. results I’ve seen don’t include relay swimmers, so of course SwimmeR doesn’t collect them. Splits are also rare in S.A.M.M.S. results and aren’t currently collected by SwimmeR either, although they may be in a future release.
df <-
  swim_parse(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1629/1119bac.htm"
    )
  )
df %>%
  head(5) %>%
  flextable_style()
Place | Name | Age | Team | Finals_Time | DQ | Event |
1 | LADOMIRAK, ALEGRIA | 8 | PC PALO ALTO STANFORD | 16.11 | 0 | EVENT 73 FEMALE 8&UN 25 FREE |
2 | DIEHN, EVA | 8 | PC BULL DOG SWIM CLUB | 16.36 | 0 | EVENT 73 FEMALE 8&UN 25 FREE |
3 | HILL, NAOMI | 8 | PC PALO ALTO STANFORD | 16.50 | 0 | EVENT 73 FEMALE 8&UN 25 FREE |
4 | HOUTZER, AMELIA | 8 | PC PALO ALTO STANFORD | 16.88 | 0 | EVENT 73 FEMALE 8&UN 25 FREE |
5 | CHANG, KAYLA | 8 | PC BURLINGAME AQUATIC | 17.55 | 0 | EVENT 73 FEMALE 8&UN 25 FREE |
On a personal level, working with these S.A.M.M.S. results was very encouraging, because they have all kinds of weird bugs and cut corners that make me feel better about SwimmeR. For example, some S.A.M.M.S. results list finals places as “F1”, “F2”, etc. But S.A.M.M.S. can’t handle more than two characters in that field, so if someone comes in 10th they just get “F”.
Just “F”
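If you’re working with these place codes yourself, here’s a minimal sketch of one way to cope with the bare “F”. It assumes a made-up 12-swimmer final, and that rows are listed in finishing order with no ties; neither assumption comes from S.A.M.M.S. itself, and the sketch isn’t meant to describe what swim_parse does internally.

```r
library(stringr)

# Hypothetical place codes for a 12-swimmer final: "F1" through "F9",
# then a bare "F" once the place no longer fits in two characters
place_code <- c(paste0("F", 1:9), "F", "F", "F")

# pull out the digits; a bare "F" has none, so it becomes NA
place <- as.integer(str_extract(place_code, "[0-9]+"))

# ASSUMPTION: rows are in finishing order with no ties, so the
# row position can stand in for the missing places
place_filled <- ifelse(is.na(place), seq_along(place), place)

place_filled # 1 through 12
```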
S.A.M.M.S. also doesn’t know what to make of diving, and records diving scores as if they were swim times, so “347.56” is written as “3:47.56” (swim_parse corrects this). S.A.M.M.S. also orders diving results backwards, with the lowest score listed first, as if it were the fastest time.
Maybe divers don’t mind being upside down?
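To make the diving quirk concrete, here’s a rough sketch of the kind of correction involved, using made-up results (only the “3:47.56” figure comes from above). It strips the colon to recover the score, then re-sorts so the highest score comes first. It’s an illustration under those assumptions, not the code swim_parse actually runs.

```r
library(dplyr)
library(stringr)

# Made-up diving results laid out the way S.A.M.M.S. prints them:
# scores written like swim times, lowest score listed first.
# Only "3:47.56" comes from the text; the rest is hypothetical.
diving_raw <- data.frame(
  Name = c("DIVER, A", "DIVER, B", "DIVER, C"),
  Finals_Time = c("2:15.40", "3:01.25", "3:47.56"),
  stringsAsFactors = FALSE
)

diving_fixed <- diving_raw %>%
  # drop the spurious colon, so "3:47.56" becomes a score of 347.56
  mutate(Score = as.numeric(str_remove(Finals_Time, ":"))) %>%
  # in diving the highest score wins, so sort descending...
  arrange(desc(Score)) %>%
  # ...and assign places from the corrected order
  mutate(Place = row_number())

diving_fixed
```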
S.A.M.M.S. was a commercial product. SwimmeR might have its issues sometimes, but at least it’s free!
- Under-the-hood changes to speed up swim_parse. We can test this with benchmark from the rbenchmark package, because I’ve left the old swim_parse function in SwimmeR, renamed swim_parse_old. It’s not exported, though, so to access it you’ll need to call it as SwimmeR:::swim_parse_old.
# time five runs of each parser on the same meet
benchmark(
  "new" = {
    swim_parse(
      read_results(
        "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
      )
    )
  },
  "old" = {
    SwimmeR:::swim_parse_old(
      read_results(
        "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
      )
    )
  },
  replications = 5
) %>%
  flextable_style()
test | replications | elapsed | relative | user.self | sys.self |
new | 5 | 33.62 | 1.000 | 30.67 | 0.06 |
old | 5 | 76.72 | 2.282 | 74.16 | 0.09 |
As the relative column above shows, the new version of swim_parse is a little over twice as fast as the old version (on my computer, at least). You’re all very welcome.
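For anyone curious where that figure comes from: as I understand rbenchmark, the relative column is each test’s elapsed time divided by the smallest elapsed time. Here’s that arithmetic redone by hand with the numbers from the table, plus the average time per parse.

```r
# elapsed seconds for 5 replications, copied from the benchmark table
elapsed <- c(new = 33.62, old = 76.72)
replications <- 5

# relative timing: each elapsed time divided by the fastest one
relative <- round(elapsed / min(elapsed), 3)

# average seconds per call of swim_parse(read_results(...))
per_run <- elapsed / replications

relative # new = 1.000, old = 2.282
per_run  # new = 6.724, old = 15.344
```

Bear in mind that each timed expression includes read_results, so download time from pacswim.org is part of every replication.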
- Kinder and gentler all around. There have been several changes to make swim_parse more user-friendly. First is decreased reliance on the typo and replacement arguments. They’re still present, and still work, but they’re hopefully now much less necessary.
For example, this meet includes a young man named “DU Fayet DE LA Tour, Vin”.
swim_parse_old struggles with this and gets his name wrong, dropping “Vin” entirely.
df_old <-
  SwimmeR:::swim_parse_old(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
    )
  )
df_old %>%
  filter(str_detect(Name, "DU Fayet DE LA Tour")) %>%
  select(-Points, -DQ, -Exhibition) %>%
  flextable_style()
Place | Name | Age | Team | Prelims_Time | Finals_Time | Event |
13 | DU Fayet DE LA Tour | 14 | NBA-PC | 1:02.09 | 1:00.08 | Boys 13-14 100 Yard Freestyle |
12 | DU Fayet DE LA Tour | 14 | NBA-PC | 1:20.75 | 1:09.03 | Boys 13-14 100 Yard Backstroke |
14 | DU Fayet DE LA Tour | 14 | NBA-PC | 1:16.16 | 1:10.01 | Boys 13-14 100 Yard Butterfly |
16 | DU Fayet DE LA Tour | 14 | NBA-PC | 2:50.00 | 2:35.25 | Boys 13-14 200 Yard IM |
We can fix the problem in a hacky, non-intuitive way using typo and replacement, plus some post-parse changes. It works, but it’s not terribly easy.
df_old_tr <-
  SwimmeR:::swim_parse_old(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
    ),
    typo = ", Vin ", # strip the comma from Vin's name before parsing
    replacement = " Vin "
  ) %>%
  mutate(Name = str_replace(Name, " Vin", ", Vin")) # then put the comma back
df_old_tr %>%
  filter(str_detect(Name, "DU Fayet DE LA Tour")) %>%
  select(-Points, -DQ, -Exhibition) %>%
  flextable_style()
Place | Name | Age | Team | Prelims_Time | Finals_Time | Event |
13 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 1:02.09 | 1:00.08 | Boys 13-14 100 Yard Freestyle |
12 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 1:20.75 | 1:09.03 | Boys 13-14 100 Yard Backstroke |
14 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 1:16.16 | 1:10.01 | Boys 13-14 100 Yard Butterfly |
16 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 2:50.00 | 2:35.25 | Boys 13-14 200 Yard IM |
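Here’s a rough sketch of what that workaround does to the text, using a hypothetical raw results line (real Hy-Tek output is column-aligned, and treating typo and replacement as a plain string substitution on the raw text is an assumption about SwimmeR’s internals). It also shows one reason the approach is hacky: the post-parse str_replace runs on every row, so another name containing “ Vin” (the made-up “SMITH, Vincent” here) picks up a stray comma.

```r
library(stringr)

# hypothetical raw results line (real layouts are column-aligned)
raw_line <- "13 DU Fayet DE LA Tour, Vin 14 NBA-PC 1:02.09 1:00.08"

# step 1: typo = ", Vin " / replacement = " Vin ", sketched as a plain
# substitution on the raw text before parsing (ASSUMED behavior)
patched_line <- str_replace_all(raw_line, fixed(", Vin "), " Vin ")

# step 2: the parsed Name now lacks its comma, so put it back afterwards
parsed_name <- "DU Fayet DE LA Tour Vin"
fixed_name <- str_replace(parsed_name, " Vin", ", Vin")

# caveat: the same replacement hits any other name containing " Vin"
collateral <- str_replace("SMITH, Vincent", " Vin", ", Vin")

patched_line # "13 DU Fayet DE LA Tour Vin 14 NBA-PC 1:02.09 1:00.08"
fixed_name   # "DU Fayet DE LA Tour, Vin"
collateral   # "SMITH,, Vincent"
```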
Compare that to the much simpler approach available in SwimmeR version 0.7.2: no need for typo and replacement, and no need for post-parse fixes to Vin’s name.
df_new <-
  swim_parse(
    read_results(
      "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
    )
  )
df_new %>%
  filter(str_detect(Name, "DU Fayet DE LA Tour")) %>%
  select(-Points, -DQ, -Exhibition) %>%
  flextable_style()
Place | Name | Age | Team | Prelims_Time | Finals_Time | Event |
13 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 1:02.09 | 1:00.08 | Boys 13-14 100 Yard Freestyle |
12 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 1:20.75 | 1:09.03 | Boys 13-14 100 Yard Backstroke |
14 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 1:16.16 | 1:10.01 | Boys 13-14 100 Yard Butterfly |
16 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 2:50.00 | 2:35.25 | Boys 13-14 200 Yard IM |
That’s not to say there’s no longer any use for typo and replacement. Sometimes there really are typos that need replacing. Things should be easier now, though.
Second, event names were also an issue in older versions of SwimmeR. If swim_parse didn’t find any event names it liked, it would throw an error and return nothing. Now, in SwimmeR version 0.7.2, the event name definitions are much broader, and failing to find any event names will not result in an error.
These results from the 2019 Australian Nationals won’t read in previous versions of SwimmeR, because the events are named with “Metre” rather than “Meter”. With SwimmeR version 0.7.2, though, we can see the Campbell sisters doing their thing.
df_aus <-
  swim_parse(
    read_results(
      "https://www.swimming.org.au/sites/default/files/assets/documents/full%20results_0.pdf"
    )
  )
df_aus %>%
  head(2) %>%
  flextable_style()
Place | Name | Age | Team | Prelims_Time | Finals_Time | Points | DQ | Exhibition | Event |
1 | CAMPBELL, CATE | 27 | KNOX PYMBLE | 24.33 | 24.05 | 953 | 0 | 0 | Women 50 LC Metre Freestyle |
2 | CAMPBELL, BRONTE | 25 | KNOX PYMBLE | 24.60 | 24.17 | 939 | 0 | 0 | Women 50 LC Metre Freestyle |
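Why does a spelling difference matter? If a parser only recognizes event names that match a fixed set of words, “Metre” falls through. The sketch below shows the idea with two event names from this post; the patterns are purely illustrative and are not the regular expressions SwimmeR actually uses.

```r
library(stringr)

# two event names taken from the tables in this post
events <- c("Women 50 LC Metre Freestyle", "Boys 13-14 100 Yard Freestyle")

# a pattern that only knows the American spelling misses the Australian event...
narrow <- str_detect(events, "Yard|Meter")

# ...while also allowing "Metre" catches both
# (illustrative only; not SwimmeR's actual event-name regex)
broad <- str_detect(events, "Yard|Meter|Metre")

narrow # FALSE TRUE
broad  # TRUE TRUE
```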
- Modifications to swim_parse to begin handling older-style Hy-Tek results, like these from 2002. Issues with inconsistent treatment of splits within the results themselves remain, so let the user beware. These older results are still an active area of development.
df_2002 <-
  swim_parse(
    read_results(
      "https://cdn.swimswam.com/wp-content/uploads/2018/08/2002-Division-I-NCAA-Championships-Men-results1.pdf"
    )
  )
df_2002 %>%
  filter(str_detect(Event, "100 Yard BUTTERFLY")) %>%
  head(3) %>%
  flextable_style()
Place | Name | Age | Team | Prelims_Time | Finals_Time | Points | DQ | Exhibition | Event |
1 | CROCKER, IAN | SO | TEXAS | 45.70 | 45.44 | NA | 0 | 0 | Event 9 MEN’s 100 Yard BUTTERFLY |
2 | MARSHALL, PETER | SO | STANFORD | 46.39 | 46.48 | NA | 0 | 0 | Event 9 MEN’s 100 Yard BUTTERFLY |
3 | SCHOEMAN, ROLAND | SR | ARIZONA | 46.57 | 46.50 | NA | 0 | 0 | Event 9 MEN’s 100 Yard BUTTERFLY |
- Bug fixes, always bug fixes.
In Closing
Please do download the newest version of SwimmeR from wherever you get your packages. You’re also welcome to submit bug reports or feature requests on the SwimmeR project GitHub page.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
