Introducing JumpeR – For Track and Field Data

[This article was first published on Welcome to Swimming + Data Science on Swimming + Data Science, and kindly contributed to R-bloggers].

Ordinarily, posts on Swimming + Data Science have focused on swimming, or sometimes diving. Today, though, we're going to visit some of our more gravity-afflicted colleagues and do a bit of cross-training. That's because, following what I'm going to call the SwimmeR package's massive success, literally several people reached out to me about developing a similar package for track and field. That package, called JumpeR, is now available on CRAN.

You can get your very own copy of this cutting-edge sports-data-science package, for free, today!

install.packages("JumpeR")
library(JumpeR)
library(flextable)
library(dplyr)
library(ggplot2)

flextable_style <- function(x) {
  x %>%
    flextable() %>%
    bold(part = "header") %>% # bolds header
    bg(bg = "#D3D3D3", part = "header") %>%  # puts gray background behind the header row
    autofit()
}

What does JumpeR do?

JumpeR is very similar to SwimmeR. Both packages mainly serve to convert results from human-readable documents into data frames that are both machine- and human-readable, within the R programming environment.

Supported Results Formats

JumpeR currently supports single-column Hy-Tek results, like these, and Flash Results .pdf files, like these. JumpeR does not support multi-column Hy-Tek results or Flash Results .html files. Further details are available in the package readme file.

Examples

A Running Race

Here's an example that reads in the 2020 Ivy League Indoor Championships results and looks at the finals of the Women's 200 Meter Dash:

df <- tf_parse(
  read_results("http://www.leonetiming.com/2020/Indoor/IvyLeague/Results.htm")
  )

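df %>% 
  filter(Event == "Women 200 Meter Dash") %>% 
  group_by(Name, Team) %>% # one group per athlete; finalists have a prelim row, then a finals row
  slice(2) %>% # keep the second (finals) row, which also drops prelim-only athletes
  ungroup() %>% 
  arrange(Place) %>% # arrange by Place
  flextable_style()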

Place  Name               Age  Team       Finals_Result  Tiebreaker  DQ  Event
1      Katina Martin      SO   Harvard    24.05                      0   Women 200 Meter Dash
2      Olivia Okoli       JR   Harvard    24.44                      0   Women 200 Meter Dash
3      Cecil Ene          SR   Penn       24.52          24.511      0   Women 200 Meter Dash
4      Elena Brown-Soler  SR   Penn       24.52          24.520      0   Women 200 Meter Dash
5      Katie DiFrancesco  JR   Princeton  24.53                      0   Women 200 Meter Dash
6      Libby McMahon      SO   Yale       25.12                      0   Women 200 Meter Dash
7      Isabella Hilditch  SO   Princeton  40.06                      0   Women 200 Meter Dash
       Kennedy Waite      FR   Brown      DNF                        1   Women 200 Meter Dash
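
Cecil Ene and Elena Brown-Soler both finished in 24.52, and only the Tiebreaker column (times to the thousandth) separates third from fourth. As a rough base-R sketch of ranking on both columns, here are three rows copied from the table above. Finals_Result is typed in as numbers purely for illustration, since the parsed column also holds strings like "DNF".

```r
# Three rows copied from the 200m table above; Tiebreaker is NA where none is listed.
# Finals_Result is entered as numbers here for illustration only.
dash <- data.frame(
  Name          = c("Elena Brown-Soler", "Katie DiFrancesco", "Cecil Ene"),
  Finals_Result = c(24.52, 24.53, 24.52),
  Tiebreaker    = c(24.520, NA, 24.511),
  stringsAsFactors = FALSE
)

# order() sorts on Finals_Result first and consults Tiebreaker only to break ties;
# NA tiebreakers would sort last within a tie (the default na.last = TRUE)
ranked <- dash[order(dash$Finals_Result, dash$Tiebreaker), ]
ranked$Name  # "Cecil Ene" "Elena Brown-Soler" "Katie DiFrancesco"
```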

Discus, with Flights

But wait, there's more! Field events, like jumping and throwing, give athletes several tries, and JumpeR records each of these tries as a "flight". Flights can be captured as well. Here's the Men's Discus from the 2019 Virginia Grand Prix:

df <- tf_parse(
  read_results("https://www.flashresults.com/2019_Meets/Outdoor/04-27_VirginiaGrandPrix/038-1.pdf"),
  flights = TRUE
  )

df %>% 
  flextable_style()

Place  Name               Age  Team           Finals_Result  DQ  Event       Flight_1  Flight_2  Flight_3  Flight_4  Flight_5  Flight_6
1      Nicholas EDWARDS   FR   HAMPTON        49.86m         0   Men Discus  X         47.11     45.99     47.28     X         49.86
2      Michael ALBERT     JR   APP STATE      48.30m         0   Men Discus  48.30     47.16     44.96     X         45.85     X
3      Joshua HUNTER      SO   HAMPTON        47.43m         0   Men Discus  31.94     X         46.54     X         47.43     X
4      Peter KENN         SR   APP STATE      46.14m         0   Men Discus  X         42.83     46.14     44.26     43.80     44.66
5      Asher PRINCE       FR   CHARLOTTE      45.98m         0   Men Discus  X         45.98     44.62     X         X         X
6      Sasha DAJIA        SR   CHARLOTTE      44.40m         0   Men Discus  X         44.40     44.19     X         44.08     42.04
7      Britton MANN       SR   HIGH POINT     42.07m         0   Men Discus  X         38.31     X         40.49     X         42.07
8      Gabriel STAINBACK  SO   HIGH POINT     39.37m         0   Men Discus  38.53     36.94     39.37
FOUL   Kysheen MYRICK     SO   LIBERTY        FOUL           1   Men Discus  X         X         X
FOUL   Tyson JONES        FR   VIRGINIA TECH  FOUL           1   Men Discus  X         X         X
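
Each athlete's Finals_Result should be the best of their legal flights. A quick base-R check of that relationship, using a hypothetical helper best_mark() (not part of JumpeR) and flights copied from the table above, where "X" marks a foul:

```r
# Hypothetical helper, not part of JumpeR: best legal mark from one athlete's flights.
best_mark <- function(flights) {
  # "X" (a foul) cannot be converted, so as.numeric() gives NA; silence that warning
  marks <- suppressWarnings(as.numeric(flights))
  if (all(is.na(marks))) {
    return(NA_real_)  # no legal throw, recorded as FOUL in the results
  }
  max(marks, na.rm = TRUE)
}

# Flights copied from the Men's Discus table above
edwards <- c("X", "47.11", "45.99", "47.28", "X", "49.86")
kenn    <- c("X", "42.83", "46.14", "44.26", "43.80", "44.66")
myrick  <- c("X", "X", "X")

best_mark(edwards)  # 49.86, matching his Finals_Result of 49.86m
best_mark(kenn)     # 46.14, matching 46.14m
best_mark(myrick)   # NA, matching FOUL
```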

Pole Vault, with Flights and Attempts

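JumpeR can even capture attempts for vertical jumping events, like these Women's Pole Vault results from the 2019 Texas A&M Invite. Here each flight is a bar height, and the matching attempts column records what happened at that height: "X" for a miss, "O" for a clearance, and blank where the athlete did not jump at that height. These results get quite wide, so here they're cut off at Flight 2.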

df <- tf_parse(
  read_results("https://www.flashresults.com/2019_Meets/Outdoor/04-12_TamuInvite/014-1.pdf"),
  flights = TRUE,
  flight_attempts = TRUE
  )

df %>% 
  select(Place:Flight_2_Attempts) %>% 
  flextable_style()

Place  Name                Age  Team             Finals_Result  DQ  Event             Flight_1  Flight_1_Attempts  Flight_2  Flight_2_Attempts
1      Caroline BELLOWS    SR   UTSA             3.88m          0   Women Pole Vault  3.28                         3.43      O
2      Myka STEINBEISSER   FR   ARIZONA STATE    3.73m          0   Women Pole Vault  3.28                         3.43      O
3      Tommi HINTNAUS      SO   ARIZONA STATE    3.73m          0   Women Pole Vault  3.28                         3.43
4      Erika WILLIS        FR   AIR FORCE        3.58m          0   Women Pole Vault  3.28                         3.43      O
5      Kylie SWIEKATOWSKI  JR   RICE             3.58m          0   Women Pole Vault  3.28                         3.43      XO
6      Cameron BOEDEKER    JR   SAM HOUSTON ST.  3.58m          0   Women Pole Vault  3.28                         3.43      O
6      Kendahl SHUE        JR   TCU              3.58m          0   Women Pole Vault  3.28                         3.43
8      Corey FRIEDENBACH   FR   AIR FORCE        3.58m          0   Women Pole Vault  3.28                         3.43      O
9      Tysen TOWNSEND      FR   TCU              3.58m          0   Women Pole Vault  3.28                         3.43      XXO
10     Lauren LABAY        JR   SAM HOUSTON ST.  3.43m          0   Women Pole Vault  3.28                         3.43      O
10     Margaret LASSALLE   SR   SAM HOUSTON ST.  3.43m          0   Women Pole Vault  3.28                         3.43      O
12     Emily HARRISON      FR   RICE             3.43m          0   Women Pole Vault  3.28                         3.43      XXO
12     Frankie PORAMBO     FR   AIR FORCE        3.43m          0   Women Pole Vault  3.28      O                  3.43      XXO
DNS    Alexandria GRAY     FR   UTSA             DNS            0   Women Pole Vault  3.28                         3.43
NH     Hannah SEARBY       SO   TEXAS A&M        NH             1   Women Pole Vault  3.28                         3.43      XXX
NH     Jerni SELF          SR   AIR FORCE        NH             1   Women Pole Vault  3.28                         3.43
NH     Kathryn TOMCZAK     SR   AIR FORCE        NH             1   Women Pole Vault  3.28                         3.43
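
The attempt strings are easy to summarize once each character is treated as one try. A minimal base-R sketch, with hypothetical helpers count_misses() and cleared_height() that are not part of JumpeR, assuming "X" means a miss and "O" a clearance as in the table above:

```r
# Hypothetical helpers, not part of JumpeR. Each string holds one athlete's tries
# at a single bar height: "X" is a miss, "O" is a clearance.
count_misses <- function(attempts) {
  # split each string into single characters and count the X's
  vapply(strsplit(attempts, ""), function(tries) sum(tries == "X"), integer(1))
}

cleared_height <- function(attempts) {
  # the bar counts as cleared if any try at that height is an "O"
  grepl("O", attempts, fixed = TRUE)
}

# Flight_2_Attempts strings taken from the table above
attempts <- c("O", "XO", "XXO", "XXX")
data.frame(attempts,
           misses  = count_misses(attempts),
           cleared = cleared_height(attempts))
```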

Pole Vault Long Format

Wide results like these aren't always convenient, but don't worry: switching to a long format, with one row per bar height and attempt, is easy with JumpeR::attempts_split_long.

df <- tf_parse(
  read_results("https://www.flashresults.com/2019_Meets/Outdoor/04-12_TamuInvite/014-1.pdf"),
  flights = TRUE,
  flight_attempts = TRUE
  )


df %>% 
  attempts_split_long() %>% 
  filter(Place == 1) %>% # only first place athlete
  select(Place, Name, Age, Team, Finals_Result, Event, Bar_Height, Attempt, Result) %>% 
  flextable_style()

Place  Name              Age  Team  Finals_Result  Event             Bar_Height  Attempt  Result
1      Caroline BELLOWS  SR   UTSA  3.88m          Women Pole Vault  3.28        1
1      Caroline BELLOWS  SR   UTSA  3.88m          Women Pole Vault  3.28        2
1      Caroline BELLOWS  SR   UTSA  3.88m          Women Pole Vault  3.28        3
1      Caroline BELLOWS  SR   UTSA  3.88m          Women Pole Vault  3.43        1        O
1      Caroline BELLOWS  SR   UTSA  3.88m          Women Pole Vault  3.58        1        X
1      Caroline BELLOWS  SR   UTSA  3.88m          Women Pole Vault  3.58        2        O
1      Caroline BELLOWS  SR   UTSA  3.88m          Women Pole Vault  3.73        1        X
1      Caroline BELLOWS  SR   UTSA  3.88m          Women Pole Vault  3.73        2        O
1      Caroline BELLOWS  SR   UTSA  3.88m          Women Pole Vault  3.88        1        X
1      Caroline BELLOWS  SR   UTSA  3.88m          Women Pole Vault  3.88        2        O
1      Caroline BELLOWS  SR   UTSA  3.88m          Women Pole Vault  4.03        1        X
1      Caroline BELLOWS  SR   UTSA  3.88m          Women Pole Vault  4.03        2        X
1      Caroline BELLOWS  SR   UTSA  3.88m          Women Pole Vault  4.03        3        X
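
To see what the long format amounts to, here is a minimal base-R sketch of the same reshaping idea. It is not how attempts_split_long is implemented; the helper split_attempts() and the three-row input (3.43 results copied from the wide table) are for illustration only. Unlike the JumpeR output above, which keeps three rows per height even where no attempt was taken, this sketch only creates rows for characters actually present in each string.

```r
# Illustration only, not JumpeR's implementation: one row per character of each
# attempts string. Rows below are Flight_2 (3.43) results copied from the wide table.
wide <- data.frame(
  Name       = c("Myka STEINBEISSER", "Kylie SWIEKATOWSKI", "Emily HARRISON"),
  Bar_Height = c(3.43, 3.43, 3.43),
  Attempts   = c("O", "XO", "XXO"),
  stringsAsFactors = FALSE
)

split_attempts <- function(df) {
  tries <- strsplit(df$Attempts, "")  # one character vector of tries per row
  n <- lengths(tries)                 # number of tries in each row
  data.frame(
    Name       = rep(df$Name, n),     # repeat athlete info once per try
    Bar_Height = rep(df$Bar_Height, n),
    Attempt    = sequence(n),         # 1, 2, ... restarting for each row
    Result     = unlist(tries),
    stringsAsFactors = FALSE
  )
}

long <- split_attempts(wide)
long
```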

Relay Athletes

Going back to those Ivy League results, we can also pull out the names of the athletes on each relay by setting relay_athletes = TRUE.

df <- tf_parse(
  read_results("http://www.leonetiming.com/2020/Indoor/IvyLeague/Results.htm"),
  relay_athletes = TRUE
  )

df %>% 
  filter(Event == "Men 4x400 Meter Relay") %>% 
  select(-Tiebreaker, -Name) %>% 
  flextable_style()

Place  Age  Team       Finals_Result  DQ  Event                  Relay_Athlete_1      Relay_Athlete_2  Relay_Athlete_3  Relay_Athlete_4
1           Harvard    3:13.85        0   Men 4x400 Meter Relay  Aaron Shirley        Gregory Lapit    Charles Lego     Jovahn Williamson
2           Penn       3:15.55        0   Men 4x400 Meter Relay  Robbie Ruppel        Anthony Okolo    Emerson Douds    Antaures Jackson
3           Yale       3:16.60        0   Men 4x400 Meter Relay  Christopher Colbert  Juma Sei         Phil Zuccaro     Marcus Woods
4           Cornell    3:17.61        0   Men 4x400 Meter Relay  Christian Martin     Myles Solan      Malick Diomande  Tien Henderson
5           Dartmouth  3:17.66        0   Men 4x400 Meter Relay  Mathieu Farber       Charlie Wade     Julian Martelly  Max Frye
6           Columbia   3:19.42        0   Men 4x400 Meter Relay  Chris Balthazar      Jahi Hernandez   Brodie Holmes    Vasilis Kopanas
7           Princeton  3:20.61        0   Men 4x400 Meter Relay  Gregory Sholars      Klaudio Gjetja   Anderson Dimon   Michael Phillippy
8           Brown      3:25.72        0   Men 4x400 Meter Relay  Sergey Gorban        Austin Reynolds  Kevin Boyce      Tim McDonough
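
If you'd rather have one row per athlete, for example to count how many relays each person ran, the four Relay_Athlete columns can be stacked. A minimal base-R sketch using two rows copied from the table above; the Position column is simply the order in which athletes are listed in the results (an assumption, not necessarily the leg each one ran).

```r
# Two rows copied from the Men's 4x400 table above, in wide form
relays <- data.frame(
  Team            = c("Harvard", "Penn"),
  Relay_Athlete_1 = c("Aaron Shirley", "Robbie Ruppel"),
  Relay_Athlete_2 = c("Gregory Lapit", "Anthony Okolo"),
  Relay_Athlete_3 = c("Charles Lego", "Emerson Douds"),
  Relay_Athlete_4 = c("Jovahn Williamson", "Antaures Jackson"),
  stringsAsFactors = FALSE
)

athlete_cols <- paste0("Relay_Athlete_", 1:4)

relay_long <- data.frame(
  Team     = rep(relays$Team, each = length(athlete_cols)),
  Position = rep(seq_along(athlete_cols), times = nrow(relays)),
  # t() puts each team's athletes in one column, so c() reads them team by team
  Athlete  = c(t(as.matrix(relays[athlete_cols]))),
  stringsAsFactors = FALSE
)
relay_long
```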

Formatting Results

Track and field results come in two forms: times, as "MM:SS.HH", and lengths/heights, often as "X.XXm". JumpeR's math_format converts these result strings into numerics, which is useful for comparisons and plotting. Here's the men's pole vault from the 2019 USA T&F Championships:

df <- tf_parse(
  read_results("https://www.flashresults.com/2019_Meets/Outdoor/07-25_USATF_CIS/026-1.pdf"))


df %>%
  mutate(Finals_Math = math_format(Finals_Result)) %>% # results to numerics
  mutate(Name = factor(Name, unique(Name))) %>% # order names by order of finish
  ggplot(aes(x = Name, y = Finals_Math)) +
  geom_col() +
  theme_bw() +
  theme(axis.text.x = element_text(
    angle = 90,
    vjust = 0.5,
    hjust = 1
  )) +
  labs(y = "Height Cleared (m)",
       title = "USA Pole Vault Championships")
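
As a rough picture of what converting a time string involves, here is a hypothetical base-R helper, time_to_seconds(), which is not JumpeR's math_format. It splits on colons and folds each field into the next as minutes times 60 plus seconds. The inputs are times from the results above.

```r
# Hypothetical helper, not JumpeR's math_format: "M:SS.HH" or "SS.HH" to seconds.
time_to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(parts) {
    parts <- as.numeric(parts)
    # fold colon-separated fields left to right: previous * 60 + next
    Reduce(function(acc, p) acc * 60 + p, parts)
  }, numeric(1))
}

# Harvard's 4x400 time and Katina Martin's 200m time from above
time_to_seconds(c("3:13.85", "24.05"))  # 193.85 24.05
```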

math_format also works on vectors that mix formats. Times are converted to seconds, metric lengths/heights stay in meters, and imperial ones (feet-inches) are converted to inches. The output doesn't carry units, though, so keep track of which is which.

demo_list <- c(
  "1.23m", # a height/length in meters, output in meters
  "5-06.45", # a height/length in standard, output in inches
  "10:34.34", # a time with minutes, output in seconds
  "9.45" # a time without minutes, output in seconds
)

math_format(demo_list)
## [1]   1.23  66.45 634.34   9.45
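
The metric and imperial conversions can be sketched the same way. The helper below, mark_to_number(), is hypothetical and is not JumpeR's implementation; it assumes imperial marks are written as feet-inches, like "5-06.45", and reproduces the 1.23 and 66.45 values printed above.

```r
# Hypothetical helper, not JumpeR's math_format: convert field marks to numbers.
# "X.XXm" stays in meters; imperial "F-II.II" (feet-inches) becomes total inches.
mark_to_number <- function(x) {
  vapply(x, function(mark) {
    if (endsWith(mark, "m")) {
      as.numeric(sub("m$", "", mark))  # metric: drop the trailing unit
    } else {
      parts <- as.numeric(strsplit(mark, "-", fixed = TRUE)[[1]])
      parts[1] * 12 + parts[2]         # feet * 12 + inches
    }
  }, numeric(1), USE.NAMES = FALSE)
}

mark_to_number(c("1.23m", "5-06.45", "49.86m"))  # 1.23 66.45 49.86
```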

JumpeR Going Forward

I plan to maintain JumpeR, fix bugs, and respond to feature requests as I'm able. Another useful improvement would be increasing the number and types of supported results formats. More contributors are certainly welcome. If you'd like to be involved, get in touch or visit the project repo on GitHub.
