Form and File: estimating running form in R

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

There are lots of ways for runners and cyclists to analyse training data. A key question most fitness enthusiasts want to know is “how am I doing?”.

“How you are doing” is referred to as form.

Unsurprisingly, form can be estimated in many ways. One method is using training stress scores (acute training load and chronic training load) to assess form as training stress balance. The acronyms for these terms are apparently copyrighted(!) by TrainingPeaks. So I will refer to acute training load as fatigue, chronic training load as fitness and the training stress balance as form. Some notes on how these are calculated can be found lower down.

Let’s calculate these scores for running using R.

The plots above show my scores for this year so far. Because of the way that these scores are calculated it takes 7 days to get a meaningful Fatigue score and 42 days for a meaningful Fitness score, i.e. the calculation starts at 0 on New Year’s Day, when in reality, I carried over fitness and fatigue from December. Nonetheless this is a good way of tracking Form.

So what does it tell us? This year I spent a lot of time in the Grey zone, only nudging into the Optimal zone during intense activity. This is because my basal activity (and therefore fitness) is quite high. This means I need to do periodisation if I want to target improvement. This is where the runner interleaves intense periods (blocks) with less active spells.

How to calculate the data

Using summary data from Garmin connect (downloadable as CSV), a runner’s average heart rate and run time for a given activity date is all that is needed to do the calculation.

## The aim of this script is to load and process CSV data from the Garmin Connect website.
## This script will load all csv files in Data/ (in current wd) and filter for Running (and Treadmill Running)
## Place one or more Garmin CSV outputs into the Data folder for inclusion. Dates for activities can be overlapping
## duplicates are dealt with, so you can just keep adding csvs with the latest data and use the script again.
## Use of `find_form(from, to)` enables the user to examine their running form within the specified window.


## Setup preferred directory structure in wd
ifelse(!dir.exists("Data"), dir.create("Data"), "Folder exists already")
ifelse(!dir.exists("Output"), dir.create("Output"), "Folder exists already")
ifelse(!dir.exists("Output/Data"), dir.create("Output/Data"), "Folder exists already")
ifelse(!dir.exists("Output/Plots"), dir.create("Output/Plots"), "Folder exists already")
ifelse(!dir.exists("Script"), dir.create("Script"), "Folder exists already")

## functions

getWindowActivities <- function(activity,fromStr,toStr,df) {
  # filter for activity
  df_window <- subset(df,grepl(tolower(activity),tolower(df$Activity.Type)))
  # activities within the window
  fromDate <- as.Date(fromStr)
  toDate <- as.Date(toStr)
  df_window <- subset(df_window, as.Date(df_window$Date) >= fromDate & as.Date(df_window$Date) <= toDate)
  # put them in order
  df_window <- df_window[order(as.numeric(df_window$Date)),]

makeDateDF <- function(fromStr,toStr) {
  temp <- seq(as.Date(fromStr), as.Date(toStr), by="days")
  df <- data.frame(Date = temp,
                   ATL = rep(0,length(temp)),
                   CTL = rep(0,length(temp)))

process_load <- function(activityStr,fromStr,toStr) {
  all_files <- list.files("Data", pattern = "*.csv", full.names = TRUE)
  df_all <- read.csv(all_files[1], header = TRUE, stringsAsFactors=FALSE)
  df_all <- subset(df_all, select = c(Activity.Type,Date,Title,Distance,Time,Avg.HR))
  for (filename in all_files[-1]) {
    df_temp <- read.csv(filename, stringsAsFactors=FALSE)
    # subset data because Garmin can add or remove columns and we don't need them all
    df_temp <- subset(df_temp, select = c(Activity.Type,Date,Title,Distance,Time,Avg.HR))
    df_all <- rbind(df_all, df_temp)
  # remove duplicates
  df_all <- df_all[!duplicated(df_all), ]
  # format Date column to POSIXct
  df_all$Date <- as.POSIXct(strptime(df_all$Date, format = "%Y-%m-%d %H:%M:%S"))
  # convert average HR to numeric
  df_all$Avg.HR <- as.numeric(df_all$Avg.HR)
  # replace NA with average of Avg.HR
  # retrieve the activities that match activity type in the time window of interest
  df_all <- getWindowActivities(activityStr,fromStr,toStr,df_all)
  # add a column that contains the load of each activity
  # one way to calculate load is to multiply time in hours by avg HR and add 2.5 times avg HR
  # this relates to load by y = ax + b of a = 0.418, b = -150
  df_all$load <- 0.418 * ((as.numeric(lubridate::hms(df_all$Time)) / 3600 * df_all$Avg.HR) + (2.5 * df_all$Avg.HR)) - 150

sumDays <- function(df,daydf) {
  df$Date <- as.Date(df$Date)
  tempdf <- aggregate(load ~ Date, data = df, sum)
  newdf <- merge(daydf, tempdf, all.x = TRUE)
  newdf[] = 0

calculateTL <- function(df) {
  for (i in 1:nrow(df)) {
    # add today's load to training load(s)
    df$ATL[i] <- df$ATL[i] + df$load[i]
    df$CTL[i] <- df$CTL[i] + df$load[i]
    for (j in (i + 1) : (i + 42)) {
      if(j > nrow(df)) {
      df$ATL[j] <- df$ATL[i] * exp(-(j-i)/7)
      df$CTL[j] <- df$CTL[i] * exp(-(j-i)/42)
  df <- df[,1:3]
  df[2] <- df[2] / 7
  df[3] <- df[3] / 42
  df$TSS <- df$CTL - df$ATL

# run the analysis
find_form <- function(from, to) {
  # load data in and calculate load for each activity
  mydata <- process_load("running",from,to)
  # make a data frame that has every day in our time window represented
  tl <- makeDateDF(from,to)
  # sum the load for each day
  df <- sumDays(mydata,tl)
  # calculate training loads
  df <- calculateTL(df)
  # data frame for Form zones
  rects <- data.frame(ystart = c(20,5,-10,-30,-50),
                      yend = c(30,20,5,-10,-30),
                      xstart = rep(as.Date(from), 5),
                      xend = rep(as.Date(to), 5),
                      col = factor(c("Transition", "Fresh", "Grey zone", "Optimal", "High risk"), levels = c("Transition", "Fresh", "Grey zone", "Optimal", "High risk")))
  # first plot = Fitness and Fatigure
  p1 <- ggplot(df, aes(x = Date)) +
    geom_area(aes(y = CTL), fill = "#58abdf", alpha = 0.2) +
    geom_line(aes(y = CTL), colour = "#58abdf") +
    geom_line(aes(y = ATL), colour = "#5e3cc4") +
    geom_text(aes(x = as.Date(to), y = 0, vjust = "inward", hjust = "inward", label = "Fitness"), color = "#58abdf") + 
    geom_text(aes(x = as.Date(from), y = max(ATL), vjust = "inward", hjust = "inward", label = "Fatigue"), color = "#5e3cc4") +
    labs(x = "", y = "Training load per day") +
    theme_bw() +
  # second plot = Form
  p2 <- ggplot(df, aes(x = Date, y = TSS)) +
    geom_line(colour = "#0a0a0a", ) +
    geom_rect(data = rects, inherit.aes = F, aes(xmin = xstart, xmax = xend, ymin = ystart, ymax = yend, fill = col), alpha = 0.2) +
    scale_fill_manual(values = c("#DDB140", "#58ABDF", "#A3A3A3", "#67C75D", "#CB2A1D")) +
    labs(x = "", y = "Form") +
    theme_bw() +
  # patchwork assembly
  p3 <- p1 / p2
  # save plots
  ggsave(paste0("Output/Plots/tss_",from,"_",to,".png"), plot = p3, width = 8, height = 4, dpi = "print")

# do the analysis

I have functionalised the code to make it easier to understand how it works. Briefly, the data are read in, cleaned slightly and then “load” is calculated. We only look within a certain window for the plots, so we subset the data and then calculate the total load for every day within this time window. The stress scores are then calculated and the plots are generated using ggplot and patchwork.

Some notes on the stress scores

Although the stress score acronyms are copyrighted, what they do is not too mysterious. Fatigue is how tired you are feeling that week and Fitness is how much training you’ve done over six weeks. Put another way, Fatigue is an exponentially weighted average of load over 7 days while Fitness is an exponentially weight average of load over 42 days. Form is the difference between Fatigue and Fitness. There’s probably a function to do exponentially weighted averaging in R, but I just wrote something quick to do it in the function calculateTL().

So how do we calculate load? We simply need a measure of how stressful the activity was. We cannot take distance or time (because it doesn’t really tell us how hard we ran). Speed would be better but again terrain could be hilly or flat… If we were measuring cycling performance and we had power measurements, this would be ideal. Instead for running we can use heart rate data (as long as we have it for all activities).

This is how the training stress scores look in I used their estimation of Load to back-calculate and apply a similar metric to my data in R.

In short, average heart rate for an activity multiplied by duration of activity gets us very close to an approximation of load. This makes perfect sense: if you have run for 30 minutes at a given heart rate and then another day run for 1 h at the same average heart rate, it should be twice as much load. However, it wasn’t completely linear and there were some outliers, i.e. particularly hard or easy runs. So a correction was needed to account for this, and voila, I had something approximating the “load” metric used by Be aware that if you are re-running my code with your own data, your values may need tweaking.

The load calculation could be done in a more sophisticated way, by breaking down the activity into periods in each heart rate zone. However, we only need a big picture view here and the approximation done here serves the purpose.

The post title comes from “Form and File” by Archers of Load from their album “All The Nations Airports”.

To leave a comment for the author, please follow the link and comment on their blog: Rstats – quantixed. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)