Revisited: Animating 2020 running goals – and compare them with previous years

[This article was first published on Rcrastinate, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Picking up the pieces

Recently, I’ve animated a specific running goal for one year with data from the Strava API. I thought it would also be nice to compare different years within the same animation – in a way you could say that you’re racing yourself over several years.

This only needs some minor modifications to the code I posted before. Data collection stays basically the same, so I’m just re-posting this here.

# Get the data from the API
app_name <- '<your app name>'
app_client_id  <- '<your app client ID>'
app_secret <- '<your app secret>'

# create the authentication token
stoken <- httr::config(token = strava_oauth(app_name, app_client_id, app_secret, app_scope="activity:read_all"))
act.list <- get_activity_list(stoken)
act.df <- compile_activities(act.list)
act.df <- act.df[,c("start_date_local", "distance", "type")]

Alright, we’re all set for the modifications to the code now. Remember that you can also plug in your own dataframe as long as it fits the structure of the API-generated one.

Code modifications

Let’s say we still want to run 500 km in one year, but now we want to compare several years - let’s choose the last 3 years:

goal <- 500
sports <- "Run"
selected.year <- c(2018, 2019, 2020)

We are subsetting and aggregating the data. Just as a reminder: We have to aggregate in case we have several activities at one day (like one run split in two activities).

select.df <- act.df[act.df$type %in% sports &
                      year(act.df$start_date_local) %in% selected.year,]

select.df$date <- as.Date(date(select.df$start_date_local))
select.df <- aggregate(distance ~ date, data = select.df, FUN = sum)
select.df <- select.df[order(select.df$date),]

The dataframe with every date (not only the ones with recorded activities) is still called year.df. We have to make sure that this is done for each year in the dataset. I’m using an sapply() for this.

year.df <- data.frame(
  date = unlist(sapply(unique(year(select.df$date)), USE.NAMES = F, FUN = function (yr) {
    as.character(seq(as.Date(paste0(yr, "-01-01")),
        as.Date(paste0(yr, "-12-31")),
        by = "day"))
year.df$date <- as.Date(year.df$date)

Now, we are

  • integrating the data from select.df into year.df with a call to merge()
  • ordering by date,
  • replacing NAs with zeros in the distance column
  • creating a column with the day of year (with the yday() function from {lubridate}) because this will be our new x-axis variable. We have to do this because we want to see all years in the dataset develop “next to each other”. The day of year is the same for all years (with a minor exception: in leap years, days will be shifted by 1 starting from March 1st - this won’t be visible in the plot though)
  • creating a column with the year which we will use later for the col and group aesthetic of the plot.
year.df <- merge(year.df, select.df, by = "date", all.x = T, all.y = F)
year.df <- year.df[order(year.df$date),]
year.df$distance <- ifelse($distance), 0, year.df$distance)
year.df$day <- yday(year.df$date)
year.df$year <- year(year.df$date)

The next call makes use of some {dplyr} functions and piping. Although I have been sceptical towards the “tidyverse” in the beginning, I am now using some functionality whenever it has huge advantages compared to base R functionality. The following code chunk shows such an advantage, I think.

  • We’re grouping year.df by year.
  • We are cumulating the distances within each group (because cumulated distance has to start with 0 at the beginning of the year)
  • We are ungrouping and converting the tibble back to dataframe (because otherwise it would break later if we would keep the tibble).
year.df %>% group_by(year) %>%
  mutate(cum.distance = round(cumsum(distance))) %>%
  ungroup() %>% -> year.df

Now, we are finally computing all the other statistics used in the plot (see the previous post for further explanations). I am keeping all the statistics although currently not all of them are being used. I got rid of the large label in the lower right corner because I don’t have a good idea to still keep it simple and short with 3 years.

year.df$remaining.distance <- goal - year.df$cum.distance
year.df$days.till.end <- as.numeric(as.Date(paste0(year.df$year, "-12-31")) -

year.df$ <- year.df$remaining.distance / year.df$days.till.end
year.df$ <- year.df$ * 7

year.df$ <- round(year.df$, 1)
year.df$ <- round(year.df$, 1)

year.df[year.df$ < 0, ""] <- 0
year.df[year.df$ < 0, ""] <- 0
year.df[year.df$remaining.distance < 0, "remaining.distance"] <- 0

year.df[year.df$days.till.end == 0, ""] <- "-"
year.df[year.df$days.till.end == 0, ""] <- "-" <- yday(Sys.Date())

Plotting and animating

The only important change here is that the x-axis is defined by the day of the year now. I am including major and minor breaks at the beginning of the months. The x-axis labels are given for every second month.

p <- ggplot(year.df, aes(x = day, y = cum.distance, col = factor(year), group = year)) +
  geom_line() +
  geom_hline(yintercept = goal, col = "red", lwd = 1) +
  geom_vline(xintercept =, col = alpha("grey", .5), lty = "dashed") +
  geom_segment(x = 1, y = 0, xend = 365, yend = goal, lty = "dashed", col = "red") +
  geom_segment(aes(x = day, y = cum.distance, col = factor(year), group = year),
               xend = 365,
               yend = goal, lty = "dotted") +
  geom_point(size = 2) +
  geom_text(aes(day + 7,
                label = paste0(year, ": ", round(cum.distance))),
            hjust = 0, size = 4) +
  coord_cartesian(clip = 'off') +
  scale_x_continuous(breaks = c(32, 91, 152, 213, 274, 335),
                     labels = c("Feb", "Apr", "Jun", "Aug", "Oct", "Dec"),
                     minor_breaks = c(1, 60, 121, 182, 244, 305, 365)) +
  transition_reveal(day) +
  guides(col = F) +
  labs(x = "Day of year", y = "km completed") +
  theme_minimal() + theme(plot.margin = margin(5.5, 60, 5.5, 5.5))

Aaaaand… animating the whole thing!

animate(p, fps = 60, duration = 30, width = 1440, height = 900, res = 200, end_pause = 25,
        renderer = av_renderer(file = "runs-3yr.mp4"))

Now, if you let this play out, there are several interesting things to note:

  • I’ve started quite strongly in 2018. However, in mid-May I’ve injured my knee during a run in the forest. I had to take a long break. Also, I wasn’t able to run as much till the end of the year, even after starting again around August.
  • This is why I’ve performed better in 2020 overall. It simply was a more “steady” year.
  • In 2019, I’ve been really lazy - overall and especially in the looong “break” from mid-September till the end of the year.

Have fun, keep on running and don’t be like my second half of 2019 🙂

To leave a comment for the author, please follow the link and comment on their blog: Rcrastinate. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)