Running Around: 2022 running dataviz in R

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers].

2022 was my best year for running to date. In 2021, my goal was to run 2021 km. For 2022, I wanted to see if I could run 2500 km and also complete 50 runs of half-marathon (HM) distance or more. I managed both and ended the year on a total of 2734 km. I also bagged two half-marathon PBs.

Of course, if you subscribe to Strava or VeloViewer or whatever, you can get a nice data visualisation of your year in running. But where’s the fun in that when we can do that (and so much more) in R?

Reaching the goal

I used my previous scripts to track my progress against the goal. At some point in July, I upped my weekly kms and went ahead of my goal. I hit 2500 km at the end of November.
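As a rough illustration of the idea (this is not the script referred to above), progress can be compared with a straight-line target that reaches the goal on the last day of the year. The `runs` data frame and its values are made up for the example, and only base R is used.

```r
# hypothetical run log: one row per run (made-up values)
runs <- data.frame(
  Date = as.Date(c("2022-01-02", "2022-01-04", "2022-01-04", "2022-01-09")),
  Distance = c(21.2, 4.6, 5.1, 22.0)
)

goal_km <- 2500
year_days <- seq(as.Date("2022-01-01"), as.Date("2022-12-31"), by = "day")

# total distance per calendar day; days without a run come back as NA
day_factor <- factor(as.character(runs$Date), levels = as.character(year_days))
daily <- tapply(runs$Distance, day_factor, sum)
daily[is.na(daily)] <- 0

progress <- data.frame(
  Date = year_days,
  Cumulative = cumsum(as.numeric(daily)),
  # straight line from 0 to the goal over the year
  Target = goal_km * seq_along(year_days) / length(year_days)
)
# positive values mean ahead of the target line
progress$Ahead <- progress$Cumulative - progress$Target

head(progress, 10)
```

Plotting `Cumulative` and `Target` against `Date` gives the kind of progress chart described here; the day on which `Ahead` turns positive and stays there is the point at which the runner moves ahead of the goal.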

How did I do it? Well, here are all my runs:

I generated this visualisation using Marcus Volz’s Strava package. Details of how I used it are here. Briefly, I have a local store of gpx files for all my activities and these can be loaded into R and visualised with Strava.

We are looking at all the courses I ran in 2022 (in order). They are shown to scale. You can see that I did lots of short runs (run commutes) and a smaller number of longer ones.

Let’s look at that in more detail. A treemap view works well here:

From this breakdown we can see that about half the total distance came from short runs of less than 10 km. In fact, 4 km and 5 km runs dominate. These are my run commutes, which are between 4.4 and 5.5 km, depending on the route. About a quarter of the total distance came from runs in the region of 21-25 km and the remainder from 10-24 and 25+ distances. I didn’t do any runs between 15 and 20 km because anything in that range would be bumped up to HM distance to meet my goal.
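The numbers behind a breakdown like this come from binning runs by distance and summing the kilometres in each bin. A minimal base R sketch follows; the distances are made up and the break points are an assumption chosen to resemble the bands discussed above, not the ones used for the treemap.

```r
# hypothetical run distances in km (made-up values)
distance <- c(4.6, 5.1, 4.4, 5.5, 12.3, 21.2, 22.0, 26.4)

# assign each run to a distance band; break points are illustrative only
band <- cut(distance,
            breaks = c(0, 10, 21, 25, Inf),
            labels = c("<10", "10-21", "21-25", "25+"),
            right = FALSE)

# total km per band and each band's share of the overall distance
km_per_band <- tapply(distance, band, sum)
share_pct <- round(100 * km_per_band / sum(distance), 1)

data.frame(band = names(km_per_band),
           km = as.numeric(km_per_band),
           share_pct = as.numeric(share_pct))
```

A treemap then simply draws one tile per band (or per whole-kilometre bin, using `floor(distance)`), with the tile area proportional to the summed distance.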

The 2734 km came from 332 runs, but how did I fit them in? And how did I get some rest?

A calendar view is nice here. We can look either at the number of runs per day or the kms run per day. I did “the church of the long run”, i.e. a single long run on Sunday, plus, as we have seen, run commutes, which are typically two shorter runs. I mostly did not run on Saturdays. At the start of the year I didn’t run on Mondays and Thursdays either, but that was less of a rule after the summer.

The progress reports, treemap and calendar view were all created using data downloaded from Garmin Connect. To load the data and process it for the reports see here; for the treemap see here; for the calendar view I am using a function that I described here. The code to generate the plots above is:


# we start with all_data which is loaded from the previous code using
# process_data("running","2022-01-01","2022-12-31",2500)

library(dplyr)     # %>% and n()
library(timetk)    # summarize_by_time()
library(ggplot2)   # ggsave()
library(patchwork) # stacking plots with /

# summarise the running data by day
df_day <- all_data
df_day$Date <- as.Date(all_data$Date)
df_day <- df_day %>% 
  summarize_by_time(.date_var = Date,
                    .by = "day",
                    Distance = sum(Distance),
                    n = n())
# first plot: number of runs per day
# calendarHeatmap() is the function from the post linked above
p1 <- calendarHeatmap(df_day$Date, df_day$n, title = "Running 2022", subtitle = "Runs per day")
# second plot: distance per day
p2 <- calendarHeatmap(df_day$Date, df_day$Distance, title = "", subtitle = "km per day")
# assemble with patchwork (p1 above p2)
p <- p1 / p2
ggsave("Output/Plots/calendar_per_day.png", p)

What about the 50 HM-or-greater goal?

Well, we can also array these longer runs out to look at them in all their glory.

HM-or-more routes. Can you spot the one I ran in Washington DC?

I tried to run different routes for these long runs. In 2021, I had a sub-goal of running 30 HM-or-more courses. That year, I set myself the additional criterion that each HM-or-more course must be different. I relaxed that this year and ran a couple of courses 3 or 4 times. I still managed to vary them quite a bit though.

New half-marathon PBs

I managed to improve on my HM time twice this year. My previous best was set in 2018 and since then I had run two slower HMs, which was annoying. I changed a few things and managed to improve my time this year in March and again in September. In the summer, I ran a HM which was faster than my 2018 best but did not improve on my March 2022 time. My September 2022 PB was gratifying as I was wondering if I would ever go under 95 min for HM… especially as I am not getting any younger.

Salmon-coloured points are races, blue points are just other HM distance runs

Generating this plot was a bit tricky. If someone knows a better way, let me know!

library(dplyr)   # distinct(), filter() and %>%
library(ggplot2)

# process_load() is a function used in TSS analysis
# load the data we want to look at
mydata <- process_load("Running","2016-01-01","2022-12-31")
# drop the Activity.Type column (the 1st column) and remove duplicates
mydata$Activity.Type <- NULL
mydata <- distinct(mydata)
# Time is a character vector, change to POSIXct
mydata$Time <- as.POSIXct(strptime(mydata$Time, format = "%H:%M:%S"))

# filter for HM distance runs in this period
df_hm <- mydata %>% 
  filter(Distance > 20.9 & Distance < 21.5)

# the following code generates a second data frame to visualise PBs
record <- data.frame(Date = df_hm$Date[1],
                     Time = df_hm$Time[1])
minTime <- record$Time[1]

for (i in 2:nrow(df_hm)) {
  # a new best time: add a horizontal segment up to this date, then the drop
  if(df_hm$Time[i] < minTime) {
    recordA <- data.frame(Date = df_hm$Date[i],
                          Time = minTime)
    minTime <- df_hm$Time[i]
    recordB <- data.frame(Date = df_hm$Date[i],
                          Time = minTime)
    record <- rbind(record,recordA,recordB)
  }
  # last row: extend the line to the final date at the current best
  if(i == nrow(df_hm)) {
    recordB <- data.frame(Date = df_hm$Date[i],
                          Time = minTime)
    record <- rbind(record,recordB)
  }
}

# generate the plot
# I cut the data using 01:41:00 to differentiate between HM events and other runs, i.e. it is a hack.
p <- ggplot() +
  geom_point(data = df_hm, aes(x = Date, y = Time,
                               colour = cut(Time, as.POSIXct(strptime(c("01:30:00","01:41:00","02:00:00"), format = "%H:%M:%S"))))) +
  geom_line(data = record, aes(x = Date, y = Time), linetype = 2) +
  lims(y = as.POSIXct(strptime(c("01:30:00","02:00:00"), format = "%H:%M:%S"))) +
  theme_bw() +
  theme(legend.position = "none")
ggsave("Output/Plots/hm_pb.png", p)
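One possible alternative to the loop, offered as a sketch rather than a drop-in replacement: `cummin()` returns the best time so far at each run, which is exactly the PB line. The data frame below is made up, and times are held as numeric seconds because this sketch assumes the time column has been converted with `as.numeric()` first.

```r
# hypothetical HM results (made-up values), times in seconds
hm <- data.frame(
  Date = as.Date(c("2018-03-04", "2019-10-06", "2021-09-12",
                   "2022-03-06", "2022-07-10", "2022-09-11")),
  Secs = c(5800, 5950, 5900, 5750, 5780, 5690)
)
# cummin() relies on the rows being in date order
hm <- hm[order(hm$Date), ]

# best time so far at each run
hm$Best <- cummin(hm$Secs)
# TRUE where a run set a new best (the first run counts as the starting mark)
hm$IsPB <- hm$Secs == hm$Best & !duplicated(hm$Best)

hm

# with ggplot2 loaded, the PB line could then be drawn as a staircase:
# ggplot(hm, aes(x = Date)) +
#   geom_point(aes(y = Secs)) +
#   geom_step(aes(y = Best), linetype = 2)
```

With its default direction, `geom_step()` draws the horizontal segment first and then the vertical drop, which is the same shape that the `record` data frame builds by hand.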

Form throughout the year

It is worth tracking running form to avoid injury. I did this using TSS (described here). This is the graphic for the whole year.

I find this view of fatigue and fitness very useful and will track this again in 2023. I spend way too much time in the grey zone and I think I can improve my HM time if I get more focused with my training.
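The post linked above has the details of the calculation actually used. For orientation only, a common formulation treats fitness and fatigue as exponentially weighted averages of daily training stress, with a long and a short time constant respectively, and form as the difference between them. The 42-day and 7-day constants, the helper `ewma()` and the scores below are all assumptions made for this sketch.

```r
# exponentially weighted average of a daily series
# tc is the time constant in days (hypothetical helper, not from the post)
ewma <- function(x, tc) {
  out <- numeric(length(x))
  prev <- 0
  for (i in seq_along(x)) {
    prev <- prev + (x[i] - prev) / tc
    out[i] <- prev
  }
  out
}

# hypothetical daily training stress scores (made-up values), zero on rest days
tss <- c(60, 0, 45, 45, 0, 0, 110, 0, 45, 45, 50, 0, 0, 120)

fitness <- ewma(tss, 42)   # slow-moving average of load
fatigue <- ewma(tss, 7)    # fast-moving average of load
form <- fitness - fatigue  # negative when recent load exceeds the long-term level

round(data.frame(day = seq_along(tss), tss, fitness, fatigue, form), 1)
```

Because the short average reacts faster than the long one, a block of hard training pushes form down and a few easy days bring it back up.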


Beyond self-congratulation, the conclusion of this post is that R is very useful for analysing running data, tracking progress and visualising running achievements. Sure, this can all be done automatically by a third-party app, but if you maintain your own running log offline or if you just want more control over the analysis, R is fantastic for generating similar (or better) visualisations, which can be bespoke and tailored to what you want to know.

The post title is taken from “Running Around” by D.R.I. I have several copies of this song, but since I am a music snob, I will say it is a rip from the Violent Pacification 7″ EP released in 1984. Amazingly I haven’t used this song title on quantixed yet.
