Garmonbozia: Using R to look at Garmin CSV data

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Garmin Connect has a number of plots built in, but to take a deeper dive into all your fitness data, you need to export a CSV and fire up R. This post is a quick guide to some possibilities for running data. 

There’s a few things that I wanted to look at. For example, how does my speed change through the year? How does that compare to previous years? If I see some trends, is that the same for short runs and long runs? I wanted to look at the cumulative distance I’d run each year… There’s a lot of things that would be good to analyse.

Garmin Connect has a simple way to export data as a CSV. There are other ways to get your data, but the web interface is pretty straightforward. To export a CSV of your data, head to the Garmin Connect website, login and select Activities, All Activities. On this page, filter the activities for whatever you want to export. I clicked Running (you can filter some more if you want), and then scrolled down letting the data load onto the page until I went back as far as I wanted. In the top right corner, you click Export CSV and you will download whatever is displayed on the page.

The code to generate these plots, together with some fake data to play with can be found here.

Now in R, load in the CSV file

require(ggplot2)
require(dplyr)
require(hms)
file_name <- file.choose()
df1 <- read.csv(file_name, header = TRUE, stringsAsFactors = FALSE)

We have a data frame, but we need to rejig the Dates and a few other columns before we can start making plots.

# format Date column to POSIXct
df1$Date <- as.POSIXct(strptime(df1$Date, format = "%Y-%m-%d %H:%M:%S"))
# format Avg.Pace to POSIXct
df1$Avg.Pace <- as.POSIXct(strptime(df1$Avg.Pace, format = "%M:%S"))
# make groups of different distances using ifelse
df1$Type <- ifelse(df1$Distance < 5, "< 5 km", ifelse(df1$Distance < 8, "5-8 km", ifelse(df1$Distance < 15, "8-15 km", ">15 km")))
# make factors for these so that they're in the right order when we make the plot
df1$Type_f = factor(df1$Type, levels=c("< 5 km","5-8 km","8-15 km", ">15 km"))

Now we can make the first plot. The code for the first one is below, with all the code for the other plots shown below that.

# plot out average pace over time
p1 <- ggplot( data = df1, aes(x = Date,y = Avg.Pace, color = Distance)) + 
  geom_point() +
  scale_y_datetime(date_labels = "%M:%S") +
  geom_smooth(color = "orange") +
  labs(x = "Date", y = "Average Pace (min/km)")

The remainder of the code for the other plots is shown below. The code is commented. For some of the plots, a bit of extra work on the data frame is required.

# plot out same data grouped by distance
p2 <- ggplot( data = df1, aes(x = Date,y = Avg.Pace, group = Type_f, color = Type_f)) + 
  geom_point() +
  scale_y_datetime(date_labels = "%M:%S") +
  geom_smooth() +
  labs(x = "Date", y = "Average Pace (min/km)", colour = NULL) +
  facet_grid(~Type_f)
# now look at stride length. first remove zeros
df1[df1 == 0] <- NA
# now find earliest valid date
date_v <- df1$Date
# change dates to NA where there is no avg stride data
date_v <- as.Date.POSIXct(ifelse(df1$Avg.Stride.Length > 0, df1$Date, NA))
# find min and max for x-axis
earliest_date <- min(date_v, na.rm = TRUE)
latest_date <- max(date_v, na.rm = TRUE)
# make the plot
p3 <- ggplot(data = df1, aes(x = Date,y = Avg.Stride.Length, group = Type_f, color = Type_f)) +
  geom_point() + 
  ylim(0, NA) + xlim(as.POSIXct(earliest_date), as.POSIXct(latest_date)) +
  geom_smooth() +
  labs(x = "Date", y = "Average stride length (m)", colour = NULL) +
  facet_grid(~Type_f)
df1$Avg.HR <- as.numeric(as.character(df1$Avg.HR))
p4 <- ggplot(data = df1, aes(x = Date,y = Avg.HR, group = Type_f, color = Type_f)) +
  geom_point() +
  ylim(0, NA) + xlim(as.POSIXct(earliest_date), as.POSIXct(latest_date)) +
  geom_smooth() +
  labs(x = "Date", y = "Average heart rate (bpm)", colour = NULL) +
  facet_grid(~Type_f)
# plot out average pace per distance coloured by year
p5 <- ggplot( data = df1, aes(x = Distance,y = Avg.Pace, color = Date)) + 
  geom_point() +
  scale_y_datetime(date_labels = "%M:%S") +
  geom_smooth(color = "orange") +
  labs(x = "Distance (km)", y = "Average Pace (min/km)")
# make a date factor for year to group the plots
df1$Year <- format(as.Date(df1$Date, format="%d/%m/%Y"),"%Y")
p6 <- ggplot( data = df1, aes(x = Distance,y = Avg.Pace, group = Year, color = Year)) + 
  geom_point() +
  scale_y_datetime(date_labels = "%M:%S") +
  geom_smooth() +
  labs(x = "Distance", y = "Average Pace (min/km)") +
  facet_grid(~Year)
# Cumulative sum over years
df1 <- df1[order(as.Date(df1$Date)),]
df1 <- df1 %>% group_by(Year) %>% mutate(cumsum = cumsum(Distance))
p7 <- ggplot( data = df1, aes(x = Date,y = cumsum, group = Year, color = Year)) + 
  geom_line() +
  labs(x = "Date", y = "Cumulative distance (km)")
# Plot these cumulative sums overlaid
# Find New Year's Day for each and then work out how many days have elapsed since
df1$nyd <- paste(df1$Year,"-01-01",sep = "")
df1$Days <- as.Date(df1$Date, format="%Y-%m-%d") - as.Date(as.character(df1$nyd), format="%Y-%m-%d")
# Make the plot
p8 <- ggplot( data = df1, aes(x = Days,y = cumsum, group = Year, color = Year)) + 
  geom_line() +
  scale_x_continuous() +
  labs(x = "Days", y = "Cumulative distance (km)")

Finally, we can save all of the plots using ggsave.

# save all plots
ggsave("allPace.png", plot = p1, width = 8, height = 4, dpi = "print")
ggsave("paceByDist.png", plot = p2, width = 8, height = 4, dpi = "print")
ggsave("strideByDist.png", plot = p3, width = 8, height = 4, dpi = "print")
ggsave("HRByDist.png", plot = p4, width = 8, height = 4, dpi = "print")
ggsave("allPaceByDist.png", plot = p5, width = 8, height = 4, dpi = "print")
ggsave("paceByDistByYear.png", plot = p6, width = 8, height = 4, dpi = "print")
ggsave("cumulativeDistByYear.png", plot = p7, width = 8, height = 4, dpi = "print")
ggsave("cumulativeDistOverlay.png", plot = p8, width = 8, height = 4, dpi = "print")

I think the code might fail if you don’t record all of the fields that I do. For example if heart rate data is missing or stride length is not recorded, I’m not sure what the code will do. The aim here is to give an idea of what sorts of plots can be generated just using the summary data in the CSV provided by Garmin. Feel free to make suggestions in the comments below.

The post title comes from “Garmonbozia” by Superdrag from the Regretfully Yours album. Apparently Garmonbozia is something eaten by the demons in the Black Lodge in the TV series Twin Peaks.

To leave a comment for the author, please follow the link and comment on their blog: Rstats – quantixed.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)