Get Miles: using treemap to visualise running distances

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers].

By 30th September 2022, I had clocked up over 2000 km of running for the year. This milestone was a good opportunity to look at how I got to this point.

The code is shown below. First, we can make a histogram to look at the distribution of run distances.

Standard ggplot histogram to look at the frequency of run distances

From this type of plot it’s clear that my runs this year consist of a lot of 4-5 km runs and then a chunk of runs of 21 km or more. This is because my run commute is ~5 km (5.5 km, with a shorter summer-only route of 4.4 km) and I do it a lot. I also do a weekly long run of at least 21.1 km.

A histogram like this obscures how much distance these runs contribute to the total, since one 10 km run is worth two 5 km runs. We need a better way to visualise this info.
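
To make the point concrete, here is a minimal sketch comparing the number of runs with the total distance in each 5 km bracket. The distances are made up for illustration (they are not my actual data), and the names runs and by_bin are just placeholders.

```r
library(dplyr)

# Made-up distances (km) for illustration only, not the real data
runs <- data.frame(Distance = c(4.4, 4.4, 5.5, 5.5, 5.5, 21.1))

by_bin <- runs %>%
  # bin at 5 km resolution, same breaks as used for the treemap
  mutate(km5 = cut(Distance, breaks = seq(from = 0, to = 45, by = 5))) %>%
  group_by(km5) %>%
  # n_runs is what a histogram counts; total_km is what the treemap area shows
  summarise(n_runs = n(), total_km = sum(Distance))

print(by_bin)
```

With these toy numbers, the single 21.1 km run contributes more distance than the three 5.5 km runs combined (16.5 km), even though it is only one count in the histogram.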

Enter treemap, a way to see this information more clearly.

Treemap

Treemap of 2022 runs so far

This visualisation shows the total distance in each category as an area. The runs are organised into bins of 1 km distance and then grouped by 5 km distance intervals.

Although the runs of 20-25 km in distance were far fewer in number, they make up more distance than the 5-10 km bracket. This was not so easy to see in the histogram.

The code

library(treemap)
library(ggplot2)
library(dplyr)
# load the data (output from process_data() within a timeframe of interest)
all_data <- read.csv("Output/Data/alldata_2022-01-01_2022-12-31.txt", sep = "\t")
# make histogram of running distances
p <- ggplot(all_data, aes(x = Distance)) +
  geom_histogram(breaks = seq(from = 0, to = 45, by = 1)) +
  labs(x = "Distance (km)", y = "Runs")
ggsave("Output/Plots/distanceHist.png", p)

# bin the data at 5 km and 1 km resolution
all_data <- all_data %>%
  mutate(km5 = cut(Distance, breaks = seq(from = 0, to = 45, by = 5)),
         km1 = cut(Distance, breaks = seq(from = 0, to = 45, by = 1)))

# two functions to rename the categories
rename_km5 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub("\\]", " km", x)
  x <- sub(",", " - ", x)
  return(x)
}
rename_km1 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub(",[[:digit:]]+\\]", "", x)
  return(x)
}

# rename the categories to give nice labels
all_data$labelkm5 <- rename_km5(all_data$km5)
all_data$labelkm1 <- rename_km1(all_data$km1)

# PNG device
png("Output/Plots/treemap.png", width = 800, height = 800)
treemap(all_data,
        index = c("labelkm5","labelkm1"),
        vSize = "Distance",
        type = "index",
        align.labels = list(
          c("left", "top"),
          c("center", "center")
        ),
        palette = "Set2",
        overlap.labels = 1,
        title = "")
dev.off()

A few comments on the code for anyone interested in replicating the plot. The data loaded in are runs within a time-frame of interest. I generated the file to load using some code I wrote previously. All that is needed is a dataframe of runs with a column called Distance.
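
If you don't have a file of runs to hand, a stand-in dataframe is enough to try the plotting code. The sketch below is an assumption on my part about what a minimal input could look like: the distances are randomly generated, not real runs, and only the column name Distance matters.

```r
set.seed(1)  # make the fake data reproducible

# Hypothetical stand-in for the loaded file: 100 commute-like runs
# and 20 long runs, all distances in km
all_data <- data.frame(
  Distance = c(runif(100, min = 4, max = 6),
               runif(20, min = 21.1, max = 25))
)

# cut() returns NA for values outside the breaks, so with breaks from
# 0 to 45 every distance should lie in the interval (0, 45]
stopifnot(all(all_data$Distance > 0 & all_data$Distance <= 45))

str(all_data)
```

Any runs longer than 45 km would need the upper limit of the breaks to be raised, otherwise they would end up in an NA category.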

Binning the data can be done with mutate and cut, which factorises the distances into bins of a defined width. Unfortunately, the default names of the bins don’t look great on the plot, so I made two functions to reformat them into something nicer. In this way, (0,5] turns into 0 - 5 km, for example.
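
As a small worked example of the binning and renaming, here are the two functions applied to three example distances (the vector d is only for illustration). Note that sub replaces just the first match, which is all that is needed here because each label contains one bracket of each kind and one comma.

```r
# Same two renaming functions as in the main code
rename_km5 <- function(x) {
  x <- sub("\\(", "", x)   # drop the opening bracket
  x <- sub("\\]", " km", x) # closing bracket becomes the unit
  x <- sub(",", " - ", x)   # comma becomes a dash
  return(x)
}
rename_km1 <- function(x) {
  x <- sub("\\(", "", x)               # drop the opening bracket
  x <- sub(",[[:digit:]]+\\]", "", x)  # keep only the lower limit
  return(x)
}

# Example distances (km), chosen for illustration
d <- c(4.4, 5.5, 21.1)
km5 <- cut(d, breaks = seq(from = 0, to = 45, by = 5))
km1 <- cut(d, breaks = seq(from = 0, to = 45, by = 1))

as.character(km5)  # "(0,5]"   "(5,10]"  "(20,25]"
rename_km5(km5)    # "0 - 5 km"   "5 - 10 km"  "20 - 25 km"
rename_km1(km1)    # "4"  "5"  "21"
```

The 1 km labels show only the lower limit of each bin, so a 4.4 km run is labelled 4.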

There are several ways to customise the treemap and I didn’t go crazy optimising it. The palette (Set2) looked good to me and specifying type as index worked well for my needs.

The post title comes from “Get Miles” by Gomez from their debut LP “Bring It On”.
