Get Miles: using treemap to visualise running distances
By 30th September, I had clocked up over 2000 km of running in 2022. This milestone was a good opportunity to look at how I got to this point.
The code is shown below. First, we can make a histogram to look at the distance of runs.
From this type of plot it’s clear that my runs this year consist of a lot of 4-5 km runs and then a chunk of 21 km plus. This is because my run commute is ~5 km (5.5 km, with a shorter summer-only route of 4.4 km), which I do a lot, and I also do a weekly long run of at least 21.1 km.
A histogram like this obscures how much distance these runs contribute to the total, since one 10 km run is worth two 5 km runs. We need a better way to visualise this info.
Enter treemap, a way to see this information more clearly.
Treemap
This visualisation shows the total distance in each category as an area. The runs are organised into bins of 1 km distance and then grouped by 5 km distance intervals.
Although the runs of 20-25 km in distance were far fewer in number, they make up more distance than the 5-10 km bracket. This was not so easy to see in the histogram.
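To make that comparison concrete, here is a minimal sketch in base R that tabulates both the number of runs and the total distance per 5 km bracket. The `runs` data frame is made-up illustration data (not the real running log), and `summary_km5` is just a name chosen for this example.

```r
# Hypothetical data: several commute-length runs and a few long runs
runs <- data.frame(Distance = c(rep(5.5, 6), 8, 9, rep(21.1, 3)))

# Bin at 5 km resolution, the same grouping the treemap uses
runs$km5 <- cut(runs$Distance, breaks = seq(from = 0, to = 45, by = 5))

# Number of runs and total distance in each bracket
n_runs <- table(runs$km5)
total_km <- tapply(runs$Distance, runs$km5, sum)

summary_km5 <- data.frame(bracket = names(n_runs),
                          runs = as.vector(n_runs),
                          total_km = as.vector(total_km))

# Drop brackets that contain no runs
summary_km5 <- summary_km5[summary_km5$runs > 0, ]
print(summary_km5)
```

In this toy example the 5-10 km bracket holds eight runs and the 20-25 km bracket only three, yet the longer bracket accounts for more total distance. The run counts are what the histogram shows; the distance totals are what the treemap areas represent.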
The code
library(treemap)
library(ggplot2)
library(dplyr)

# load the data (output from process_data() within a timeframe of interest)
all_data <- read.csv("Output/Data/alldata_2022-01-01_2022-12-31.txt", sep = "\t")

# make histogram of running distances
p <- ggplot(all_data, aes(x = Distance)) +
  geom_histogram(breaks = seq(from = 0, to = 45, by = 1)) +
  labs(x = "Distance (km)", y = "Runs")
ggsave("Output/Plots/distanceHist.png", p)

# bin the data at 5 km and 1 km resolution
all_data <- all_data %>%
  mutate(km5 = cut(Distance, breaks = seq(from = 0, to = 45, by = 5)),
         km1 = cut(Distance, breaks = seq(from = 0, to = 45, by = 1)))

# two functions to rename the categories
rename_km5 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub("\\]", " km", x)
  x <- sub(",", " - ", x)
  return(x)
}

rename_km1 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub(",[[:digit:]]+\\]", "", x)
  return(x)
}

# rename the categories to give nice labels
all_data$labelkm5 <- rename_km5(all_data$km5)
all_data$labelkm1 <- rename_km1(all_data$km1)

# PNG device
png("Output/Plots/tremap.png", width = 800, height = 800)
treemap(all_data,
        index = c("labelkm5", "labelkm1"),
        vSize = "Distance",
        type = "index",
        align.labels = list(
          c("left", "top"),
          c("center", "center")
        ),
        palette = "Set2",
        overlap.labels = 1,
        title = "")
dev.off()
A few comments on the code for anyone interested in replicating the plot. The data loaded in are runs within a time-frame of interest. I generated the file to load using some code I wrote previously. All that is needed is a dataframe of runs with a column called Distance.
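For anyone without a comparable file to hand, a stand-in can be sketched as follows. The values are simulated and purely hypothetical, and the temporary file path is an assumption made so that the example is self-contained. The only real requirement is a numeric column called Distance.

```r
# Simulated runs (hypothetical values): commute-length runs plus long runs
set.seed(1)
fake_runs <- data.frame(
  Distance = c(runif(100, min = 4, max = 6),     # commute-length runs
               runif(30, min = 21.1, max = 25))  # weekly long runs
)

# Write a tab-separated file, then read it back the same way the post does
path <- tempfile(fileext = ".txt")
write.table(fake_runs, path, sep = "\t", row.names = FALSE, quote = FALSE)
all_data <- read.csv(path, sep = "\t")

# Check the one requirement: a numeric Distance column
stopifnot(is.numeric(all_data$Distance))
str(all_data)
```

With `all_data` built this way, the histogram and treemap code should run unchanged, apart from the output paths.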
Binning the data can be done with mutate and cut; this factorises the distances into defined bin widths. Unfortunately, the names of the bins don’t look great on the plot, so I made two functions to reformat them to something nice. In this way, (0,5] turns into 0 - 5 km, for example.
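As a small illustration of the binning and relabelling, the sketch below runs three example distances through cut and the two renaming functions from the post. The vector `d` is made up for the example.

```r
# The two renaming functions, as defined in the post
rename_km5 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub("\\]", " km", x)
  x <- sub(",", " - ", x)
  return(x)
}

rename_km1 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub(",[[:digit:]]+\\]", "", x)
  return(x)
}

# Example distances in km (hypothetical)
d <- c(4.4, 5.5, 21.1)

km5 <- cut(d, breaks = seq(from = 0, to = 45, by = 5))
km1 <- cut(d, breaks = seq(from = 0, to = 45, by = 1))

as.character(km5)  # "(0,5]" "(5,10]" "(20,25]"
rename_km5(km5)    # "0 - 5 km" "5 - 10 km" "20 - 25 km"

as.character(km1)  # "(4,5]" "(5,6]" "(21,22]"
rename_km1(km1)    # "4" "5" "21"
```

Note that the 1 km label is the lower edge of each bin, so a 5.5 km run is labelled 5. With these breaks, any distance above 45 km (or exactly 0) falls outside the bins and cut returns NA, so the upper limit may need raising for longer runs.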
There are several ways to customise the treemap and I didn’t go crazy optimising it. The palette (Set2) looked good to me and specifying type as index worked well for my needs.
—
The post title comes from “Get Miles” by Gomez from their debut LP “Bring It On”.