# Get Miles: using treemap to visualise running distances

First published on **Rstats – quantixed** and kindly contributed to R-bloggers.


By 30th September 2022, I had clocked up over 2000 km of running for the year. This milestone was a good opportunity to look at how I got to this point.

The code is shown below. First, we can make a histogram to look at the distance of runs.

From this type of plot it’s clear that my runs this year consist of a lot of 4-5 km runs and then a chunk of runs of 21 km or more. This is because my run commute is ~5 km (5.5 km, with a summer-only shorter route of 4.4 km) and I do it a lot. I also do a weekly long run of at least 21.1 km.

A histogram like this obscures how much distance these runs contribute to the total, since one 10 km run is worth two 5 km runs. We need a better way to visualise this info.
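To make the point concrete, here is a minimal sketch comparing the number of runs in each 5 km bracket with the distance those runs add up to. The `runs` vector is made up for illustration and is not the real data set.

```r
# hypothetical distances in km, for illustration only
runs <- c(4.4, 4.4, 5.5, 5.5, 5.5, 5.5, 21.1, 21.1)

# assign each run to a 5 km bracket
bracket <- cut(runs, breaks = seq(from = 0, to = 25, by = 5))

# number of runs per bracket (what a histogram shows)
run_count <- table(bracket)

# total distance per bracket (what the histogram hides);
# brackets with no runs come back as NA
total_km <- tapply(runs, bracket, sum)

print(run_count)
print(total_km)
```

With these made-up numbers, the two long runs add up to 42.2 km, which is more than the 30.8 km from the six shorter runs, even though the counts suggest the opposite.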

Enter `treemap`, a way to see this information more clearly.

## Treemap

This visualisation shows the total distance in each category as an area. The runs are organised into bins of 1 km distance and then grouped by 5 km distance intervals.

Although the runs of 20-25 km in distance were far fewer in number, they make up more distance than the 5-10 km bracket. This was not so easy to see in the histogram.

## The code

```r
library(treemap)
library(ggplot2)
library(dplyr)

# load the data (output from process_data() within a timeframe of interest)
all_data <- read.csv("Output/Data/alldata_2022-01-01_2022-12-31.txt", sep = "\t")

# make histogram of running distances
p <- ggplot(all_data, aes(x = Distance)) +
  geom_histogram(breaks = seq(from = 0, to = 45, by = 1)) +
  labs(x = "Distance (km)", y = "Runs")
ggsave("Output/Plots/distanceHist.png", p)

# bin the data at 5 km and 1 km resolution
all_data <- all_data %>%
  mutate(km5 = cut(Distance, breaks = seq(from = 0, to = 45, by = 5)),
         km1 = cut(Distance, breaks = seq(from = 0, to = 45, by = 1)))

# two functions to rename the categories
rename_km5 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub("\\]", " km", x)
  x <- sub(",", " - ", x)
  return(x)
}

rename_km1 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub(",[[:digit:]]+\\]", "", x)
  return(x)
}

# rename the categories to give nice labels
all_data$labelkm5 <- rename_km5(all_data$km5)
all_data$labelkm1 <- rename_km1(all_data$km1)

# PNG device
png("Output/Plots/tremap.png", width = 800, height = 800)
treemap(all_data,
        index = c("labelkm5", "labelkm1"),
        vSize = "Distance",
        type = "index",
        align.labels = list(
          c("left", "top"),
          c("center", "center")
        ),
        palette = "Set2",
        overlap.labels = 1,
        title = "")
dev.off()
```

A few comments on the code for anyone interested in replicating the plot. The data loaded in are runs within a time-frame of interest. I generated the file to load using some code I wrote previously. All that is needed is a dataframe of runs with a column called Distance.
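If you don't have a file of runs to hand, a stand-in data frame is enough to try the rest of the code. The sketch below is an assumption about what such a frame might look like: the values are randomly generated and purely hypothetical, and only the `Distance` column name matters.

```r
# hypothetical stand-in for the real data: one row per run
set.seed(1)
all_data <- data.frame(
  Distance = c(runif(40, min = 4, max = 6),    # short, commute-like runs
               runif(10, min = 21, max = 25))  # longer weekly runs
)

# check the structure: a single numeric column called Distance
str(all_data)
```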

Binning the data can be done with `mutate` and `cut`; this factorises the distances into bins of a defined width. Unfortunately, the default bin names don’t look great on the plot, so I made two functions to reformat them into something nicer. In this way, `(0,5]` turns into `0 - 5 km`, for example.
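As a small worked example, the sketch below runs `cut` and the two relabelling functions from the code above on three made-up distances (the vector `d` is hypothetical). It shows the raw factor labels next to the cleaned-up versions.

```r
# the two relabelling functions, copied from the code above
rename_km5 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub("\\]", " km", x)
  x <- sub(",", " - ", x)
  return(x)
}

rename_km1 <- function(x) {
  x <- sub("\\(", "", x)
  x <- sub(",[[:digit:]]+\\]", "", x)
  return(x)
}

# hypothetical distances in km
d <- c(4.4, 5.5, 21.1)

# bin at 5 km and 1 km resolution
km5 <- cut(d, breaks = seq(from = 0, to = 45, by = 5))
km1 <- cut(d, breaks = seq(from = 0, to = 45, by = 1))

as.character(km5)  # "(0,5]" "(5,10]" "(20,25]"
rename_km5(km5)    # "0 - 5 km" "5 - 10 km" "20 - 25 km"

as.character(km1)  # "(4,5]" "(5,6]" "(21,22]"
rename_km1(km1)    # "4" "5" "21"
```

Note that the intervals produced by `cut` are closed on the right by default, so a run of exactly 5 km lands in `(0,5]` rather than `(5,10]`.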

There are several ways to customise the treemap and I didn’t go crazy optimising it. The palette (`Set2`) looked good to me, and specifying `type` as `index` worked well for my needs.

---

The post title comes from “Get Miles” by Gomez from their debut LP “Bring It On”.
