Treemaps In R

This article was first published on coding-the-past and kindly contributed to R-bloggers.


1. What is a treemap?

A treemap consists of a set of rectangles which represent different categories in your data and whose size is defined by a numeric value associated with the respective category. For example, a treemap could illustrate the continents on Earth, sized according to their population. For a deeper analysis, treemaps can include nested rectangles, that is, categories within categories. In our example, within each continent rectangle, new rectangles could represent countries and their populations.

Visual representation of a treemap.



2. When should you use a treemap?

One of the main advantages of a treemap is that it lets you interpret a large amount of data at a single glance. It is well suited to showing part-to-whole relationships and to highlighting the hierarchies in your data. Avoid treemaps when the variable defining the size of the rectangles shows little variation, because rectangles of similar area are hard to tell apart.
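As a rough way to judge whether a size variable varies enough for a treemap, you can look at each category's share of the total and at the ratio between the largest and smallest value. The sketch below uses base R only. The helper size_spread and both vectors are hypothetical and made up for illustration; they are not part of any package or dataset.

```r
# Hypothetical helper (not from any package): summarises how unevenly
# a size variable is distributed across categories.
size_spread <- function(values) {
  stopifnot(is.numeric(values), all(values > 0))
  shares <- values / sum(values)
  list(
    shares = shares,                             # part-to-whole proportions
    max_min_ratio = max(values) / min(values)    # spread between extremes
  )
}

# Made-up values, for illustration only
balanced <- c(a = 100, b = 102, c = 98, d = 101)
skewed   <- c(a = 400, b = 120, c = 60, d = 20)

size_spread(balanced)$max_min_ratio  # close to 1: rectangles would look alike
size_spread(skewed)$max_min_ratio    # much larger: sizes are easy to rank
```

A ratio close to 1 suggests that a bar chart or a table may communicate the differences better than a treemap.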




3. How to plot a treemap in R?

To exemplify a treemap in R, we will use the Cholera dataset, which contains data on the mortality caused by cholera in England in the years 1848-1849. This data comes from the HistData R package. Moreover, you will need to install the treemap package, one of the alternatives for plotting a treemap in R. We will also use the RColorBrewer package for a color palette and dplyr to transform the data.


After you install the packages, load them and explore the structure of the Cholera data frame.



library(HistData)
library(treemap)
library(dplyr)
library(RColorBrewer)

# Load the data to your R environment
data("Cholera")

# Check the dataframe structure
str(Cholera)


We would like to create a treemap in which we have bigger rectangles representing the regions of London and smaller rectangles representing the districts within their respective region. The size of the rectangles will inform us about the mortality caused by cholera in a given region and district. For us, the following variables are important:

  • region will define our outer rectangles (higher hierarchy) and will represent regions of London (West, North, Central, South, Kent);
  • district will define our inner rectangles (lower hierarchy), representing the districts of London;
  • cholera_deaths represents the number of deaths caused by cholera in 1849 and will define the size of the rectangles.
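Before plotting, it can help to see the numbers behind the outer rectangles. The treemap call in this article passes cholera_deaths to vSize, so the sketch below sums that column per region with dplyr. The names region_totals, districts, deaths and share are chosen here for illustration only.

```r
library(HistData)
library(dplyr)

data("Cholera")

# Total cholera deaths per region: these totals correspond to the
# relative sizes of the outer rectangles in the treemap.
region_totals <- Cholera %>%
  group_by(region) %>%
  summarise(
    districts = n(),                 # number of districts in the region
    deaths = sum(cholera_deaths),    # total deaths in the region
    .groups = "drop"
  ) %>%
  mutate(share = deaths / sum(deaths)) %>%  # part-to-whole proportion
  arrange(desc(deaths))

print(region_totals)
```

The share column is the fraction of the whole plot area that each region should occupy.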


To learn more about the data, please see the HistData package documentation (for example, by running ?Cholera after loading the package).


The treemap function is used to plot the treemap in R. The main arguments necessary are:

  • the first argument is the dataframe;
  • index defines the two levels of hierarchy in our plot: region and district;
  • vSize specifies the variable that defines the size of our rectangles, here the number of cholera deaths;
  • vColor specifies the region to define the color of our higher hierarchy rectangles;
  • type informs the function that vColor is a categorical variable;
  • the remaining parameters are used to adjust format options like color palette and position of elements.
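The call in this article hard-codes a palette of five colors. As a small sketch, assuming you want exactly one color per region, the palette size can be derived from the data instead. The names n_regions, max_colors and region_palette are illustrative and do not come from the original post.

```r
library(HistData)
library(RColorBrewer)

data("Cholera")

# Number of distinct regions actually present in the data
n_regions <- length(unique(Cholera$region))

# brewer.pal() expects n >= 3, and each palette has a maximum size
max_colors <- brewer.pal.info["Accent", "maxcolors"]
stopifnot(n_regions >= 3, n_regions <= max_colors)

# One color per region, ready to pass to the palette argument of treemap()
region_palette <- brewer.pal(n = n_regions, name = "Accent")
region_palette
```

This keeps the palette in step with the data if you later filter regions or reuse the code on another dataset.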


To further format your treemap, check more options in the package documentation.



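# Regions form the outer rectangles and districts the inner ones
treemap(Cholera,
        index = c("region", "district"),
        vSize = "cholera_deaths",   # variable that sets the rectangle area
        vColor = "region",          # variable that sets the color
        type = "categorical",
        # formatting options:
        palette = brewer.pal(n = 5, name = "Accent"),
        align.labels = list(
          c("left", "top"),         # region labels
          c("right", "bottom")      # district labels
        ),
        border.col = "white",
        bg.labels = 255,            # fully opaque label background
        position.legend = "none")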


Treemap plotted in R


Note that Kent is the region with the largest share of cholera deaths, followed by Southern London. Moreover, districts like Lambeth and Bethnal Green were especially affected by the disease. This treemap gives you a general picture of the data at first glance.
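Readings taken from the plot can be checked against the underlying numbers. The following sketch ranks the districts by cholera_deaths, the column used for vSize in the call above, and keeps cholera_drate alongside for comparison. The name district_ranking is illustrative.

```r
library(HistData)
library(dplyr)

data("Cholera")

# Districts ordered from the most to the fewest cholera deaths
district_ranking <- Cholera %>%
  select(region, district, cholera_deaths, cholera_drate) %>%
  arrange(desc(cholera_deaths))

# The districts at the top correspond to the largest inner rectangles
head(district_ranking, 5)
```

Comparing the two columns also shows that a district with many deaths does not necessarily have the highest death rate, since the rate depends on population.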


If you have any questions, please feel free to comment below!




4. Conclusions


  • A treemap is very useful for representing hierarchical relations in your data and for providing a quick overall picture of it;
  • Plotting a treemap in R can be easily accomplished with the treemap package.



