TreeMap World Population visualisation

[This article was first published on ipub » R, and kindly contributed to R-bloggers].

This example is inspired by the examples of the treemap package. You’ll learn how to
    • convert a data.frame to a data.tree structure
    • navigate a tree and locate specific nodes
    • use Aggregate and Cumulate
    • manipulate an existing tree, e.g. by using the Prune method
    • use data.tree in connection with the treemap package
This code builds on version 0.2.4 of the data.tree package, which you can get from CRAN or from GitHub. More posts on data.tree are available on this blog. You will also find this example in the package’s applications vignette.

Original treemap Example (to be improved)

The original example, as available in the treemap package documentation, visualises the world population as a treemap. There are many countries, so the chart gets cluttered with many very small boxes. In this example, we will limit the number of countries shown, and sum the remaining population in a catch-all country called “Other”. We use the data.tree package to do this aggregation.
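For reference, the starting point can be sketched roughly as follows. This assumes a recent treemap release that ships the GNI2014 dataset; the release available when this post was written may have shipped a differently named dataset, so treat the dataset name as an assumption.

```r
library(treemap)

# Assumption: the installed treemap version ships the GNI2014 dataset
# (columns iso3, country, continent, population, GNI).
data(GNI2014)

# One box per country, grouped by continent. Box area is the population,
# box colour is the GNI per capita.
treemap(GNI2014,
        index  = c("continent", "iso3"),
        vSize  = "population",
        vColor = "GNI",
        type   = "value")
```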

Conversion from data.frame

First, let’s convert the population data into a data.tree structure.
Once converted, we can easily navigate the tree to find the population of a specific country. Luckily, RStudio is quite helpful here with its code completion (use CTRL + SPACE).
Or, we can look at just a sub-tree.
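A minimal sketch of these three steps, assuming a recent data.tree release. To keep it self-contained, it uses a tiny toy data.frame with made-up round numbers (in millions) instead of the real population data; the names toy, tree and swissPop are purely illustrative.

```r
library(data.tree)

# Toy stand-in for the population data: made-up round numbers (millions),
# NOT the real dataset.
toy <- data.frame(
  continent  = c("Asia", "Asia", "Asia",
                 "Europe", "Europe", "Europe", "Europe"),
  country    = c("China", "India", "Japan",
                 "Germany", "France", "Italy", "Switzerland"),
  population = c(1360, 1290, 127, 80, 65, 60, 8),
  stringsAsFactors = FALSE
)

# as.Node() reads the hierarchy from a "/"-separated pathString column;
# the remaining columns become attributes of the leaf nodes.
toy$pathString <- paste("world", toy$continent, toy$country, sep = "/")
tree <- as.Node(toy)

# Navigate by name with `$` to reach a specific country.
swissPop <- tree$Europe$Switzerland$population
print(swissPop)

# Look at a sub-tree, printing the population next to the node names.
print(tree$Europe, "population")
```

Names that contain spaces need backticks when navigating, for example a continent called North America would be reached as tree$`North America`.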

Aggregate and Cumulate

We now want to aggregate the population. For non-leaves, this will recursively iterate through children, and cache the result in the population field. The main reason why we do this is not to calculate the population of the world, but to store the intermediate results (such as each continent’s total) via the cacheAttribute.
Next, we sort each node by population.
Finally, we cumulate among siblings, and store the running sum in an attribute called cumPop.
The tree now has a new attribute, cumPop, and its nodes are sorted by population.
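The three steps can be sketched as below on the same kind of toy data (made-up numbers, not the real dataset). The sketch assumes a recent data.tree release, where, to my knowledge, results are cached by assigning inside Do rather than through the cacheAttribute argument of version 0.2.4.

```r
library(data.tree)

# Toy data with made-up round numbers (millions), NOT the real dataset.
toy <- data.frame(
  continent  = c("Asia", "Asia", "Asia",
                 "Europe", "Europe", "Europe", "Europe"),
  country    = c("China", "India", "Japan",
                 "Germany", "France", "Italy", "Switzerland"),
  population = c(1360, 1290, 127, 80, 65, 60, 8),
  stringsAsFactors = FALSE
)
toy$pathString <- paste("world", toy$continent, toy$country, sep = "/")
tree <- as.Node(toy)

# 1. Aggregate: sum the children and store the result on each node.
#    Post-order traversal visits children before their parent.
tree$Do(function(x) x$population <- Aggregate(x, "population", sum),
        traversal = "post-order")

# 2. Sort the children of every node by population, largest first.
Sort(tree, "population", decreasing = TRUE, recursive = TRUE)

# 3. Cumulate: running sum among siblings, stored as cumPop.
#    The root has no siblings, so it is skipped here.
tree$Do(function(x) x$cumPop <- Cumulate(x, "population", sum),
        filterFun = isNotRoot)

print(tree, "population", "cumPop")
```

Because data.tree uses reference semantics, Sort and Do modify the tree in place, so there is no need to reassign the result.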

Prune

The previous steps were done to define our threshold: big countries should be displayed, while small ones should be grouped together. This lets us define a pruning function that allows a maximum of 7 countries per continent. Additionally, within each continent it keeps only the largest countries whose cumulative population stays below 90% of the continent’s population, and prunes the rest.
We then clone the tree. Because data.tree uses reference semantics, pruning would otherwise change the original tree, which we want to keep so that we can play around with different parameters later.
Finally, we need to sum the populations of the countries that we pruned away into a new “Other” node.
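The pruning rule and the clone step might look roughly like this. The function name pruneSmall and its parameters are illustrative, the data is a toy data.frame with made-up numbers, and the sketch assumes a recent data.tree release in which Clone accepts a pruneFun argument.

```r
library(data.tree)

# Toy data with made-up round numbers (millions), NOT the real dataset.
toy <- data.frame(
  continent  = c("Asia", "Asia", "Asia",
                 "Europe", "Europe", "Europe", "Europe"),
  country    = c("China", "India", "Japan",
                 "Germany", "France", "Italy", "Switzerland"),
  population = c(1360, 1290, 127, 80, 65, 60, 8),
  stringsAsFactors = FALSE
)
toy$pathString <- paste("world", toy$continent, toy$country, sep = "/")
tree <- as.Node(toy)

# Prepare the tree: cached totals, sort order, running sums.
tree$Do(function(x) x$population <- Aggregate(x, "population", sum),
        traversal = "post-order")
Sort(tree, "population", decreasing = TRUE, recursive = TRUE)
tree$Do(function(x) x$cumPop <- Cumulate(x, "population", sum),
        filterFun = isNotRoot)

# Hypothetical pruning rule: returns TRUE to keep a node, FALSE to drop it.
pruneSmall <- function(x, cutoff = 0.9, maxCountries = 7) {
  if (isNotLeaf(x)) return(TRUE)                # never drop world or continents
  if (x$position > maxCountries) return(FALSE)  # at most maxCountries per continent
  x$cumPop < x$parent$population * cutoff       # keep while below the cutoff share
}

# Clone and prune in one step; the original tree stays untouched.
treeClone <- Clone(tree, pruneFun = pruneSmall)

print(treeClone, "population", "cumPop")
```

Note that the continent nodes of the clone still carry the population cached before pruning, which is what makes it possible to work out the pruned remainder afterwards.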
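One way to add the “Other” nodes is sketched below. To keep the block short, it builds a stand-in for the pruned clone directly from made-up numbers and sets the continent totals by hand; in the real workflow these totals would come from the earlier aggregation step. All names and figures are illustrative.

```r
library(data.tree)

# Stand-in for the pruned clone: only the countries that survived pruning.
# Made-up round numbers (millions), NOT the real dataset.
kept <- data.frame(
  continent  = c("Asia", "Europe", "Europe"),
  country    = c("China", "Germany", "France"),
  population = c(1360, 80, 65),
  stringsAsFactors = FALSE
)
kept$pathString <- paste("world", kept$continent, kept$country, sep = "/")
treeClone <- as.Node(kept)

# Continent totals as they were cached before pruning (made-up numbers).
treeClone$Asia$population   <- 2777
treeClone$Europe$population <- 213

# For every continent (level 2), add an "Other" child that carries the
# population of the countries that were pruned away.
treeClone$Do(function(x) {
  shown <- sum(sapply(x$children, function(child) child$population))
  other <- x$AddChild("Other")
  other$country    <- "Other"
  other$continent  <- x$name
  other$population <- x$population - shown
}, filterFun = function(x) x$level == 2)

print(treeClone, "population")
```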

Plotting the treemap

In order to plot the treemap, we need to convert the data.tree structure back to a data.frame.
And here we go: our treemap now shows at most 7 countries per continent, and groups the remaining small countries into “Other”.
If you have enjoyed this example, I recommend you read the package’s vignettes, or have a look at the other data.tree posts on this blog.
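A hedged sketch of the conversion and the plot, again on a toy tree with made-up numbers that stands in for the pruned clone. It assumes recent data.tree and treemap releases. Since the toy data has no GNI column, the boxes are coloured by index rather than by GNI as in the original example.

```r
library(data.tree)
library(treemap)

# Stand-in for the pruned tree, including the "Other" nodes.
# Made-up round numbers (millions), NOT the real dataset.
pruned <- data.frame(
  continent  = c("Asia", "Asia", "Europe", "Europe", "Europe"),
  country    = c("China", "Other", "Germany", "France", "Other"),
  population = c(1360, 1417, 80, 65, 68),
  stringsAsFactors = FALSE
)
pruned$pathString <- paste("world", pruned$continent, pruned$country, sep = "/")
treeClone <- as.Node(pruned)

# One row per leaf, with the requested attributes as columns.
df <- ToDataFrameTable(treeClone, "continent", "country", "population")
print(df)

# Plot: boxes grouped by continent, box area given by population.
treemap(df,
        index = c("continent", "country"),
        vSize = "population",
        type  = "index")
```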
