Stacking the deck against treemaps

March 18, 2016
(This article was first published on R – rud.is, and kindly contributed to R-bloggers)

So, I (unapologetically) did this to @Highcharts last week:

They did an awesome makeover (it’s interactive if you follow the link):

[Image: the Highcharts treemap makeover]

And, I’m not kidding, it’s actually a really good treemap. Not too many hierarchies or discrete categories. But, it’s still hard for humans to compare things without the aid of the interaction (which is totally fair, the Highcharts folks do interaction well). I always try to find an alternative to treemaps, usually through trying to figure out the story to tell. I think there’s at least one story in the Highcharts data that we can uncover with a different visualization. Ironically, the visualization I’ve chosen is a stacked bar chart (I don’t generally like them, either). I’ll frame the story and then dissect the code.

[Image: stacked bar chart, “Popular web frameworks using Highcharts”]

We looked at the number of frameworks being used with Highcharts across web-oriented programming languages. Surprisingly, four of the six top languages (Java, PHP, Python & dotNet) show Highcharts being used *without* an associated framework, which highlights the flexible nature of Highcharts. There seems to be, unsurprisingly, only one player in town when it comes to Ruby: Ruby on Rails, and the high prevalence of AngularJS tracks with Angular’s apparent dominance in JavaScript land. INSERT_MARKETING_LANGUAGE_HERE

In real life, I’d add a DataTables interactive table with this to let folks explore a bit more.

Making this in R & ggplot2

Highcharts used a Google Sheet to hold the data for their treemap makeover. That means we can have some fun with it in R. So, the two main story points are:

  1. show how the languages, and in-language frameworks rank against each other
  2. show the dominant framework in each language

As demonstrated, I’ve chosen to use stacked bar charts since there are only six languages and it turns out there is a dominant category for each.

One design criterion I set was to use the main or alternate color for each language and use a gradient to segment each in-language framework. I chose the yellow alternate color for Python since there was enough blue in the chart already. Java & Ruby are separated enough that their slightly different reds aren’t too bad/confusing (and neither language left me with much of an alternative). I picked a green from the Mozilla palette for JavaScript since they seem to dominate any Google search for JavaScript info.

Let’s get libraries out of the way. I’m using my personal theme since I really don’t feel like typing everything out. If you need me to, drop a note and I’ll see what I can do.

library(googlesheets) # get the data
library(dplyr)        # reshape the data
library(ggplot2)      # plot
library(hrbrmisc)     # theme
library(scales)       # plot helpers

First, we need the data, and that’s where @jennybryan’s excellent googlesheets package comes into play:

sheet <- gs_key("1wYm5waQmiYKGhtdofvXDS8SHdh72Mwcnygvf3bvFfoU")
 
langs <- gs_read(sheet)
langs <- langs[-(1:6), 2:4]

We need to be able to order the programming languages by # of frameworks and we need the colors defined:

tops <- count(langs, parent, wt=value)
 
parent_cols <- c(Java="#960000", PHP="#8892bf", Python="#ffdc51", 
                 JavaScript="#70ab2d", dotNet="#68217a", Ruby="#af1401")
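As a small sketch of what the weighted count is doing, here is the same call on a toy data frame. The rows and values are invented purely for illustration and are not the Highcharts numbers; only the column names (id, parent, value) follow the ones used in this post.

```r
library(dplyr)

# Toy stand-in for `langs`; every value here is made up for illustration
toy <- data.frame(
  id     = c("Rails", "Sinatra", "Django", "Flask", "None"),
  parent = c("Ruby", "Ruby", "Python", "Python", "Python"),
  value  = c(50, 5, 30, 10, 60),
  stringsAsFactors = FALSE
)

# With wt=, count() sums `value` within each `parent` instead of
# counting rows; the totals land in a column named `n`
toy_tops <- count(toy, parent, wt = value)
toy_tops
```

The `n` column is what later drives both the bar ordering and the totals printed at the end of each bar.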

To get bars and stacked segments sorted the right way, we need to add a helper column and arrange the overall data frame:

langs <- arrange(ungroup(mutate(group_by(langs, parent), rank=rank(value))), -rank)
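The nested call above can be hard to read, so here is a piped sketch of the same idea on toy data (the values are invented, not the Highcharts numbers). It should be equivalent in spirit: rank within each language, drop the grouping, then sort so the biggest framework in each language comes first.

```r
library(dplyr)

# Toy stand-in for `langs`; every value here is made up for illustration
toy <- data.frame(
  id     = c("Rails", "Sinatra", "Django", "Flask", "None"),
  parent = c("Ruby", "Ruby", "Python", "Python", "Python"),
  value  = c(50, 5, 30, 10, 60),
  stringsAsFactors = FALSE
)

toy_ranked <- toy %>%
  group_by(parent) %>%            # rank frameworks within each language
  mutate(rank = rank(value)) %>%  # 1 = smallest value in the group
  ungroup() %>%
  arrange(-rank)                  # highest rank (largest value) first

toy_ranked
```

One thing to keep in mind is that rank() averages ties by default, so two frameworks with identical values would share a fractional rank.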

Next, we need to assign colors per language and in-language framework. I do this by computing an ordered alpha value for each framework dependent on the number of frameworks in the language:

langs <- mutate(group_by(langs, parent), 
                color=alpha(parent_cols[parent[1]], seq(1, 0.3, length.out=n())))
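To see the gradient in isolation, here is a minimal sketch for a single base color. The choice of four steps is arbitrary; in the real data the number of steps is n(), the number of frameworks in that language.

```r
library(scales)

base_col <- "#70ab2d"  # the JavaScript green from parent_cols

# One base color, four alpha values running from opaque (1) down to 0.3;
# alpha() recycles the single color across the alpha vector
shades <- alpha(base_col, seq(1, 0.3, length.out = 4))
shades
```

Because the data frame was already sorted by descending rank, the most opaque shade goes to the dominant framework and the faintest to the smallest one.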

Finally, we need the actual languages in factor order for ggplot:

langs$parent <- factor(langs$parent, levels=arrange(tops, n)$parent)
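A base R sketch of the same ordering trick, using invented totals rather than the real ones: sorting the level names by ascending total is what arrange(tops, n)$parent does above.

```r
# Invented totals for illustration only
totals <- data.frame(
  parent = c("Python", "Ruby", "Java"),
  n      = c(100, 55, 200),
  stringsAsFactors = FALSE
)

# Level names sorted by ascending total
lvl <- totals$parent[order(totals$n)]

# Any vector of language names can now be turned into an ordered-by-total factor
f <- factor(c("Java", "Ruby", "Python", "Java"), levels = lvl)
levels(f)
```

Ascending order is deliberate: after coord_flip() the first level is drawn at the bottom, so the largest language ends up at the top of the chart.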

We also need the dominant frameworks separated out so we can annotate them. Extra marks for ensuring they’re readable (black vs white depending on the base color):

top_f <- slice(group_by(langs, parent), 1)
top_f$color <- c("white", "white", "#2b2b2b", "#2b2b2b", "white", "white")
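The label colors above are hand-picked. If you would rather compute them, one possible approach is to pick black or white text from the perceived brightness of the base color. The sketch below is an assumption on my part, not what was used for the chart: label_color() is a hypothetical helper, the luma weights are the common Rec. 601 ones, and the 0.5 cutoff is arbitrary, so it may not reproduce the hand-picked vector exactly.

```r
# Hypothetical helper: choose a dark or light label color for each fill color
label_color <- function(hex, dark = "#2b2b2b", light = "white", cutoff = 0.5) {
  chan <- col2rgb(hex) / 255  # 3 x n matrix of red/green/blue in [0, 1]
  # Rec. 601 luma approximation of perceived brightness
  lum <- 0.299 * chan[1, ] + 0.587 * chan[2, ] + 0.114 * chan[3, ]
  ifelse(lum > cutoff, dark, light)
}

parent_cols <- c(Java="#960000", PHP="#8892bf", Python="#ffdc51",
                 JavaScript="#70ab2d", dotNet="#68217a", Ruby="#af1401")

label_color(parent_cols)
```

Tuning the cutoff changes which mid-brightness colors (PHP and JavaScript here) get dark text, so treat it as a starting point and check the result by eye.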

With the data in the right format, the actual ggplot code isn’t too cumbersome:

gg <- ggplot()
 
# stack the bars. the bars themselves will be ordered by the language factor and our
# computed rank will stack them in the right order. we'll use an identity fill for
# the mapped fill aesthetic
 
gg <- gg + geom_bar(data=langs, stat="identity", 
                    aes(x=parent, y=value, fill=color, order=rank),
                    color="white", size=0.15, width=0.65)
 
# text labels at the end of the bars mean no need for any extra chart junk
 
gg <- gg + geom_text(data=tops, family="NoyhSlim-Medium",
                     aes(x=parent, y=n, label=n), 
                     hjust=-0.2, size=3)
 
# here's how we label the dominant framework
 
gg <- gg + geom_text(data=top_f, family="NoyhSlim-Medium",
                     aes(x=parent, y=value/2, label=id, color=color), 
                     hjust=0.5, size=3)
 
# we'll control our own panel breathing room, thanks anyway, ggplot2
 
gg <- gg + scale_x_discrete(expand=c(0,0))
gg <- gg + scale_y_continuous(expand=c(0,0), limits=c(0, 900))
 
# these tell ggplot to use the color we've specified vs map it to a scale
 
gg <- gg + scale_color_identity()
gg <- gg + scale_fill_identity()
 
# the rest doesn't need 'splainin
 
gg <- gg + coord_flip()
gg <- gg + labs(x=NULL, y=NULL,
                title="Popular web frameworks using Highcharts",
                subtitle="Total usage by language, including the most popular framework in-language",
                caption="Data graciously provided by Highcharts - http://jsfiddle.net/vidarbrekke/n6pd4jfo/")
gg <- gg + theme_hrbrmstr(grid=FALSE, axis="y")
gg <- gg + theme(legend.position="none")
gg <- gg + theme(axis.text.x=element_blank())
gg
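If you don’t have the custom theme or font installed, the identity-scale idea can still be tried on its own. This is a stripped-down sketch with invented data and stock ggplot2 defaults, not a reproduction of the chart above: the fill column holds literal color strings, and scale_fill_identity() tells ggplot2 to use them as-is rather than map them to a palette.

```r
library(ggplot2)

# Invented data; the `color` column holds the literal fill for each segment
toy <- data.frame(
  lang  = c("A", "A", "B"),
  value = c(3, 2, 4),
  color = c("#960000", "#96000080", "#ffdc51"),
  stringsAsFactors = FALSE
)

p <- ggplot(toy, aes(x = lang, y = value, fill = color)) +
  geom_bar(stat = "identity", color = "white") +  # white outline between segments
  scale_fill_identity() +                         # use the hex strings directly
  coord_flip() +
  labs(x = NULL, y = NULL)

p
```

Without scale_fill_identity(), ggplot2 would treat the hex strings as category labels and assign its own default colors, which is the usual surprise when first trying this.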

If I wanted to kill more time, I’d’ve used the language logo vs the name in the axis.

Fin

What story/stories can you glean from the data and how would you tell them? Drop a note in the comments with your creation(s)!

Complete, contiguous code is in this gist.

Note that stacked bars aren’t always a replacement for treemaps and that treemaps do have valid uses. The important part is to choose the visualization that best supports the story you want to tell.

