So, I (unapologetically) did this to @Highcharts last week:
They did an awesome makeover (it’s interactive if you follow the link):
And, I’m not kidding, it’s actually a really good treemap. Not too many hierarchies or discrete categories. But, it’s still hard for humans to compare things without the aid of the interaction (which is totally fair, the Highcharts folks do interaction well). I always try to find an alternative to treemaps, usually through trying to figure out the story to tell. I think there’s at least one story in the Highcharts data that we can uncover with a different visualization. Ironically, the visualization I’ve chosen is a stacked bar chart (I don’t generally like them, either). I’ll frame the story and then dissect the code.
In real life, I’d add a DataTables interactive table with this to let folks explore a bit more.
Making this in R & ggplot2
Highcharts used a Google Sheet to hold the data for their treemap makeover. That means we can have some fun with it in R. So, the two main story points are:
- show how the languages and their in-language frameworks rank against each other
- show the dominant framework in each language
As demonstrated, I’ve chosen to use stacked bar charts since there are only six languages and it turns out there is a dominant category for each.
One design criterion was to use the main or alternate color for each language and use a gradient to segment each in-language framework. I chose the yellow alternate color for Python since
Let’s get libraries out of the way. I’m using my personal theme since I really don’t feel like typing everything out. If you need me to, drop a note and I’ll see what I can do.
library(googlesheets) # get the data
library(dplyr)        # reshape the data
library(ggplot2)      # plot
library(hrbrmisc)     # theme
library(scales)       # plot helpers
First, we need the data, and that’s where @jennybryan’s excellent googlesheets package comes into play:
sheet <- gs_key("1wYm5waQmiYKGhtdofvXDS8SHdh72Mwcnygvf3bvFfoU")
langs <- gs_read(sheet)
langs <- langs[-(1:6), 2:4]
We need to be able to order the programming languages by # of frameworks and we need the colors defined:
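A minimal sketch of what this step could look like, on made-up data. The later code reads a column `n` from `tops` (for ordering the bars and for the bar-end labels) and indexes `parent_cols` by language name, so a weighted `dplyr::count()` and a named character vector are one plausible shape. The `toy_*` names, language labels, values and hex colors below are all placeholders, not the Highcharts data or the colors used in the chart.

```r
library(dplyr)

# Made-up stand-in for `langs` (NOT the Highcharts data): one row per framework
toy_langs <- data.frame(
  parent = c("Lang A", "Lang A", "Lang A", "Lang B", "Lang B"),
  id     = c("a1", "a2", "a3", "b1", "b2"),
  value  = c(10, 50, 30, 5, 20),
  stringsAsFactors = FALSE
)

# Per-language totals: count() with wt= sums `value` into a column named `n`
toy_tops <- count(toy_langs, parent, wt = value)

# Named vector so a color can be looked up by language name;
# the hex values are arbitrary placeholders
toy_cols <- c("Lang A" = "#f1e05a", "Lang B" = "#4f5d95")

print(toy_tops)
```

With the real data, the same idea would be applied to `langs` and stored as `tops` and `parent_cols`.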
To get bars and stacked segments sorted the right way, we need to add a helper column and arrange the overall data frame:
langs <- arrange(ungroup(mutate(group_by(langs, parent), rank=rank(value))), -rank)
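To see what the helper column does, here is the same chain unrolled with pipes on a made-up data frame (the `toy` name and its values are hypothetical). `rank()` is computed within each language, so the largest framework in every language gets the highest rank, and sorting by descending rank brings those dominant frameworks to the top.

```r
library(dplyr)

# Made-up data: three frameworks in language A, two in language B
toy <- data.frame(
  parent = c("A", "A", "A", "B", "B"),
  id     = c("a1", "a2", "a3", "b1", "b2"),
  value  = c(10, 50, 30, 5, 20),
  stringsAsFactors = FALSE
)

toy_ranked <- toy %>%
  group_by(parent) %>%          # rank within each language
  mutate(rank = rank(value)) %>% # smallest value gets rank 1
  ungroup() %>%
  arrange(-rank)                # highest rank (largest framework) first

print(toy_ranked)
```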
Next, we need to assign colors per language and in-language framework. I do this by computing an ordered alpha value for each framework, dependent on the number of frameworks in the language:
langs <- mutate(group_by(langs, parent), color=alpha(parent_cols[parent], seq(1, 0.3, length.out=n())))
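The gradient idea can be tried in isolation. This sketch assumes a single placeholder base color and a hypothetical language with four frameworks: `seq()` produces evenly spaced alpha values from fully opaque down to 0.3, and `scales::alpha()` applies them to the base color, returning one color string per framework.

```r
library(scales)

base_col     <- "#4f5d95"  # arbitrary placeholder, not a color from the chart
n_frameworks <- 4          # hypothetical number of frameworks in one language

# Evenly spaced opacity values: 1 for the dominant framework, 0.3 for the smallest
alphas <- seq(1, 0.3, length.out = n_frameworks)

# One color string per framework, same hue with decreasing opacity
shades <- alpha(base_col, alphas)

print(shades)
```

A language with more frameworks gets finer steps between 1 and 0.3, which is why the sequence length depends on `n()` in the grouped version.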
Finally, we need the actual languages in factor order for plotting:
langs$parent <- factor(langs$parent, levels=arrange(tops, n)$parent)
We also need the dominant frameworks separated out so we can annotate them. Extra marks for ensuring they’re readable (black vs white depending on the base color):
top_f <- slice(group_by(langs, parent), 1)
top_f$color <- c("white", "white", "#2b2b2b", "#2b2b2b", "white", "white")
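The label colors above are assigned by hand. As an alternative, here is a sketch that picks dark or light text from the fill color using a simple weighted-luminance heuristic. This is a swapped-in technique, not what the code above does; `label_color()`, the 0.5 threshold and the example colors are all assumptions for illustration.

```r
# Hypothetical helper: choose a readable text color for a given fill color
label_color <- function(fill, dark = "#2b2b2b", light = "white") {
  rgb  <- grDevices::col2rgb(fill)  # 3 x n matrix with rows red, green, blue
  # Weighted luminance scaled to 0..1 (a common rough heuristic)
  luma <- (0.299 * rgb["red", ] + 0.587 * rgb["green", ] + 0.114 * rgb["blue", ]) / 255
  unname(ifelse(luma > 0.5, dark, light))
}

# Placeholder colors: a light yellow and a darker blue
label_color(c("#f1e05a", "#4f5d95"))
```

With the data frame above this could be used as `top_f$color <- label_color(top_f$color)`, though a threshold like this may still need a manual override for borderline colors.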
With the data in the right format, the actual ggplot code isn’t too cumbersome:
gg <- ggplot()

# stack the bars. the bars themselves will be ordered by the language factor and our
# computed rank will stack them in the right order. we'll use an identity fill for
# the mapped fill aesthetic
gg <- gg + geom_bar(data=langs, stat="identity",
                    aes(x=parent, y=value, fill=color, order=rank),
                    color="white", size=0.15, width=0.65)

# text labels at the end of the bar means no need for any extra chart junk
gg <- gg + geom_text(data=tops, family="NoyhSlim-Medium",
                     aes(x=parent, y=n, label=n),
                     hjust=-0.2, size=3)

# here's how we label the dominant framework
gg <- gg + geom_text(data=top_f, family="NoyhSlim-Medium",
                     aes(x=parent, y=value/2, label=id, color=color),
                     hjust=0.5, size=3)

# we'll control our own panel breathing room, thanks anyway, ggplot2
gg <- gg + scale_x_discrete(expand=c(0,0))
gg <- gg + scale_y_continuous(expand=c(0,0), limits=c(0, 900))

# these tell ggplot to use the color we've specified vs map it to a scale
gg <- gg + scale_color_identity()
gg <- gg + scale_fill_identity()

# the rest doesn't need 'splainin
gg <- gg + coord_flip()
gg <- gg + labs(x=NULL, y=NULL,
                title="Popular web frameworks using Highcharts",
                subtitle="Total usage by language, including the most popular framework in-language",
                caption="Data graciously provided by Highcharts - http://jsfiddle.net/vidarbrekke/n6pd4jfo/")
gg <- gg + theme_hrbrmstr(grid=FALSE, axis="y")
gg <- gg + theme(legend.position="none")
gg <- gg + theme(axis.text.x=element_blank())
gg
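For trying the approach without the custom theme or font, here is a stripped-down sketch on made-up data. It assumes a recent ggplot2, where (to my understanding) the `order` aesthetic is no longer supported and stacking follows the grouping, so it maps `group=rank` instead. The data, colors and `toy_*` names are placeholders, and the stacking direction may need flipping depending on the ggplot2 version.

```r
library(dplyr)
library(ggplot2)
library(scales)

# Made-up data and placeholder colors (NOT the Highcharts data)
toy <- data.frame(
  parent = c("A", "A", "A", "B", "B"),
  id     = c("a1", "a2", "a3", "b1", "b2"),
  value  = c(10, 50, 30, 5, 20),
  stringsAsFactors = FALSE
)
toy_cols <- c(A = "#f1e05a", B = "#4f5d95")

toy <- toy %>%
  group_by(parent) %>%
  mutate(rank = rank(value)) %>%
  arrange(-rank, .by_group = TRUE) %>%  # largest framework first within each language
  mutate(color = alpha(unname(toy_cols[parent]),
                       seq(1, 0.3, length.out = n()))) %>%
  ungroup()

gg <- ggplot() +
  # group=rank is an assumption about how to steer stack order in current ggplot2
  geom_col(data = toy,
           aes(x = parent, y = value, fill = color, group = rank),
           color = "white", width = 0.65) +
  scale_fill_identity() +  # use the precomputed color strings as-is
  coord_flip() +
  labs(x = NULL, y = NULL)

gg
```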
If I wanted to kill more time, I’d’ve used the language logo vs the name in the axis.
What story/stories can you glean from the data and how would you tell them? Drop a note in the comments with your creation(s)!
Complete, contiguous code is in this gist.
Note that stacked bars aren’t always a replacement for treemaps and that treemaps do have valid uses. The important part is to choose the visualization that best supports the story you want to tell.