We were looking for a different type of visualization for a project at work this past week and my thoughts immediately gravitated towards streamgraphs. The TL;DR on streamgraphs is that they are generalized versions of stacked area graphs with free baselines across the x axis. They are somewhat controversial but have a “draw you in” aesthetic appeal (which is what we needed for our visualization).
You can make streamgraphs/stacked area charts pretty easily in D3, and since we needed to try many different sets of data in the streamgraph style, it made sense to make this an R htmlwidget. Thus, the streamgraph package was born.
Making a streamgraph
The package isn’t in CRAN yet, so you have to do the
Streamgraphs require a continuous variable for the x axis, and the streamgraph widget/package works with years or dates (support for xts objects and POSIXct types coming soon). Since they display categorical values in the area regions, the data in R needs to be in long format which is easy to do with
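To make the long-format requirement concrete, here is a minimal base R sketch of reshaping a wide table (one column per category) into year/category/value rows. The data frame and its column names are made up for illustration; the movies example further down does the same reshaping with tidyr::gather.

```r
# Hypothetical wide data: one row per year, one column per category
wide <- data.frame(
  year   = c(2000, 2001, 2002),
  Drama  = c(10, 12, 9),
  Comedy = c(7, 8, 11)
)

# Every column except the x-axis variable is a category
cats <- setdiff(names(wide), "year")

# Long format: one row per (year, category) pair
long <- data.frame(
  year  = rep(wide$year, times = length(cats)),
  genre = rep(cats, each = nrow(wide)),
  value = unlist(wide[cats], use.names = FALSE)
)

print(long)
```

Each row of the result now carries a single observation, which is the shape a key/value/date style of call expects.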
The package recognizes when years are being used and does all the necessary conversions for you. It also uses a technique similar to expand.grid to ensure all categories are represented at every observation (not doing so makes
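As a rough illustration of the expand.grid idea (not the package's actual internals), the sketch below builds every year/category combination, joins the observed values onto it, and fills the gaps. The data, the zero fill value, and the January 1 date convention for bare years are all assumptions made for the example.

```r
# Hypothetical long data with a gap: category "b" has no row for 2001
obs <- data.frame(
  year  = c(2000, 2001, 2002, 2000, 2002),
  key   = c("a", "a", "a", "b", "b"),
  value = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Every (year, key) combination that should be present
full <- expand.grid(
  year = unique(obs$year),
  key  = unique(obs$key),
  stringsAsFactors = FALSE
)

# Left-join the observations onto the full grid, then fill missing values
# with 0 (assumed fill value for this sketch)
filled <- merge(full, obs, by = c("year", "key"), all.x = TRUE)
filled$value[is.na(filled$value)] <- 0
filled <- filled[order(filled$key, filled$year), ]

# Assumed convention: treat a bare year as January 1 of that year
filled$date <- as.Date(sprintf("%d-01-01", as.integer(filled$year)))

print(filled)
```

After this step every category has a value at every x position, so each layer in a stacked layout has the same number of points.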
Let’s start by making a streamgraph of the number of movies made per year by genre using the
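library(streamgraph)
library(dplyr)

# Note: from ggplot2 2.0.0 onward the movies data set lives in the ggplot2movies package
ggplot2::movies %>%
  select(year, Action, Animation, Comedy, Drama, Documentary, Romance, Short) %>%
  # reshape the genre indicator columns into long format
  tidyr::gather(genre, value, -year) %>%
  # count movies per year and genre by summing the indicators
  group_by(year, genre) %>%
  tally(wt=value) %>%
  streamgraph("genre", "n", "year") %>%
  sg_axis_x(20) %>%
  sg_colors("PuOr") %>%
  sg_legend(show=TRUE, label="Genres: ")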
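The group_by() plus tally(wt=value) step in the pipeline above amounts to summing a 0/1 indicator within each year and genre. A small base R sketch of that aggregation, using made-up rows rather than the real movies data, might look like this:

```r
# Hypothetical long-format indicator data: value is 1 when a film has the genre
films <- data.frame(
  year  = c(1999, 1999, 1999, 2000, 2000),
  genre = c("Drama", "Drama", "Comedy", "Drama", "Comedy"),
  value = c(1, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Sum the indicator within each (year, genre) pair
counts <- aggregate(value ~ year + genre, data = films, FUN = sum)

# Rename the summed column to "n", the name used in the pipeline above
names(counts)[names(counts) == "value"] <- "n"

print(counts)
```

The resulting n column is what gets passed as the value argument to the plotting call.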
We can also mimic an example from the Name Voyager project (using the babynames R package) but change some of the aesthetics, just to give an idea of how some of the options work:
library(dplyr)
library(babynames)
library(streamgraph)

babynames %>%
  filter(grepl("^(Alex|Bob|Jay|David|Mike|Jason|Stephen|Kymberlee|Lane|Sophie|John|Andrew|Thibault|Russell)$", name)) %>%
  group_by(year, name) %>%
  tally(wt=n) %>%
  streamgraph("name", "n", "year", offset="zero", interpolate="linear") %>%
  sg_legend(show=TRUE, label="DDSec names: ")
dat <- read.csv("http://asbcllc.com/blog/2015/february/cre_stream_graph_test/data/cre_transaction-data.csv")

dat %>%
  streamgraph("asset_class", "volume_billions", "year") %>%
  sg_axis_x(1, "year", "%Y") %>%
  sg_colors("PuOr") %>%
  sg_legend(show=TRUE, label="DDSec names: ")
While the radical volume change would have been noticeable in almost any graph style, it’s especially noticeable with the streamgraph version as your eyes tend to naturally follow the curves of the flow.
While I wouldn’t have these replace my trusty ggplot2 faceted bar charts for regular EDA and reporting, streamgraphs can add a bit of color and flair, and may be an especially good choice when you need to view many categorical variables over time.
As usual, issues/feature requests go on GitHub and showcase/general feedback in the comments.