One of the game-changing features of
ggplot2 was the ease with which one can explore the dimensions of the data using small multiples.1 There is a small trick that I was to share today – put all the data in background of every panel. This can considerably improve comparability of the data across the dimension which splits the dataset into the subsets for the small multiples. Better to show right away what I mean and then explain in details.
1 At some point
lattice became pretty popular for the task, but then
ggplot2 entered the scene.
There is a weekly dataviz challenge organized by Cole Knaflic. One particular challenge stroke me as an ultimate case to showcase this background data trick. Here are the two plots:2 the challenge one and my version.
2 Images are clickable
It is impossible to meaningfully compare and distinguish multiple spaghetti lines at once. Thus, my choice here was to use small multiples and look at the lines one by one. But we still want to compare the lines. For this, I added the pale background lines that show the spread of all data. Note that I also sorted the small multiples in the decreasing order and added the average line in yellow.
But the main trick here is adding all the data in the background. And with
ggplot2 it’s super easy to do. All we need is to add a layer to the plot in which we modify the data by removing the variable that was used for faceting.
Here we use the nice feature of
ggplot2 – the layers inherit whatever you specify in the main
ggplot() call. In this case our background layer is inheriting the
data parameter and all we need is just to remove the variable that is later used for faceting. Consider the following pseudo-code:
df %>% ggplot(PLOT_PARAMETERS)+ geom_WHATEVER(data = df %>% setect(-FACETING_VARIABLE))+ facet_wrap(~ FACETING_VARIABLE)
Once we’ve done this,
gpplot2 no longer knows how to assign subsets of data to the corresponding small multiples. Note that this only happens in the layer where we perform the trick and explicitly throw out the faceting variable. As the result, in each small multiple we end up with all the data in this layer. Put this layer in the background, make it appropriately pale/transparent – and that’s it. I find this dataviz trick amazingly straightforward, simple, and powerful.