I keep seeing years represented by coloured bars. First it was that demographic tsunami chart. Then there are examples like the one on the right, which came up in a web search today. I even saw one (whispers) at work today.
I get what they are trying to do – illustrate trends within categories over time – but I don’t think years as coloured bars is the way to go. To me, progression over time suggests that time should be an axis, so that the eye moves along the data from one end to the other without interruption. What I want to see is categories over time, not time within categories.
So what is the way to go? Let’s ask “what would ggplot2 do?”
The following charts illustrate different ways to visualise the same data using ggplot2. My motivation here is to show you that if you don’t immediately know or see a “right way” to do something, tools such as ggplot2 make it easy to “feel your way” to a solution, through exploration.
The charts and their accompanying code are available at RPubs.
It looks better already just for being generated using ggplot2. But can we do better?
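For reference, the years-as-coloured-bars layout can be sketched in a few lines. This is a minimal example, not the code from RPubs: the data frame `d` and its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical data (not the post's): one value per category per year
d <- expand.grid(category = c("A", "B", "C"), year = 2009:2012)
d$value <- c(5, 3, 8, 6, 4, 7, 9, 4, 6, 11, 5, 5)

# Categories on the x-axis, one coloured bar per year within each category
p_years_as_bars <- ggplot(d, aes(x = category, y = value, fill = factor(year))) +
  geom_col(position = "dodge") +
  labs(fill = "year")

print(p_years_as_bars)
```

Wrapping `year` in `factor()` gives each year its own discrete colour rather than a continuous gradient.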
Your first thought might be “why not just swap the years and categories around?” And sure, that gives us time along an axis. Now though, it’s a little difficult to follow each category, as the eye has to skip all the others when moving to the next time point.
OK, you say, I can get all the categories at the same time point by stacking. A couple of problems now: first, abrupt changes in value can make a category shrink dramatically or move around vertically in a distracting fashion. And second, making the categories proportional can make it difficult to determine the absolute values for anything other than the bottom segment of each bar.
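The swapped and stacked arrangements described above differ only in the `position` argument. The sketch below assumes a made-up data frame `d` (hypothetical values, not the post's data) and shows the side-by-side, stacked and proportional variants.

```r
library(ggplot2)

# Hypothetical data (not the post's): one value per category per year
d <- expand.grid(category = c("A", "B", "C"), year = 2009:2012)
d$value <- c(5, 3, 8, 6, 4, 7, 9, 4, 6, 11, 5, 5)

# Shared mapping: years on the x-axis, categories as fill colour
base <- ggplot(d, aes(x = factor(year), y = value, fill = category)) +
  labs(x = "year")

# Swapped: categories side by side within each year
p_swapped <- base + geom_col(position = "dodge")

# Stacked: absolute values piled on top of each other (the geom_col default)
p_stacked <- base + geom_col(position = "stack")

# Proportional: each bar rescaled so the categories sum to 1
p_prop <- base + geom_col(position = "fill") + labs(y = "proportion")

print(p_stacked)
```

In `p_prop` every bar has the same height, which is why absolute values can no longer be read off the y-axis.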
I think this works much better: the continuous connection of categories makes it easier to follow each one through time. However, there is still the issue of relative versus absolute values. And to my eye, downward lines can be interpreted as decreases even when the width of the area for a category indicates that the value has increased.
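A stacked area chart of this kind needs time as a numeric x-axis so that the areas connect from year to year. This is a minimal sketch with made-up data in `d` (hypothetical, not the post's). Whether the original chart stacked absolute or proportional values is not stated, so absolute stacking is assumed here.

```r
library(ggplot2)

# Hypothetical data (not the post's): one value per category per year
d <- expand.grid(category = c("A", "B", "C"), year = 2009:2012)
d$value <- c(5, 3, 8, 6, 4, 7, 9, 4, 6, 11, 5, 5)

# Year stays numeric so each category forms one continuous band
p_area <- ggplot(d, aes(x = year, y = value, fill = category)) +
  geom_area()

print(p_area)
```

Passing `position = "fill"` to `geom_area()` would give the proportional version instead.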
Now we’re getting somewhere – much easier to follow each category over time. One issue with this particular arrangement is that it’s a little hard to compare categories, and the eye struggles to isolate a facet from its neighbours.
Not bad at all – this makes categories within a year very clear. But wait, we wanted a good view of each category over time. So how about…
I think we have a winner. This clearly illustrates change per category over time and the layout and common scales even allow for comparison between categories.
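The two faceted layouts can be sketched as follows, again with a made-up data frame `d` (hypothetical values). The choice of bars and of `facet_grid()` is an assumption for illustration. The original charts on RPubs may differ.

```r
library(ggplot2)

# Hypothetical data (not the post's): one value per category per year
d <- expand.grid(category = c("A", "B", "C"), year = 2009:2012)
d$value <- c(5, 3, 8, 6, 4, 7, 9, 4, 6, 11, 5, 5)

# One panel per year: categories within a year are easy to compare
p_by_year <- ggplot(d, aes(x = category, y = value)) +
  geom_col() +
  facet_grid(. ~ year)

# One panel per category, stacked vertically: years run along a shared x-axis
p_by_category <- ggplot(d, aes(x = factor(year), y = value)) +
  geom_col() +
  facet_grid(category ~ .) +
  labs(x = "year")

print(p_by_category)
```

By default the facets share the same axis scales, which is what allows comparison between panels.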
We started out complaining about time within categories but in fact, that is what we wanted after all: just arranged in a better way than years as coloured bars.