To be reductive, visual displays of quantitative information might reasonably be categorized on a continuum between “data display” and “statistical graphics.” By statistical graphics, I mean a plot that displays some summary of, or relationship among, several variables, usually after the data have undergone some processing or analysis. This may be as simple as a scatterplot of the primary independent variable against the dependent variable, a boxplot, or a graphical regression table.
In this reductive scheme, then, “data displays” present variables in raw form, whether for exploratory data analysis or simply to give the viewer access to all of the data. Where “statistical graphics” may be best served by a simple, minimalist design that conveys a single idea clearly, “data displays” tend to be inherently complex, and require effort from both creator and viewer to draw meaning from the available information.
Where statistical graphics are ideal for presenting conclusions, data displays are useful for generating ideas and, at their best, for permitting relatively rapid identification of relationships among multiple variables. I would add that many of the most well-regarded recent data displays offer macro-level insight as well as the opportunity to examine specific details (interactivity is often valuable here, as in the internet-classic New York Times box office visualization).
As several recent posts suggest, I am interested in finding ways to convey multidimensional data successfully and clearly, and have been focusing on political data as it varies across geopolitical units and time. Here I offer an approach that departs from the geographic basis of other recent efforts, instead letting the position of graphical objects convey other variables.
This type of plot is called, variously, a spinogram, a mosaic plot, or a marimekko, and is not dissimilar from a treemap with a different organizational structure (other examples). The utility of this plot type is that it can spatially convey four numeric variables (x position, y position, height, and width), and color can add up to three more (R, G, B). Further, each cell has a straightforward geometric interpretation: here, width represents state turnout and height represents the county's proportion of state turnout, so each cell's area is proportional to the county's turnout, and areas are directly comparable across the whole plot.
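The area argument above can be checked with a small layout computation. The post's own code is in R and is not reproduced here; what follows is a minimal Python sketch, assuming a hypothetical input of `(state, county, votes)` tuples and a unit-square canvas. The function name `mosaic_layout` and the toy data are invented for illustration.

```python
from collections import defaultdict


def mosaic_layout(counties):
    """Compute (x, y, width, height) for each county cell on the unit square.

    `counties` is a list of (state, county, votes) tuples, a hypothetical
    input format. Columns and cells keep the order in which they first
    appear; any sorting is left to the caller.
    """
    # Total votes per state (dict keeps first-seen state order).
    state_totals = defaultdict(int)
    for state, _, votes in counties:
        state_totals[state] += votes
    grand_total = sum(state_totals.values())

    rects = {}
    x = 0.0
    for state, state_votes in state_totals.items():
        # Column width: the state's share of all votes cast.
        width = state_votes / grand_total
        y = 0.0
        for s, county, votes in counties:
            if s != state:
                continue
            # Cell height: the county's share of its own state's votes.
            height = votes / state_votes
            rects[(state, county)] = (x, y, width, height)
            y += height
        x += width
    return rects


# Toy data, invented for illustration only.
toy = [("A", "a1", 300), ("A", "a2", 100), ("B", "b1", 600)]
for key, (x, y, w, h) in mosaic_layout(toy).items():
    # Area w * h is the county's share of all votes cast.
    print(key, round(w * h, 3))
```

Because area = (state votes / total votes) × (county votes / state votes) = county votes / total votes, the state totals cancel, which is why cells in different columns can be compared by area alone.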
Unlike in a stacked bar plot, the width of each column conveys information, which lets height convey proportion rather than count. Further, columns, and cells within columns, can be sorted to express an ordering on variables of interest. In some ways, these plots can be seen as extreme reinterpretations of (Dorling) cartograms, in which not only the size and shape of political units but also their position are distorted by other variables.
In the plots above, cells are colored according to the strength of Democratic (blue), Republican (red), and other-party (green) support, and counties whose turnout exceeds 1% of the total turnout in an election are labeled.
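The post does not spell out the exact color mapping. One plausible reading, assumed here, puts each party's vote share into one RGB channel (Republican to red, other parties to green, Democratic to blue). The sketch below, with hypothetical helper names `county_color` and `counties_to_label`, shows that assumed mapping alongside the 1% labeling rule described above.

```python
def county_color(dem, rep, other):
    """Hex color with red = Republican share, green = other-party share,
    blue = Democratic share. This channel mapping is an assumption."""
    total = dem + rep + other
    r, g, b = (round(255 * v / total) for v in (rep, other, dem))
    return f"#{r:02x}{g:02x}{b:02x}"


def counties_to_label(turnout, threshold=0.01):
    """Return, sorted, the counties whose turnout exceeds `threshold`
    (1% by default) of the total turnout. `turnout` maps county -> votes."""
    total = sum(turnout.values())
    return sorted(name for name, votes in turnout.items() if votes / total > threshold)


# Toy inputs, invented for illustration only.
print(county_color(dem=60, rep=40, other=0))
print(counties_to_label({"a": 990, "b": 5, "c": 5}))
```

A purely Democratic county comes out pure blue and a purely third-party county pure green; mixed counties blend toward purple or other intermediate hues.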
I present two different layouts for the cells in each plot. The first arranges states left to right in order of the number of votes cast in an election, and sorts counties bottom to top by the same measure. Thus, more populous states are on the right, and more populous counties are at the top of each column. This arrangement lets the viewer observe the effects of population size both within and across states, and may make it easier to track changes in county or state politics over time.
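The first ordering amounts to two nested sorts by vote count. A minimal Python sketch, assuming the same hypothetical `(state, county, votes)` tuples as input; the function name `order_by_turnout` is invented for illustration, and ascending order is used so the largest states land on the right and the largest counties at the top.

```python
from collections import defaultdict


def order_by_turnout(counties):
    """Return [(state, [county, ...]), ...] with states ordered from
    smallest to largest turnout (left to right) and counties ordered from
    smallest to largest turnout within each state (bottom to top)."""
    by_state = defaultdict(list)
    for state, county, votes in counties:
        by_state[state].append((county, votes))

    # States sorted by their total votes cast.
    state_order = sorted(by_state, key=lambda s: sum(v for _, v in by_state[s]))
    return [
        (s, [c for c, _ in sorted(by_state[s], key=lambda cv: cv[1])])
        for s in state_order
    ]


# Toy data, invented for illustration only.
toy = [("A", "a1", 300), ("A", "a2", 100), ("B", "b1", 600),
       ("C", "c1", 50), ("C", "c2", 20)]
print(order_by_turnout(toy))
```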
The second layout sorts states left to right, and counties bottom to top, by the Democratic share of the two-party vote (Dem Votes / (Dem Votes + Rep Votes)). Thus, states leaning more Democratic relative to Republican are on the right, and counties more supportive of Democratic candidates are at the top of each column. I believe this arrangement makes overall trends in partisanship across time easier to discern, since the total “sum” of red in a diagram is relatively easy to compare with the total “sum” of blue (and green).
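The two-party share formula and the resulting ordering can be sketched the same way. This Python version assumes hypothetical `(state, county, dem_votes, rep_votes)` tuples, and it computes a state's share from the state's summed Democratic and Republican votes rather than by averaging county shares; that aggregation choice is an assumption. Every unit needs at least one two-party vote, or the division fails.

```python
from collections import defaultdict


def two_party_share(dem, rep):
    """Democratic share of the two-party vote: Dem / (Dem + Rep)."""
    return dem / (dem + rep)


def order_by_partisanship(counties):
    """Return [(state, [county, ...]), ...] with states and, within each
    state, counties ordered from least to most Democratic two-party share."""
    by_state = defaultdict(list)
    for state, county, dem, rep in counties:
        by_state[state].append((county, dem, rep))

    def state_share(s):
        # Aggregate votes first, then take the share (assumed approach).
        dem = sum(d for _, d, _ in by_state[s])
        rep = sum(r for _, _, r in by_state[s])
        return two_party_share(dem, rep)

    state_order = sorted(by_state, key=state_share)
    return [
        (s, [c for c, _, _ in sorted(by_state[s],
                                     key=lambda row: two_party_share(row[1], row[2]))])
        for s in state_order
    ]


# Toy data, invented for illustration only.
toy = [("A", "a1", 30, 70), ("A", "a2", 60, 40), ("B", "b1", 80, 20)]
print(order_by_partisanship(toy))
```

Third-party votes are deliberately excluded from the share, matching the formula in the text, so they affect cell size but not position.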
I have attempted to make my R code fairly general, and it is available for download here, although other applications will require some modification. Our approaches differ, but another instructive example can be found at Learning R.