**Statistical Modeling, Causal Inference, and Social Science » R**, and kindly contributed to R-bloggers)

Our discussion on data visualization continues.

One one side are three statisticians–Antony Unwin, Kaiser Fung, and myself. We have been writing about the different goals served by information visualization and statistical graphics.

On the other side are graphics experts (sorry for the imprecision, I don’t know exactly what these people do in their day jobs or how they are trained, and I don’t want to mislabel them) such as Robert Kosara and Jen Lowe, who seem a bit annoyed at how my colleagues and myself seem to follow the Tufte strategy of criticizing what we don’t understand.

And on the third side are many (most?) academic statisticians, econometricians, etc., who don’t understand or respect graphs and seem to think of visualization as a toy that is unrelated to serious science or statistics.

I’m not so interested in the third group right now–I tried to communicate with them in my big articles from 2003 and 2004)–but I am concerned that our dialogue with the graphics experts is not moving forward quite as I’d wished.

I’m not trying to win any arguments here; rather I’m trying to move the discussion away from “good vs. bad” (I know I’ve contributed to that attitude in the past, and I’m sure I’ll do so again) toward a discussion of different goals.

I’ll try to write something more systematic on the topic, but for now I’d like to continue by discussing examples.

My article with Antony had many many examples but we got so involved in the statistical issues of data presentation that I think the main thread of the argument got lost.

For example, Hadley Wickham, creator of the great ggplot2, wrote:

Unfortunately both sides [statisticians and infovgraphics people] seem to be comparing the best of one side with the worst of the other. There are some awful infovis papers that completely ignore utility in the pursuit of aesthetics. There are many awful stat graphics papers that ignore aesthetics in the pursuit of utility (and often fail to achieve that). Neither side is perfect, and it’s a shame that we can’t work more closely together to get the best of both worlds.

I agree about the best of both worlds (and return to this point at the end of the present post). But I don’t agree that we’re comparing to “the worst of the other.” Sure, sometimes this is true (as in the notorious “chartjunk” paper in which pretty graphs are compared to piss-poor plots that violate every principle of visualization and statistical graphics).

But recent web discussions have been about the best, not the worst. In my long article with Unwin, we discussed the “5 best data visualizations of the year”! In our short article, we discuss Florence Nightingale’s spiral graph, which is considered a data visualization classic. And, from the other side, my impression is that infographics gurus are happy to celebrate the best of statistical graphics.

But in this sort of discussion we *have* to discuss examples we don’t like. There are some infographics that I love love love–for example, Laura and Martin Wattenberg’s Name Voyager, which is on my blogroll and which I’ve often linked to. But I don’t have much to say about these–I consider them to have the best features of statistical graphics.

In much of my recent writing on graphics, I’ve focused on visualizations that have been popular and effective–Wordle is an excellent example here–while not following what I would consider to be good principles of *statistical* graphics.

When I discuss the failings of Wordle (or of Nightingale’s spiral, or Kosara’s swirl, or this graph), it is not to put them down, but rather to highlight the gap between (a) what these visualizations do (draw attention to a data pattern and engage the viewer both visually and intellectually) and (b) my goal in statistical graphics (to display data patterns, both expected and unexpected). The differences between (a) and (b) are my subject, and a great way to highlight them is to consider examples that are effective as infovis but not as statistical graphics. I would have no problem with Kosara etc. doing the opposite with my favorite statistical graphics: demonstrating that despite their savvy graphical arrangements of comparisons, my graphs don’t always communicate what I’d like them to.

I’m very open to the idea that graphics experts could help me communicate in ways that I didn’t think of, just as I’d hope that graphics experts would accept that even the coolest images and dynamic graphics could be reimagined if the goal is data exploration.

To get back to our exchange with Kosara, I stand firm in my belief that the swirly plot is not such a good way to display time series data–there are more effective ways of understanding periodicity, and no I don’t think this has anything to do with dynamic vs. static graphics or problems with R. As I noted elsewhere, I think the very feature that makes many infographics appear beautiful is that they reveal the expected in an unexpected way, whereas statistical graphics are more about revealing the unexpected (or, as I would put it, checking the fit to data of models which may be explicitly or implicitly formulated. But I don’t want to debate that here. I’ll quarantine a discussion of the display of periodic data to another blog post.

Instead I’d like to discuss a pure infographic that has no quantitative content at all. It’s a display of strategies of Rock Paper Scissors that Nathan Yau featured a couple weeks ago on his blog:

This is an attractive graphic that conveys some information–but the images have almost nothing to do with the info. It’s really a small bit of content with an attractive design that fills up space.

**Difference in perspectives**

The graphic in question is titled, “How do I win rock, paper, scissors every time?”, which is completely false. As my literal-minded colleague Kaiser Fung would patiently explain, No, the graph does no tell you how to win the game every time. This is no big deal–it’s nothing but a harmless exaggeration–but it illustrates a difference in perspective. A statistician wouldn’t be caught dead making a knowingly false statement. Conversely, a journalist wouldn’t be caught dead making a boring headline (for example, “Some strategies that might increase your odds in rock paper scissors”).

Who’s right here–the statistician or the journalist? It depends on your goals. I’ll stick with being who I am–but I also recognize that Nathan’s post got 116 comments and who knows how many thousand viewers. In contrast, my post from a few years ago (titled “How to win at rock-paper-scissors,” a bit misleading but much less so than “How to win every time”) had a lot more information and received exactly 6 comments. This is fair enough, I’m not complaining. Visuals are more popular than text, and “popular” isn’t a bad thing. The goal is to communicate, and sacrificing some information for an appealing look is a tradeoff that is often worth it.

**Moving forward**

Let me conclude with a suggestion that I’ve been making a lot lately. Lead with the pretty graph but then follow up with more information. In this case, Nathan could post the attractive image (and thus sill interest his broad readership and inspire them to those 100+ comments) but set it up so that if you click through you get text (in this case, it’s words not statistical graphs) with more detailed information:

(Sorry about the tiny font; I was having difficulty with the screen shots.)

Again I purposely chose a non-quantitative example to move the discussion away from “How’s the best way to display these data” and focus entirely on the different goals.

The post Using a “pure infographic” to explore differences between information visualization and statistical graphics appeared first on Statistical Modeling, Causal Inference, and Social Science.

**leave a comment**for the author, please follow the link and comment on his blog:

**Statistical Modeling, Causal Inference, and Social Science » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...