I recently came across a data visualization that perfectly demonstrates the difference between the “infovis” and “statgraphics” perspectives.
Here’s the image (link from Tyler Cowen):
That’s the infovis. The statgraphic version would simply be a dotplot, something like this:
(I purposely used the default settings in R with only minor modifications here to demonstrate what happens if you just want to plot the data with minimal effort.)
Let’s compare the two graphs:
From a statistical graphics perspective, the second graph dominates. The countries are directly comparable and the numbers are indicated by positions rather than area. The first graph is full of distracting color and gives the misleading visual impression that the total GDP of countries 5-10 is about equal to that of countries 1-4.
If the goal is to get attention, though, it’s another story. There’s nothing special about the top graph above except how it looks. It represents neither a data-gathering effort, nor a statistical analysis, nor even a clever juxtaposition (as in the famous graph of health costs and life expectancies). If someone had posted the second graph above (the dotplot), I doubt it would’ve been sent around the web, and I doubt that Cowen would’ve noticed it in the first place.
Thus, in this modern world of multichannel communications, chartjunk does have a purpose: it gets you noticed.
P.S. Here’s my R code:
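# Write the plot to a 400 x 350 pixel PNG file
png("africagdp.png", height=350, width=400)
# Countries ranked by GDP, largest first
countries <- c("South Africa", "Egypt", "Nigeria", "Algeria",
               "Morocco", "Angola", "Libya", "Tunisia", "Kenya", "Ethiopia",
               "Ghana", "Cameroon")
# GDP in billions of US dollars, in the same order as countries
gdp <- c(285.4, 188.4, 173, 140.6, 91.4, 75.5, 62.3,
         39.6, 29.4, 28.5, 26.2, 22.2)
# dotchart() draws the first element at the bottom, so reverse both
# vectors to put the largest economy at the top
dotchart(rev(gdp), rev(countries),
         xlab="GDP in billions of US dollars",
         main="African Countries by GDP",
         xlim=max(gdp)*c(.038,1.02), pch=20)
# Close the graphics device so the file is written to disk
dev.off()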
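
The claim that the infovis overstates countries 5-10 relative to countries 1-4 can be checked against the same figures. The following is a minimal sketch, not part of the original plotting code; the variable names top4, next6, and ratio are made up for illustration, and it assumes the ranking is simply the order of the gdp vector.

```r
# GDP in billions of US dollars, ranked largest to smallest
# (same figures as in the dotchart code)
gdp <- c(285.4, 188.4, 173, 140.6, 91.4, 75.5, 62.3,
         39.6, 29.4, 28.5, 26.2, 22.2)

# Illustrative names, not from the original post
top4  <- sum(gdp[1:4])    # combined GDP of countries ranked 1-4
next6 <- sum(gdp[5:10])   # combined GDP of countries ranked 5-10
ratio <- next6 / top4     # how large the second group is relative to the first

print(round(c(top4 = top4, next6 = next6, ratio = ratio), 2))
```

With these numbers, countries 5-10 together come to less than half the total of countries 1-4, so a picture in which the two groups look about equal is indeed misleading.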