## Here is how to improve your charts, graphs, maps, and…

October 2, 2010
Here is how to improve your charts, graphs, maps, and plots: Erase non-data ink. Erase redundant data ink. Maximize the ratio of data to ink. Show data variation, not design variation. The surface area of graphical elements should be directly proportio...

## Mandelbrot Set, evolved

September 30, 2010
The Mandelbrot Set is perhaps the most famous fractal of all time. It's simple in its definition: iterate the complex equation $z_{n+1} = z_n^2 + c$ (starting with $z_0 = 0$) for various values of c, and if the sequence doesn't go to infinity then c is part of the Mandelbrot Set. The result, however, is amazingly complex. Thinking of c...
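
The membership test described in this excerpt can be sketched in a few lines of Python. This is a minimal sketch, not the code from the original post: the function name `in_mandelbrot`, the iteration cap `max_iter`, and the escape radius of 2 are assumptions added for illustration. Because the loop stops after a finite number of steps, the result is only an approximation for points near the boundary of the set.

```python
def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """Approximate test of whether c belongs to the Mandelbrot Set.

    Iterates z -> z*z + c starting from z = 0. Once |z| exceeds 2 the
    sequence is known to diverge, so c is outside the set. The cap
    max_iter is an assumption of this sketch, not part of the definition.
    """
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return False  # escaped: the sequence goes to infinity
    return True  # did not escape within max_iter steps


if __name__ == "__main__":
    for point in (0j, -1 + 0j, 1 + 0j, 0.5 + 0.5j):
        print(point, in_mandelbrot(point))
```

Evaluating the function over a grid of c values and coloring each point by the result gives the familiar picture of the set.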

## Example 8.5: bubble plots part 3

September 14, 2010
An anonymous commenter expressed a desire to see how one might use SAS to draw a bubble plot with bubbles in three colors, corresponding to a fourth variable in the data set. (x, y, z for bubble size, and the category variable.) In previous entries...

## Eigenimages: The AT&T Cambridge Faces Database

September 7, 2010
I picked up the AT&T Laboratories Cambridge database of faces for a clustering application. The database consists of images of 40 distinct subjects, each in 10 different facial positions and expressions. Typically, the goal of clustering in these data is to recover the ‘true’ partition, or that which isolates images of distinct subjects. Each image...

## Barchart or Dotchart?

September 7, 2010
Which of the following two charts (both created with R) do you prefer? This dotchart: Or this bar chart? Andrew Gelman (who, incidentally, is speaking at the October NYC UseR meeting) prefers a line plot (update: see Gelman's comment, below), but personally I think the bar chart is more easily interpreted. What do you think? You...

## Competition: Data Visualization with ggplot2

September 3, 2010
The ggplot2 package for R is an amazing system for creating entirely new visualizations of data. It allows data analysts to tell a detailed, meaningful and yet easy-to-interpret story about complex and/or unusual data sets. To promote more data stories being told, ggplot2 author Hadley Wickham has organized a ggplot2 case study competition. Simply create a new visualization of...

## Example 8.3: pyramid plots

August 30, 2010
Pyramid plots are a common way to display the distribution of age groups in a human population. The percentages of people within a given age category are arranged in a barplot, often back to back. Such displays can be used to distinguish males vs. femal...
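
The back-to-back arrangement described here can be sketched as a plain-text plot in Python. This is a rough illustration, not the SAS or R code from the original post. The function names, the `chars_per_percent` scale, and the counts in the demo are all made up for the example. The sketch also assumes that percentages are computed within each sex, which the excerpt does not specify.

```python
def pyramid_percentages(counts):
    """Convert (age_group, male_count, female_count) rows to percentages.

    Assumption of this sketch: each percentage is taken within its own
    sex, so the male column and the female column each sum to 100.
    """
    total_m = sum(m for _, m, _ in counts)
    total_f = sum(f for _, _, f in counts)
    return [(group, 100.0 * m / total_m, 100.0 * f / total_f)
            for group, m, f in counts]


def render_pyramid(rows, chars_per_percent=0.5):
    """Draw back-to-back text bars: males on the left, females on the right."""
    lines = []
    for group, pct_m, pct_f in reversed(rows):  # oldest group on top
        left = "#" * round(pct_m * chars_per_percent)
        right = "#" * round(pct_f * chars_per_percent)
        lines.append(f"{left:>30} {group:^7} {right}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    demo = [("0-19", 30, 20), ("20-39", 50, 60), ("40+", 20, 20)]
    print(render_pyramid(pyramid_percentages(demo)))
```

Right-aligning the left-hand bars is what makes the two halves share a common central axis.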

## How extreme is the Russian heatwave?

August 20, 2010
The devastating heatwave in Russia now seems to be over, but not before killing thousands, causing extensive wildfires, and destroying crops. But how severe was this heatwave, compared to past summers? Physicist and climate scientist Joe Wheatley looks at the record of temperature and rainfall in Russia over the last 60 years and places the last 3 months in...

## Baseball games: getting longer?

August 11, 2010
ESPN's Bill Simmons (aka The Sports Guy) recently suggested that the primary cause of dwindling interest in Red Sox games by fans is that baseball games these days are too long. "It's not that fun to spend 30-45 minutes driving to a game, paying for parking, parking, waiting in line to get in, finding your seat ... and then,...

## R unfolds the history of the Afghanistan war

August 9, 2010
Drew Conway continues his analysis of the Wikileaks data. Having concluded that the data appear legitimate (except perhaps in one region, based on a Benford's Law analysis of the numbers in the documents), Drew follows up with a spatio-temporal analysis of activity within Afghanistan, based on the datelines of the documents themselves. Each panel represents a...
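
For readers unfamiliar with the Benford's Law check mentioned above, here is a minimal Python sketch of the idea. It is not Drew Conway's analysis (which was done in R) and uses none of the Wikileaks data. It only shows the standard first-digit law, under which a leading digit d is expected with probability log10(1 + 1/d). The function names are hypothetical, and the sketch assumes the inputs are nonzero integers.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit probabilities under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_frequencies(values):
    """Observed leading-digit frequencies for nonzero integers."""
    digits = [int(str(abs(v))[0]) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


if __name__ == "__main__":
    expected = benford_expected()
    for d in range(1, 10):
        print(d, round(expected[d], 3))
```

Comparing the observed frequencies with the expected ones, for example with a chi-squared test, is one common way to flag numbers that may not have arisen naturally.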