We recently faced a dilemma over the design of the graphs for an OSI publication. There will be dozens of these graphs, each showing the mean score on a given variable for nearly 11,000 parents from 10 countries. This example is for household wealth, which has values ranging from 0 to 16. These are the three alternative designs we considered, all constructed with the wonderful ggplot2.
My personal favourite is the first, as each of the nearly 11,000 parents in the database is represented by a dot. No information is lost. The means are shown by larger dots.
The second option was preferred by many because it looks more familiar. However, I had to disallow it: although the boxes look like boxplots, the centre line is actually the mean and the height of the box is two standard deviations, whereas in a true boxplot these would be the median and the interquartile range.
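To make the difference concrete, here is a minimal sketch in plain Python (not the ggplot2 code behind the graphs). The scores are made up for illustration and deliberately skewed, and the sketch assumes the box runs from one standard deviation below the mean to one above it, giving a height of two standard deviations.

```python
import statistics

# Hypothetical, made-up wealth scores on the 0 to 16 scale (not the real data).
scores = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 12, 16]

# What the second design draws: centre line at the mean, box of mean +/- 1 SD.
mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation
mean_sd_box = (mean - sd, mean, mean + sd)

# What a true boxplot draws: centre line at the median, box from Q1 to Q3.
median = statistics.median(scores)
q1, _, q3 = statistics.quantiles(scores, n=4)
boxplot_box = (q1, median, q3)

print("mean +/- SD box:", mean_sd_box)
print("boxplot box:    ", boxplot_box)
```

With skewed data like this the two boxes sit in visibly different places, so a reader who takes the mean-based box for a boxplot would misread both the centre and the spread.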
So we settled on the third option, though I had to tinker a bit with the code because some of the standard deviation intervals actually extend beyond the range of the y-axis, the kind of problem you wouldn’t have with the first option.
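The tinkering itself is not shown here, so the following is only one possible approach, sketched in plain Python rather than ggplot2: clip the mean plus or minus one standard deviation to the 0 to 16 range of the variable. The function name `clamped_sd_interval` and the sample scores are hypothetical.

```python
import statistics

Y_MIN, Y_MAX = 0, 16  # range of the household wealth variable, from the text


def clamped_sd_interval(values, lo=Y_MIN, hi=Y_MAX):
    """Return (lower, mean, upper) for mean +/- 1 SD, clipped to [lo, hi].

    Hypothetical helper: clipping is an assumption, not the author's method.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return max(lo, mean - sd), mean, min(hi, mean + sd)


# Made-up, skewed scores: here mean - SD falls below zero before clipping.
scores = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 12, 16]
lower, mean, upper = clamped_sd_interval(scores)
print(lower, mean, upper)
```

Clipping keeps the drawn interval inside the axis, at the cost of no longer showing the full standard deviation on the clipped side, which is worth mentioning in a figure note if this route is taken.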