We recently faced a dilemma over the design of the graphs for an OSI publication. There will be dozens of these graphs, each showing the mean score on a given variable for nearly 11,000 parents from 10 countries. This example is for household wealth, which takes values from 0 to 16. These are the three alternative designs we considered, all built with the wonderful ggplot2.

My personal favourite is the first, as each of the nearly 11,000 people in the database is represented by a dot. No information is lost. The means are shown by larger dots.
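A dot-per-person design along these lines can be sketched in ggplot2 as follows. This is only an illustration, not the code behind the published graphs: the data frame `wealth`, its columns `country` and `score`, and the jitter and colour settings are all made up here, and the data are simulated.

```r
library(ggplot2)

# Simulated stand-in data: 10 hypothetical countries, scores from 0 to 16.
# These are NOT the survey data.
set.seed(1)
wealth <- data.frame(
  country = rep(paste("Country", LETTERS[1:10]), each = 200),
  score   = sample(0:16, 2000, replace = TRUE)
)

p_dots <- ggplot(wealth, aes(x = country, y = score)) +
  # one small, semi-transparent dot per respondent; jitter separates ties
  geom_jitter(width = 0.3, height = 0.2, size = 0.5, alpha = 0.3) +
  # the country mean drawn on top as a larger dot
  # (the `fun` argument needs ggplot2 3.3.0 or later)
  stat_summary(fun = mean, geom = "point", size = 4, colour = "red")

print(p_dots)
```

The jitter is there because the scores are whole numbers, so without it thousands of dots would sit on top of each other. `stat_summary()` computes the means from the original scores, not the jittered positions.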

The second option was preferred by many because it looks more familiar. However, I had to rule it out: although the shapes look like boxplots, the centre line is actually the mean and the height of the box is two standard deviations, whereas in a true boxplot the centre line is the median and the box spans the interquartile range.
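To see how far the two kinds of box can differ, here is a small base R sketch. The scores are invented for illustration and deliberately skewed, and I am assuming that "two standard deviations" means a box running from one standard deviation below the mean to one above it.

```r
# Made-up, right-skewed scores on the 0-16 scale (not the survey data)
score <- c(0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 10, 16)

# The boxplot lookalike: mean as centre line, box from mean - SD to mean + SD
m <- mean(score)
s <- sd(score)
mean_box <- c(lower = m - s, centre = m, upper = m + s)

# A true boxplot: median as centre line, box from first to third quartile
q <- quantile(score, c(0.25, 0.5, 0.75))
true_box <- c(lower = q[[1]], centre = q[[2]], upper = q[[3]])

print(round(mean_box, 2))
print(true_box)
```

With skewed data like this the mean sits above the median and the mean-based box is much taller than the interquartile box, so a reader who takes the shape for a boxplot would misread both the centre and the spread.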

So we settled on the third option, though I had to tinker a bit with the code because for some countries the range of one standard deviation either side of the mean extends beyond the limits of the y-axis – the kind of problem you wouldn’t have with the first option.
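One possible way to handle a standard deviation range that runs off the scale is to clamp it to the axis limits before plotting. The sketch below is an assumption about how this could be done, not a record of the tinkering actually used; the data, the `stats` data frame and the choice of `geom_pointrange()` are all hypothetical.

```r
library(ggplot2)

# Made-up scores for two hypothetical countries, bunched at either end
# of the 0-16 scale (not the survey data)
wealth <- data.frame(
  country = rep(c("Country A", "Country B"), each = 100),
  score   = c(rep(16, 80), rep(8, 20),   # mostly at the top of the scale
              rep(0, 80),  rep(8, 20))   # mostly at the bottom
)

# Mean and SD per country
means <- tapply(wealth$score, wealth$country, mean)
sds   <- tapply(wealth$score, wealth$country, sd)

# Mean +/- SD, clamped so the bars never leave the 0-16 range
stats <- data.frame(
  country = names(means),
  mean    = as.vector(means),
  lower   = pmax(as.vector(means - sds), 0),
  upper   = pmin(as.vector(means + sds), 16)
)

p_sd <- ggplot(stats, aes(x = country, y = mean, ymin = lower, ymax = upper)) +
  geom_pointrange() +
  coord_cartesian(ylim = c(0, 16))

print(p_sd)
```

Clamping means a truncated bar is no longer symmetric about the mean, which is worth stating in a caption. `coord_cartesian()` sets the visible range without discarding any data, so it is a safer way to fix the axis than setting limits on the scale itself.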
