Boxplots or raw data graphs?

[This article was first published on Social data blog, and kindly contributed to R-bloggers.]

We recently faced a dilemma over the design of the graphs for an OSI publication. It will contain dozens of these graphs, each showing the mean score on a given variable for nearly 11,000 parents from 10 countries. This example is for household wealth, which ranges from 0 to 16. These are the three alternative designs we considered, all built with the wonderful ggplot2.

My personal favourite is the first, because every one of the nearly 11,000 parents in the database is represented by a dot, so no information is lost. The means are shown by larger dots.
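For anyone wanting to try the first design, here is a minimal sketch of the data preparation behind it. It is written in Python rather than the R/ggplot2 the graphs were actually made with, and it assumes a hypothetical input format of (country, score) pairs: each person becomes one horizontally jittered dot, and each country gets one mean point. In ggplot2 the same idea is usually expressed with `geom_jitter()` plus a mean summary layer, but we don't know the exact code used for the OSI graphs.

```python
import random
import statistics


def dot_layer(records, jitter=0.3, seed=1):
    """Build the two layers of a 'raw data plus means' graph.

    records: list of (country, score) pairs (a hypothetical input format).
    Returns (dots, means):
      dots  - one (x, y) point per person, x jittered around the country slot
      means - one (x, mean_score) point per country
    """
    rng = random.Random(seed)  # fixed seed so the jitter is reproducible
    countries = sorted({country for country, _ in records})
    xpos = {country: i for i, country in enumerate(countries)}

    # Every person keeps their own dot; jitter only spreads them sideways
    # so that people with identical scores do not hide each other.
    dots = [(xpos[c] + rng.uniform(-jitter, jitter), score) for c, score in records]

    # Group scores by country and compute the mean for the larger dots.
    by_country = {c: [] for c in countries}
    for c, score in records:
        by_country[c].append(score)
    means = [(xpos[c], statistics.mean(scores)) for c, scores in by_country.items()]
    return dots, means


dots, means = dot_layer([("A", 2), ("A", 4), ("B", 10)])
print(len(dots), means)
```

Because the scores themselves are never altered, nothing is summarised away in the dot layer; only the mean layer aggregates.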

Many people preferred the second option because it looks more familiar. However, I had to rule it out: although the boxes look like boxplots, the centre line is actually the mean and the height of the box is two standard deviations, whereas in a boxplot the centre line is the median and the box spans the interquartile range.
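The difference between the two kinds of box matters most for skewed variables such as wealth. The sketch below (Python, standard library only) computes both boxes for a small, made-up sample on the 0 to 16 scale; the numbers are purely illustrative, not OSI data, and we assume the second design's box runs one sample standard deviation either side of the mean. Quartiles use `statistics.quantiles` with `method="inclusive"`, which we believe matches R's default `quantile()` definition (type 7).

```python
import statistics


def mean_sd_box(values):
    """Box as drawn in the second design: centre = mean, edges = mean -/+ 1 SD."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return m - sd, m, m + sd


def boxplot_box(values):
    """Box as in a true boxplot: centre = median, edges = lower/upper quartile."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


wealth = [0, 0, 1, 1, 2, 2, 3, 4, 8, 16]  # made-up, right-skewed sample
print("mean +/- SD box:", mean_sd_box(wealth))
print("boxplot box:    ", boxplot_box(wealth))
```

For this sample the second-design "box" runs from below zero to about 8.6 around a mean of 3.7, while the true boxplot box runs from 1 to 3.75 around a median of 2: the single household at 16 drags the mean upwards and inflates the standard deviation, so a reader who takes the box for a boxplot is misled.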

So we settled on the third option, though I had to tinker with the code a little because, for some groups, the standard deviation around the mean extended beyond the range of the y-axis. That is the kind of problem you wouldn't have with the first option.
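One way to handle that overflow, sketched here under our own assumptions (we don't know exactly what the author changed), is to clamp each mean ± one standard deviation interval to the axis limits and record whether it was cut off, so the graph could mark truncated intervals. The 0 and 16 defaults are the wealth scale from this example. In ggplot2 itself, setting limits on the scale (for example with `scale_y_continuous(limits = ...)`) treats out-of-range values as missing, whereas `coord_cartesian(ylim = ...)` only zooms the view, so that choice decides whether overflowing intervals vanish or are simply drawn to the edge of the panel.

```python
import statistics


def clipped_sd_interval(values, lower=0.0, upper=16.0):
    """Mean +/- 1 sample SD, clamped to the axis range [lower, upper].

    Returns (low, mean, high, was_clipped); was_clipped is True when
    either end of the unclamped interval fell outside the axis range.
    """
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    raw_low, raw_high = m - sd, m + sd
    low = max(raw_low, lower)    # never draw below the bottom of the axis
    high = min(raw_high, upper)  # never draw above the top of the axis
    was_clipped = raw_low < lower or raw_high > upper
    return low, m, high, was_clipped


wealth = [0, 0, 1, 1, 2, 2, 3, 4, 8, 16]  # made-up sample, not OSI data
print(clipped_sd_interval(wealth))
print(clipped_sd_interval([5, 6, 7]))
```

The `was_clipped` flag is a design choice: silently clamping would hide the fact that the spread is wider than the axis, which is exactly the information the raw-data design never loses.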
