Making Data Visually Appealing

[This article was first published on Climate Change Ecology » R, and kindly contributed to R-bloggers.]

I’ve recently been considering the graphical presentation of data. I get the feeling that we, as ecologists and scientists, could be better at data presentation. Graphs must be informative, but they don’t have to be ugly. I think that making visually appealing charts and graphs goes a long way towards making science accessible. So my posts will concentrate on making what I consider to be informative AND visually appealing graphs. That means NO bar graphs…

This first blog post will describe my recent foray into a new graphics package in R, ggplot2. Actually, ggplot2 isn’t new at all, but it’s new to me. I’ve been incredibly happy with R’s base graphics and the lattice extensions, and since ggplot2 required learning a whole new syntax, I never really saw the advantage. However, I’ve recently begun playing with it because I’m interested in making rank clocks (see Collins et al. 2008 in Ecology). The reason I like ggplot2 is that it makes very appealing graphs incredibly easily. As I’ve begun messing with it, I’ve discovered that it is fairly easy to pick up for basic plots. Of course, like anything in R, more complicated plots get, well, complicated quite quickly. I’ve been reading through Wickham’s book on ggplot2. It’s very comprehensive, but also quite dense. So over the next few posts (hopefully), I’ll describe what I’ve learned.

We’ll concentrate on two chart types: scatterplots (continuous x and continuous y) and boxplots (categorical x and continuous y). I like boxplots for a variety of reasons, including the fact that they are more informative, less wasteful, and prettier than bar charts, which are (in my opinion) as uninformative a graph as you can make. Don’t get me wrong, bar charts have uses, but they are limited.

SCATTERPLOT

Let’s simulate some data and put it in a dataframe. I use a dataframe because, most of the time when you’re working with data, it’ll be in a dataframe anyway, and ggplot() expects one.

category <- rep(c('A','B','C'), each=10)
x <- runif(30, 1, 15)
y <- 5 + 2*x + rnorm(30, 0, 2)
plotData <- data.frame(category, x, y)
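Because runif() and rnorm() draw fresh random numbers on every run, your plots will differ slightly from the ones shown in this post. Here is a minimal sketch of making the simulation repeatable. The seed value of 42 is an arbitrary choice of mine for illustration and is not what produced the original figures.

```r
# Fix the random number generator so the simulated data are the same on every run.
# The seed value 42 is arbitrary and only an assumption for this example.
set.seed(42)

category <- rep(c('A','B','C'), each=10)   # three groups of ten observations
x <- runif(30, 1, 15)                      # uniform x values between 1 and 15
y <- 5 + 2*x + rnorm(30, 0, 2)             # linear trend plus normal noise (sd = 2)
plotData <- data.frame(category, x, y)

str(plotData)    # check the column types
head(plotData)   # peek at the first few rows
```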

Here’s the basic R scatterplot with a few adjustments.

par(mar=c(4,4,1,1)+0.2)
plot(x,y,pch=16)

[Figure: basic scatterplot from base R]

This is about as basic as it gets in R. Not bad. Worthy of publication in a journal with a few more tweaks. But this isn’t going to catch any eyes. The ggplot2 scatterplot, on the other hand, will. First, let’s get a handle on the syntax. The first thing you do is make a ggplot object (I’m going to ignore the quick plot command, qplot(), because it’s better to learn the full syntax).

library(ggplot2)  # load the package first; install.packages('ggplot2') if needed
p <- ggplot(plotData, aes(x,y))

This just says: “Make a ggplot object. The data is stored in plotData, variable ‘x’ is on the x-axis, and variable ‘y’ is on the y-axis.” That’s it. It has not actually specified what KIND of plot to draw. ggplot has several kinds of plots, called geometries, or geoms, including bar (geom_bar), scatterplot (geom_point), box (geom_boxplot), and others. The intuitive thing about ggplot is that it operates in layers. Next we can say “take the ggplot object and put a scatterplot layer on it”.

 p + geom_point()
 

[Figure: basic ggplot2 scatterplot]

Already that’s better looking. We can adjust the size of the points:

 p + geom_point(size=3)
 

There are a number of other variables to adjust, but I’ll leave that to you.
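As a rough sketch of a few of those other adjustments, geom_point() also accepts shape, alpha (transparency), and colour as fixed settings. The specific values below (triangles, 60% opacity, 'steelblue') are arbitrary choices for illustration, and the data are re-simulated so the block runs on its own.

```r
library(ggplot2)

# Re-create the example data so this block is self-contained
category <- rep(c('A','B','C'), each=10)
x <- runif(30, 1, 15)
y <- 5 + 2*x + rnorm(30, 0, 2)
plotData <- data.frame(category, x, y)

p <- ggplot(plotData, aes(x, y))

# size, shape, alpha and colour are all set to constant values here:
# shape=17 is a filled triangle, alpha=0.6 makes the points slightly see-through
pStyled <- p + geom_point(size=3, shape=17, alpha=0.6, colour='steelblue')
pStyled
```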

Here’s where ggplot gets better. It’s very easy to group the points. In base R, it takes some work: you have to specify colors manually based on the grouping factor, etc. In lattice, it’s easier, but it takes a lot of work to get the plot visually pleasing (the default colors in lattice are... interesting).
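For comparison, here is one way the manual approach might look in base graphics. This is only a sketch: the three colors and the legend position are my own arbitrary choices, and the data are re-simulated so the block stands alone.

```r
# Re-create the example data so this block is self-contained
category <- rep(c('A','B','C'), each=10)
x <- runif(30, 1, 15)
y <- 5 + 2*x + rnorm(30, 0, 2)
plotData <- data.frame(category, x, y)

# One color per group; these three colors are arbitrary choices
groupColors <- c(A='red', B='blue', C='darkgreen')

# Look up each point's color from its category
pointColors <- groupColors[as.character(plotData$category)]

par(mar=c(4,4,1,1)+0.2)
plot(plotData$x, plotData$y, pch=16, col=pointColors, xlab='x', ylab='y')

# The legend also has to be built by hand
legend('topleft', legend=names(groupColors), col=groupColors, pch=16)
```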

In ggplot, all we have to do is add in a ‘color’ argument.

 p + geom_point(aes(color=category), size=3)
 

[Figure: ggplot2 scatterplot with points colored by category]

The color argument goes inside aes() because you’re mapping an aesthetic of the graph to a variable in the data (here, category). That’s a much nicer plot.
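The difference between putting color inside and outside aes() is easiest to see side by side. In this small sketch, the object names pMapped and pFixed and the choice of 'red' are mine, purely for illustration.

```r
library(ggplot2)

# Re-create the example data so this block is self-contained
category <- rep(c('A','B','C'), each=10)
x <- runif(30, 1, 15)
y <- 5 + 2*x + rnorm(30, 0, 2)
plotData <- data.frame(category, x, y)

p <- ggplot(plotData, aes(x, y))

# Mapping: color depends on the data, so it goes inside aes() and gets a legend
pMapped <- p + geom_point(aes(color=category), size=3)

# Setting: every point gets the same fixed color, so it goes outside aes()
pFixed <- p + geom_point(color='red', size=3)

pMapped
pFixed
```

Note that size=3 sits outside aes() in both calls for the same reason: it is a constant, not something that varies with the data.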

Adding in a linear regression line is as simple as adding in another geom:

 p + geom_point(aes(color=category), size=3) + geom_smooth(method='lm')
 

[Figure: ggplot2 scatterplot with a linear regression line]
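The line drawn by geom_smooth(method='lm') is an ordinary least-squares fit of y on x, shown by default with a shaded confidence band. As a sketch, the same straight line can be recovered with lm() directly. Since the data were simulated as 5 + 2*x plus noise, the estimates should land near 5 and 2. Moving the color mapping into the ggplot() call is one way to get a separate line per category, because the smooth layer then inherits the grouping. The object names fit and pGrouped are mine.

```r
library(ggplot2)

# Re-create the example data so this block is self-contained
category <- rep(c('A','B','C'), each=10)
x <- runif(30, 1, 15)
y <- 5 + 2*x + rnorm(30, 0, 2)
plotData <- data.frame(category, x, y)

# The line drawn by geom_smooth(method='lm') is the least-squares fit of y on x
fit <- lm(y ~ x, data=plotData)
coef(fit)   # intercept and slope

# Mapping color in ggplot() makes every layer inherit it,
# so geom_smooth() fits one line per category
pGrouped <- ggplot(plotData, aes(x, y, color=category)) +
  geom_point(size=3) +
  geom_smooth(method='lm', se=FALSE)   # se=FALSE hides the confidence bands
pGrouped
```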

With only a tad bit of effort we have a good looking scatterplot! Granted, it could use a little dusting up, but you can see the difference between the default graphs of base graphics and ggplot2.

BOXPLOT

Here’s R’s default boxplot:

 boxplot(y ~ category, data=plotData)
 

[Figure: default boxplot from base R]

Again, it’s not terrible. Perfectly worthy of publication. But it’s not going to get the attention of passers-by. Let’s do the same thing in ggplot. We’ll make a new object with ‘category’ on the x-axis and add a boxplot geom on top of it.

 p2 <- ggplot(plotData, aes(category, y))
 p2 + geom_boxplot()
 

[Figure: default ggplot2 boxplot]

Much better! We can again add colors. Note that adding colors automatically introduces a legend, which is redundant in this case, so we tack on an extra command to get rid of it.

 p2 + geom_boxplot(aes(color=category)) + theme(legend.position='none')
 

[Figure: ggplot2 boxplot with outlines colored by category]

I like it. Let’s switch it up and fill the boxplot with color instead of outlining it. The automatic fill is a little dark, so I’ll introduce some transparency with the ‘alpha’ argument.

 p2 + geom_boxplot(aes(fill=category), alpha=0.5) + theme(legend.position='none')
 

[Figure: ggplot2 boxplot with semi-transparent fill colored by category]

Now here is a boxplot that is super informative. It shows the distribution of the data within each category (compare that to a bar chart). We can see the median, the spread, the high and low values, and any outliers. It’s attractive, with some nice colors, so that it’ll grab the interest of people scanning through. Granted, it would still require work to become ready for publication, but it’s a very good start.
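To make concrete what each box encodes: the middle line is the median, the box edges are the first and third quartiles, and points farther than 1.5 times the interquartile range from the box are drawn individually as outliers. Here is a rough sketch of computing those numbers per category in base R. The helper isOutlier() is a name I made up for this example. As far as I know, geom_boxplot() uses quantile() for the box edges, while base boxplot() uses Tukey’s hinges, so the two can differ slightly.

```r
# Re-create the example data so this block is self-contained
category <- rep(c('A','B','C'), each=10)
x <- runif(30, 1, 15)
y <- 5 + 2*x + rnorm(30, 0, 2)
plotData <- data.frame(category, x, y)

# First quartile, median and third quartile per category:
# the three horizontal lines of each box
boxStats <- aggregate(y ~ category, data=plotData,
                      FUN=function(v) quantile(v, c(0.25, 0.5, 0.75)))
boxStats

# Hypothetical helper: flags points farther than 1.5 * IQR from the box
isOutlier <- function(v) {
  q <- quantile(v, c(0.25, 0.75))
  iqr <- diff(q)
  v < q[1] - 1.5*iqr | v > q[2] + 1.5*iqr
}

# Number of points per category that would be drawn as outliers
tapply(plotData$y, plotData$category, function(v) sum(isOutlier(v)))
```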

As a final note, the thing I find most appealing about ggplot is that it works in layers, and geoms can be added on top of one another. They can be combined in any way you see fit. Layers are added in the order they are called. For example, we can make a boxplot overlaid with a scatterplot (not sure why you’d do this in actuality, but it’s a good example).


p2 + geom_boxplot(aes(fill=category), alpha=0.5) +
 geom_point(aes(color=category), size=3) +
 theme(legend.position='none')

[Figure: ggplot2 boxplot overlaid with a scatterplot]
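Since layers are drawn in the order they are added, swapping the order changes what ends up on top. The sketch below builds the plot both ways and then lists the geoms stored in each plot object. The object names boxThenPoints and pointsThenBox are mine, and peeking at the layers element relies on the plot object’s internal structure, so treat that part as an assumption about how ggplot2 stores things.

```r
library(ggplot2)

# Re-create the example data so this block is self-contained
category <- rep(c('A','B','C'), each=10)
x <- runif(30, 1, 15)
y <- 5 + 2*x + rnorm(30, 0, 2)
plotData <- data.frame(category, x, y)

p2 <- ggplot(plotData, aes(category, y))

# Boxes first, points second: the points are drawn on top of the boxes
boxThenPoints <- p2 + geom_boxplot(aes(fill=category), alpha=0.5) +
  geom_point(aes(color=category), size=3) +
  theme(legend.position='none')

# Points first, boxes second: the semi-transparent boxes are drawn over the points
pointsThenBox <- p2 + geom_point(aes(color=category), size=3) +
  geom_boxplot(aes(fill=category), alpha=0.5) +
  theme(legend.position='none')

# Each plot stores its layers in the order they were added
sapply(boxThenPoints$layers, function(l) class(l$geom)[1])
sapply(pointsThenBox$layers, function(l) class(l$geom)[1])
```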

I’ll hopefully keep posting on making graphs visually appealing. And I’ll most likely be making extensive use of ggplot2 to do it.

I’m also thinking of starting up a call for your best and most artistic graphs. Visually appealing science. I’d like to get some posts of graphs that people have made in their own research that can be considered blending art and science to grab the attention of non-scientists. I haven’t figured out the format just yet, but I’m interested in your thoughts.

