How I build up a ggplot2 figure

[This article was first published on Rbloggers – A HopStat and Jump Away, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Recently, Jeff Leek at Simply Statistics discussed why he does not use ggplot2. He notes “The bottom line is for production graphics, any system requires work.” and describes a default plot that needs some work:

library(ggplot2)
ggplot(data = quakes, aes(x = lat,y = long,colour = stations)) + geom_point()

plot of chunk plot

To break down what is going on, here is what R interprets (more or less):

  1. Make a container for data ggplot.
  2. Use the quakes data.frame: data = quakes.
  3. Map certain “aesthetics” with the aes to three different aesthetics (x, y, z) to certain variables from the dataset lat, long, stations, respectively.
  4. Add a layer of geometric things, in this case points (geom_point).

Implicitly, ggplot2 notes that all 3 aesthetics are continuous, so maps them onto the plot using a “continuous” scale (color bar). If stations were a factor or character column, the plot would not have a color bar but a “discrete” scale.

Now, Jeff goes on to describe elemnts he believes required to make this plot “production ready”:

  1. make the axes bigger
  2. make the labels bigger
  3. make the labels be full names (latitude and longitude, ideally with units when variables need them
  4. make the legend title be number of stations reporting

As such, I wanted to go through each step and show how you can do each of these operations

Make the Axes/Labels Bigger

First off, let's assign this plot to an object, called g:

g = ggplot(data = quakes, 
           aes(x = lat,y = long,colour = stations)) + 
  geom_point()

Now, you can simply call print(g) to show the plot, but the assignment will not do that by default. If you simply call g, it will print/show the object (as other R objects do), and plot the graph.

Theme – get to know it

One of the most useful ggplot2 functions is theme. Read the documentation (?theme). There is a slew of options, but we will use a few of them for this and expand on them in the next sections.

Setting a global text size

We can use the text argument to change ALL the text sizes to a value. Now this is where users who have never used ggplot2 may be a bit confused. The text argument (input) in the theme command requires that text be an object of class element_text. If you look at the theme help it says “all text elements (element_text)”. This means you can't just say text = 5, you must specify text = element_text().

As text can have multiple properties (size, color, etc.), element_text can take multiple arguments for these properties. One of these arguments is size:

g + theme(text = element_text(size = 20))

plot of chunk bigger_axis

Again, note that the text argument/property of theme changes all the text sizes. Let's say we want to change the axis tick text (axis.text), legend header/title (legend.title), legend key text (legend.text), and axis label text (axis.title) to each a different size:

gbig = g + theme(axis.text = element_text(size = 18),
                 axis.title = element_text(size = 20),
                 legend.text = element_text(size = 15),
                 legend.title = element_text(size = 15))
gbig

plot of chunk bigger_axis2

Now, we still have the plot g stored, but we make a new version of the graph, called gbig.

Make the Labels to be full names

To change the x or y labels, you can just use the xlab/ylab functions:

gbig = gbig + xlab("Latitude") + ylab("Longitude")
gbig

plot of chunk lab_full

We want to keep these labels, so we overwrote gbig.

Maybe add a title

Now, one may assume there is a main() function from ggplot2 to give the title of the graph, but that function is ggtitle(). Note, there is a title command in base R, so this was not overwritten. It can be used by just adding this layer:

gbig + ggtitle("Spatial Distribution of Stations")

plot of chunk title

Note, the title is smaller than the specified axes label sizes by default. Again if we wanted to make that title bigger, we can change that using theme:

gbig +
  ggtitle("Spatial Distribution of Stations") + 
  theme(title = element_text(size = 30))

plot of chunk big_title

I will not reassign this to a new graph as in some figures for publications, you make the title in the figure legend and not the graph itself.

Making a better legend

Now let's change the header/title of the legend to be number of stations. We can do this using the guides function:

gbigleg_orig = gbig + guides(colour = guide_colorbar(title = "Number of Stations Reporting"))
gbigleg_orig

plot of chunk leg

Here, guides takes arguments that are the same as the aesthetics from before in aes. Also note, that color and colour are aliased so that you can spell it either way you want.

I like the size of the title, but I don't like how wide it is. We can put line breaks in there as well:

gbigleg = gbig + guides(colour = guide_colorbar(title = "NumbernofnStationsnReporting"))
gbigleg

plot of chunk leg2

Ugh, let's also adjust the horizontal justification, so the title is centered:

gbigleg = gbigleg + 
  guides(colour = guide_colorbar(title = "NumbernofnStationsnReporting",
                                 title.hjust = 0.5))
gbigleg

plot of chunk leg_adjust

That looks better for the legend, but we still have a lot of wasted space.

Legend IN the plot

One of the things I believe is that the legend should be inside the plot. In order to do this, we can use the legend.position from the themes:

gbigleg + 
  theme(legend.position = c(0.3, 0.35))

plot of chunk leg_inside

Now, there seems can be a few problems here:

  1. There may not be enough place to put the legend
  2. The legend may mask out points/data

For problem 1., we can either 1) make the y-axis bigger or the legend smaller or a combination of both. In this case, we do not have to change the axes, but you can use ylim to change the y-axis limits:

gbigleg +
  theme(legend.position = c(0.3, 0.35)) +
  ylim(c(160, max(quakes$long)))

plot of chunk change_ylim

I try to not do this as area has been added with no data information. We have enough space, but let's make the legend “transparent” so we can at least see if any points are masked out and to make the legend look a more inclusive part of the plot.

Making a transparent legend

I have a helper “function” transparent_legend that will make the box around the legend (legend.background) transparent and the boxes around the keys (legend.key) transparent as well. Like text before, we have to specify boxes/backgrounds as an element type, but these are rectangles (element_rect) compared to text (element_text).

transparent_legend =  theme(
  legend.background = element_rect(fill = "transparent"),
  legend.key = element_rect(fill = "transparent", 
                            color = "transparent")
)

One nice thing is that we can save this as an object and simply “add” it to any plot we want a transparent legend. Let's add this to what we had and see the result:

gtrans_leg = gbigleg + 
  theme(legend.position = c(0.3, 0.35)) +
  transparent_legend
gtrans_leg

plot of chunk leg_inside2

Moving the title of the legend

Now, everything in gtrans_leg looks acceptable (to me) except for the legend title. We can move the title of the legend to the left hand side:

gtrans_leg + guides(colour = guide_colorbar(title.position = "left"))

plot of chunk leg_left

Damnit! Note, that if you respecify the guides, you must make sure you do it all in one shot (easiest way):

gtrans_leg + guides(
  colour = guide_colorbar(title = "NumbernofnStationsnReporting",
                          title.hjust = 0.5,
                          title.position = "left"))

plot of chunk leg_left_correct

A little more advanced

The last statement is not entirely true, as we could dig into the ggplot2 object and assign a different title.position property to the object after the fact.

gtrans_leg$guides$colour$title.position = "left"
gtrans_leg

plot of chunk respec

“I don't like that theme”

Many times, I have heard people who like the grammar of ggplot2 but not the specified theme that is default. The ggthemes package has some good extensions of theme from ggplot2, but there are also a bunch of themes included in ggplot2, which should be specified before changing specific elements of theme as done above:

g + theme_bw()

plot of chunk themes

g + theme_dark()

plot of chunk themes

g + theme_minimal()

plot of chunk themes

g + theme_classic()

plot of chunk themes

Conclusions

I agree that ggplot2 can deceive new users by making graphs that look “good”-ish. This may be a detriment as they may believe they are good enough, when they truly need to be changed. The changes are available in base or ggplot2 and the overall goal was to show how the recommendations can be achieved using ggplot2 commands.

Below, I discuss some other aspects of the post, where you can use ggplot2 to make quick-ish exploratory plots. I believe, however, that ggplot2 is not the fastest for quick basic exploratory plots. What is is better than base graphics is for making slightly more complex exploratory plots that are necessary for analysis, where base can take more code to do.

How to make quick exploratory plots

I agree with Jeff that the purpose of exploratory plots should be done quickly and a broad range of plots done with minimal code.

Now, I agree that plot is a great function. I do believe that you can create many quick plots using ggplot2 and can be faster than base in some instances. A specific case would be that you have a binary y variable and multiple continous x variables. Let's say I want to plot jittered points, a fit from a binomial glm (logistic regression), and one from a loess.

Here we will use mtcars and say if the car is American or foreign (am variable) is our outcome.

g = ggplot(aes(y = am), data = mtcars) + 
  geom_point(position = position_jitter(height = 0.2)) + 
  geom_smooth(method = "glm", 
              method.args = list(family = "binomial"), se = FALSE) +
  geom_smooth(method = "loess", se = FALSE, col = "red")

Then we can simply add the x variables as aesthetics to look at each of these:

g + aes(x = mpg)

plot of chunk unnamed-chunk-2

g + aes(x = drat)

plot of chunk unnamed-chunk-2

g + aes(x = qsec)

plot of chunk unnamed-chunk-2

Yes, you can create a function to do the operations above in base, but that's 2 sides of the same coin: function versus ggplot2 object.


To leave a comment for the author, please follow the link and comment on their blog: Rbloggers – A HopStat and Jump Away.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)