A common criticism of R, especially from data scientists who are new to R but proficient in several other programming languages, is that R is “quirky” and annoying because there is almost always more than one way to do simple things. I usually counter that what they are trying to say is that R is “flexible” and “rich”, but by the time we get around to talking about plots I have to concede that my data scientist friends have a point. There are, after all, three completely different major plotting systems in R: the base graphics system, lattice and ggplot2. Each of these systems comprises its own little universe of functions, form, feel and things you just have to remember (or look up every time). They all do the basic plots really well. So when you come to a plot that is a little out of the ordinary for you, what is the best way to go? Now I know. In the preface to his recent source book for producing plots in R, the R Graphics Cookbook, Winston Chang writes: “Each recipe in this book lists a problem and a solution. In most cases, the solutions I offer aren’t the only way to do things in R, but they are, in my opinion, the best way.” No mincing words here. I only got to page one and I knew I was going to like this book!
All right, so Winston Chang is an accomplished ggplot2 developer, and who am I to argue whether the solutions he picked are the “best” solutions, but how good is the book itself? Let’s begin with the appendix. (When reading a technical book, I almost always jump from the beginning to somewhere near the end to get an idea of how much work I am signing up for.) Appendix A is a succinct, lucid and well-written introduction to ggplot2. In my opinion, it provides sufficient information, presented with exemplary clarity, to enable someone unfamiliar with ggplot2 to work through the rest of the book. Since a good many of Winston’s examples are based on ggplot2, I recommend that anyone who is new to this package read the appendix first.
The R Graphics Cookbook is mostly written in what might be called “web page style”: short sentences, bullet points, code snippets and plots integrated in a manner that, I think, facilitates comprehension. The book is organized into 15 chapters, eleven of which are concerned either with a particular kind of plot (e.g. Chapter 5: Scatter Plots) or with some potentially vexing task that is often required to create a satisfying plot (e.g. Chapter 7: Annotations). The remaining chapters are devoted to supporting material, such as Chapter 1: “R Basics”, which is a minimalist account of getting information into R, and Chapter 15: “Getting Your Data into Shape”, which is a short but fairly comprehensive account of transforming data into the “long form” used by ggplot(). Each chapter is organized as a series of “problems”, with sections entitled “Problem”, “Solution” and “Discussion”. The Solution section contains the recommended code, while the Discussion section provides some elaboration. In simple cases, the Discussion may be just a few bullet points. At other times, it anticipates difficulties the reader may have. For example, the Discussion section of “Making a Basic Bar Graph” introduces factors.
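To give a feel for what “long form” means, here is a minimal sketch. It is my own illustration and not a recipe from the book: the data frame wide, its columns site, y2011 and y2012, and the value column count are all made up, and I use base R's reshape() simply because it needs no extra packages. The plotting step assumes ggplot2 is installed and is skipped otherwise.

```r
# Hypothetical wide-format data: one row per site, one column per year
wide <- data.frame(
  site  = c("A", "B", "C"),
  y2011 = c(10, 20, 30),
  y2012 = c(12, 18, 33)
)

# Stack the two year columns into a single 'count' column,
# recording which year each value came from in 'year'
long <- reshape(
  wide,
  direction = "long",
  varying   = c("y2011", "y2012"),
  v.names   = "count",
  timevar   = "year",
  times     = c(2011, 2012),
  idvar     = "site"
)
rownames(long) <- NULL  # drop the composite row names reshape() creates

# Long-form data maps directly onto aesthetics: one column per aesthetic.
# factor(year) makes ggplot treat the year as a discrete category.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = factor(year), y = count, fill = site)) +
    geom_bar(stat = "identity", position = "dodge")
  print(p)
}
```

In the long version each row is a single observation (one site in one year), which is why the year and the site can each be mapped to their own aesthetic. The factor(year) call is also a small taste of why a discussion of bar graphs ends up introducing factors.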
The following plot, from the Discussion section on making a stacked graph, is typical of the many simple examples presented in the book. It is a variation on the basic theme that shows the sophisticated look and feel that is the hallmark of ggplot2.
This next plot, which shows a vector field superimposed on a map, is one of the more elaborate examples presented in the book. It gives an idea of the data manipulation that is often required to make a useful plot.
The R Graphics Cookbook contains a few surprises too. The Note on the last page of the appendix reads:
“Some introductions to ggplot2 make use of a function called qplot(), which is intended as a convenient interface for making graphs. It does require a little less typing than using ggplot() plus a geom, but I’ve found it a bit confusing to use because it has a slightly different way of specifying certain graphing parameters. I think it’s simpler and easier to just use ggplot().”
I was pleasantly surprised to read this (I have always been confused by qplot()), but then I was surprised again to see that the second plot in the book, and several others, are done with qplot()! Clearly, there is more going on with these plots than meets the eye. Then, bang!, another surprise: in the Discussion section of “Making A Scatter Plot Matrix” Winston writes: “We didn’t use ggplot2 here because it doesn’t make scatter plot matrices (at least, not well).” Damn! For two years now, I have been trying to make ggplot2 produce scatter plot matrices that color points by group. (I bet I am not the only one.) The problem is that you start out on a small project and decide to make everything look really nice with ggplot2; then the thought pops into your head to do a plot matrix. Both the base R graphics system, with plot(iris[,1:4],col=iris$Species), and lattice, with splom(iris[,1:4],col=iris$Species), do this easily, but there is plotmatrix() lurking in the depths of ggplot2 and mocking my aesthetic efforts.
Personal problems aside, the point of all this is that the R Graphics Cookbook is not just a manual for ggplot2. It is something much more valuable: a guide to making good plots in R, written by a knowledgeable developer with strong feelings, a sense of style, and a gift for exposition. The book obviously doesn’t cover everything, and it would have been nice to see some lattice examples. The R Graphics Cookbook is, however, a very nice example of the kind of systematic, task-oriented documentation that we need more of for R. I highly recommend it, especially for people who are new to R. I might even buy a couple of copies for my multi-language, “need the one best way to do things” friends.
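For anyone who wants to try the scatter plot matrix comparison from the discussion above, here is a minimal sketch using the built-in iris data. It is my own variation rather than code from the book: for lattice I pass the grouping variable through the groups argument, which is the documented way to get one color per factor level, and the auto.key setting is just one way to add a legend.

```r
library(lattice)  # lattice ships with standard R distributions

# Base graphics: the factor's integer codes pick colors from the palette,
# so each species gets its own color
pairs(iris[, 1:4], col = iris$Species)

# lattice: 'groups' assigns one color per level of Species;
# auto.key adds a legend laid out in three columns
p_lattice <- splom(~ iris[1:4], groups = Species, data = iris,
                   auto.key = list(columns = 3))
print(p_lattice)
```

Each system gets there in a single call, which is exactly why falling back on base graphics or lattice for this one plot is so tempting.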