by Joseph Rickert
There have been well over a hundred books on R published within the last ten years. Most of these texts, with titles like “Introductory Statistics with R” or “Time Series with R”, offer the reader a way to jump right in and perform some concrete statistical analysis using R’s myriad built-in functions and extensive visualization features. And, while it is true that some R books appear to be little more than a rehash of basic documentation, there are nevertheless scores of carefully written texts from experts that not only illuminate some area of statistics but also demonstrate good R programming. In no small way, I believe these works have contributed to R’s popularity and growth by providing quality application-level documentation.
Comparatively few books, however, are focused on teaching R programming itself. So it was a pleasant surprise when a copy of Garrett Grolemund’s “Hands-On Programming with R: Write Your Own Functions and Simulations” (O’Reilly 2014) came my way. This is a superb book: well conceived, unusual in its choice of material and sufficiently streamlined (185 pages, not including the appendices) to make it a non-stop, beginning-to-end read.
At the very beginning Garrett says:
I want to help you become a data scientist, as well as a computer scientist, so this book will focus on programming skills that are most related to data science.
These skills have to do with solving what Garrett refers to as the “logistical problems” of data science. In the context of the R language, they include acquiring data, manipulating R objects, constructing custom functions, negotiating the R environment and, above all, writing vectorized code.
Given the ambitious agenda, “Hands-On Programming with R” starts surprisingly slowly with arithmetic, assignment, useful R functions and basic housekeeping chores: getting help and looking for packages. Then, still slowly and deliberately, the text discusses R objects, atomic vectors, data types and data structures. Forty-eight pages in, Garrett is still lingering on attributes. But this discussion is more sophisticated than most authors attempt. The presentation of type, attributes and class, in particular the insight that the concept of class follows directly from attributes, is meant to cultivate a programmer’s mindset.
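To make the point about class and attributes concrete, here is a minimal sketch of my own (not code taken from the book). It assumes nothing beyond base R: a factor is just an integer vector that carries a levels attribute and a class attribute, so attaching those two attributes by hand is enough for R to treat the vector as a factor.

```r
# Start with a plain integer vector: it has a type but no attributes.
flips <- c(1L, 2L, 1L)
typeof(flips)      # the underlying type is "integer"
attributes(flips)  # NULL, nothing attached yet

# Attach the two attributes that define a factor.
attr(flips, "levels") <- c("heads", "tails")
attr(flips, "class") <- "factor"

# R now treats the same integers as a factor, purely because of
# the attributes; the underlying type has not changed.
is.factor(flips)
as.character(flips)
typeof(flips)
```

The variable name flips and the level labels are only illustrative. The thing to notice is that nothing about the data changed; only the metadata did.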
Around page 65, when Garrett gets excited about subsetting, the pace really picks up. If you are hooked and still reading by page 112, as I was, you will have acquired a working knowledge of scoping rules and environments and be ready for the beguilingly lucid discussion of the S3 class system that begins on page 139. Even if you are an experienced R programmer, you may want to borrow a copy of the book and read this section. If you really know your stuff, you may not learn anything new, but I bet you will be hard-pressed to do a better job of explaining S3 classes to someone else.
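For readers who have not met S3 before, the idea can be sketched in a few lines. This is my own illustrative example, not the book's code: the class name "spin", the constructor new_spin() and the fields symbols and prize are all hypothetical. It relies only on base R behavior, namely that a generic such as format() or print() looks for a method named generic.class based on the object's class attribute.

```r
# Hypothetical constructor: a list with a class attribute attached.
new_spin <- function(symbols, prize) {
  structure(list(symbols = symbols, prize = prize), class = "spin")
}

# S3 method for the format() generic: called for objects of class "spin".
format.spin <- function(x, ...) {
  paste0(paste(x$symbols, collapse = " "), " -> $", x$prize)
}

# S3 method for the print() generic: used when the object is auto-printed.
print.spin <- function(x, ...) {
  cat(format(x), "\n", sep = "")
  invisible(x)
}

s <- new_spin(c("B", "BB", "7"), 0)
print(s)
```

Typing s at the console would call print.spin() automatically, which is the whole trick: the class attribute decides which function runs.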
After S3, the text turns to loops as a prelude to its presentation of vectorized code. This section, which is really the final destination of the book, is exceptionally well done. First, vectorized code is characterized as code that takes advantage of three great features of the R language: fast logical tests, powerful subsetting operations and a multitude of built-in functions that permit element-wise execution. Then the text demonstrates how to put these ideas into practice.
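A small sketch shows how those three features combine. The function names abs_loop() and abs_vec() are mine, chosen for illustration, and the example is a simplified stand-in rather than a reproduction of the book's code. Both functions return absolute values; the first visits one element at a time, while the second uses a single logical test, logical subsetting and element-wise multiplication.

```r
# Loop version: test and update each element individually.
abs_loop <- function(vec) {
  for (i in seq_along(vec)) {
    if (vec[i] < 0) {
      vec[i] <- -vec[i]
    }
  }
  vec
}

# Vectorized version: one logical test over the whole vector,
# then subset and multiply all the negative elements at once.
abs_vec <- function(vec) {
  negs <- vec < 0
  vec[negs] <- vec[negs] * -1
  vec
}

x <- c(-2, 3, -5, 0)
abs_loop(x)
abs_vec(x)
```

Both calls give the same result as the built-in abs(). On a long vector, timing each version with system.time() should show the vectorized form running noticeably faster, though the exact gap depends on the machine and the R version.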
As you can gather, I was impressed by the conceptual formulation of the material. However, the real strength of the text is its sharp presentation of essential elements of the R language through a well-crafted, extended example that forms the spine of the book. “Hands-On Programming with R” is indeed a “hands-on” text that guides and challenges the reader to write good R code. A reader who makes it to the end will have worked through several refinements of a small collection of functions that implement a fairly complex slot machine simulation. This example significantly raises the bar for selecting code examples in any R book. The simulation is rich enough to illustrate all of the R features presented in the text while allowing for refinement and polishing as the final form of the slot machine takes shape. The whole presentation is very tight. Garrett tells a pretty good story. During the final vectorized-code chapter I found myself reading with the delight of anticipation: “Just how is he going to make this code better?”
I should also mention that the book is notable for what it does not include. This might be the first R book I have encountered that doesn’t develop any statistical models. Not a single regression is fit and there are no plots to speak of (three histograms and a scatter plot). Certainly, this is the only R book I have come across that mentions data science in the preface and is not replete with random forest models and the like. Presumably, all of this will show up in the follow-up book that Garrett promises in the preface.
“Hands-On Programming with R” presents but one carefully thought-through trajectory of many possible R language excursions. It is not to be compared with Hadley Wickham’s encyclopedic “Advanced R” and it contains only a fraction of the material you can find in Norm Matloff’s “The Art of R Programming”. But for a reader who has worked through “Hands-On Programming with R”, both of these texts should be accessible.
Garrett’s book is a good read: a technical story with a plot and a few surprises that could help anyone starting out with the R language learn to write some pretty slick code.