By Joseph Rickert
Yesterday, the cosmic randomizer placed me next to a newly minted lawyer in a crowded Los Gatos coffee shop. In three minutes of conversation I learned that the fellow was interested in corporate law, was about to take a job that would give him a seat in the great VC/start-up game, and that he had some understanding of statistical models: outstanding! A lawyer with a quantitative background; so, we started talking about R (I always talk about R). But seriously, in today’s data-driven decision-making environment every professional (doctor, lawyer and maybe even Indian chief) ought to have some statistical skills and the tools to reveal the picture that the data are painting. There is no doubt in my mind either that R is the right tool for these professionals, or that with the right introduction they can learn enough R to resurrect the basic statistical knowledge they acquired at some time in their professional training and use it to visualize a simple data set.
For the class of professionals I have in mind, and for anyone who has some background in statistics and who would like to do useful data analysis without becoming a programmer, Robert I. Kabacoff’s book, R in Action: Data Analysis and Graphics with R, is the right introduction. After making the case for R and providing some graphics for motivation, the book starts slowly with setting up the R environment and developing the necessary data handling skills. The analytic techniques presented midway through the book are mostly limited to elementary statistics, linear regression and the analysis of variance; but these are precisely the topics with which professionals with engineering, bio-science and business backgrounds are most likely to be familiar. The extraordinary value in Rob’s book, however, is the emphasis on producing elegant, interpretable plots. With the clarity that has become the trademark of Rob’s Quick-R website, R in Action presents multiple examples of simple, elementary R code that may be adapted to produce insightful, report-ready work. One of my favorite examples from Chapter 11 uses the scatterplot function in the car library and the mtcars sample data to show the relationship between miles per gallon and weight segmented by the number of cylinders. With just a few lines of code, carefully explained in the text,
    library(car)
    scatterplot(mpg ~ wt | cyl, data=mtcars, lwd=2,
                main="Scatter Plot of MPG vs. Weight by # Cylinders",
                xlab="Weight of Car (lbs/1000)",
                ylab="Miles Per Gallon",
                legend.plot=TRUE,
                id.method="identify",
                labels=row.names(mtcars),
                boxplots="xy")
Rob leverages the default capabilities of the scatterplot function to make a plot worth looking at. This example is most effective in showing that R can be harnessed for useful work with very little programming. It is not necessary to climb any steep learning curve to use R at this level. Anyone with an interest in squeezing information from data and the technical savvy to download R should be able to use Rob’s book to make R work for them.
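For readers who want to see roughly what such a plot involves without installing anything beyond R itself, here is a minimal base R sketch of the same idea: miles per gallon against weight, with one color and one least-squares line per cylinder count. This is not the book's code and it omits the marginal boxplots and interactive point labeling; the names cyl_levels, cyl_colors and fits, and the color choices, are illustrative assumptions.

```r
# Base R sketch (not the book's code): MPG vs. weight, grouped by cylinders.
# mtcars ships with R in the datasets package, so no extra packages are needed.
cyl_levels <- sort(unique(mtcars$cyl))                   # the cylinder counts present
cyl_colors <- c("steelblue", "darkorange", "firebrick")  # illustrative color choice

# Fit one simple linear regression of mpg on wt for each cylinder group.
fits <- lapply(cyl_levels, function(k) {
  lm(mpg ~ wt, data = mtcars[mtcars$cyl == k, ])
})
names(fits) <- cyl_levels

# Draw the points, colored by cylinder group.
plot(mpg ~ wt, data = mtcars,
     col = cyl_colors[match(mtcars$cyl, cyl_levels)], pch = 19,
     main = "Scatter Plot of MPG vs. Weight by # Cylinders",
     xlab = "Weight of Car (lbs/1000)",
     ylab = "Miles Per Gallon")

# Overlay each group's fitted line, then add a legend.
for (i in seq_along(fits)) {
  abline(fits[[i]], col = cyl_colors[i], lwd = 2)
}
legend("topright", legend = paste(cyl_levels, "cylinders"),
       col = cyl_colors, pch = 19, lwd = 2)
```

The sketch takes more typing than the one-call version, which is rather the point: a well-chosen function with sensible defaults saves the occasional user from writing the loop and the legend by hand.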
Rob goes on to explain several sophisticated two- and three-dimensional sample plots, including hexagonal binning for large data sets and correlograms, that are nice to see in an elementary text.
There are now well over 100 books either about R itself, or that use R for computations and graphics. Someone not familiar with this powerful and empowering language might question the need for still another introduction. The truth is that among all of these texts there is considerable repetition and overlap. However, Rob’s book shows that there is always room for simplicity, clarity and elegance that follows from function.