Review: Statistical Analysis with R: Beginner’s Guide by John M. Quick

January 1, 2011
(This article was first published on Robin's Blog » R, and kindly contributed to R-bloggers)

Summary: If you can get past the strange underlying story, this book gives a good introduction to R for someone with no programming experience. However, if you have any experience with other programming languages, another book is likely to be more suitable.

Reference: Quick, J. M., Statistical Analysis with R: Beginner’s Guide, Packt Publishing, Birmingham, UK, 300 pages. Amazon Link. Sample Chapter.

As a new science PhD student, I wanted to get to grips with the most powerful statistical analysis software around, and that meant trying to understand the intricacies of programming in the R Project for Statistical Computing. John M. Quick has posted extensively on the internet about programming in R, so I had high hopes for his new book, Statistical Analysis with R in the Beginner’s Guide series recently published by Packt Publishing. Unfortunately, my high hopes were not entirely fulfilled. The book provides useful and correct information about programming in R, but it is underlain by a strange story about Chinese wars and has a number of niggling problems that prevent me from fully recommending it.

Before starting, I should explain my previous experience. I have used a number of pieces of statistical software in the past (such as SPSS) and have a small amount of experience with R from a PhD statistics skills class. I do, however, have significant programming experience in a number of languages. I think this is the part of my experience that distinguishes me from the intended audience for this book, as it is designed for real beginners. Those who have any experience in a modern programming language or a Linux shell will find the first few chapters very easy, and therefore somewhat frustrating. However, these chapters do give a good basic introduction to running commands in the R shell, and to working out which lines are commands and which are outputs. They also explain the [1] that appears at the beginning of R output lines, which is not mentioned in many of the introductory R tutorials that I have read. Once we’re past the rather contrived example of solving a magic square by using R as a calculator, the interesting bit starts…
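For readers who have not met it, the [1] prefix is the index of the first element shown on that line of output. The sketch below is an illustration only and is not taken from the book: the vector `x` and the 3x3 square are assumptions chosen to show the index prefix and the "R as a calculator" style of checking a magic square.

```r
# A vector long enough to wrap across lines: each output line starts with
# the index of its first element, e.g. [1] on the first line
x <- 101:130
print(x)

# Illustrative 3x3 magic square (an assumption, not the book's example)
square <- matrix(c(4, 9, 2,
                   3, 5, 7,
                   8, 1, 6), nrow = 3, byrow = TRUE)

# Use R as a calculator to check that every row, column and diagonal agrees
row_totals    <- rowSums(square)
col_totals    <- colSums(square)
diag_total    <- sum(diag(square))           # top-left to bottom-right
antidiag_total <- sum(diag(square[, 3:1]))   # top-right to bottom-left

print(row_totals)
print(col_totals)
print(c(diag_total, antidiag_total))
```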

However, before describing the data analysis section of the book, I should explain the underlying story used throughout the book. The introductory chapter gives a bit of ancient Chinese history, and states that you, the reader, have been chosen to succeed the famous military leader Zhuge Liang and need to learn how to use R to analyse his data and plan the future of the military campaign. The rest of the book takes on this theme, both in the data analysis (comparing the Shu and Wei armies, and predicting battle outcomes using regression) and the general phrasing (headings like “Have a go hero!” and emphasis that if you fail the Chinese kingdom will collapse). I’ll be honest: this story doesn’t work for me at all. In fact, it drives me nuts having it constantly throughout the book. As for why it annoys me, I’m not entirely sure: probably partly because it seems like the examples have had to be twisted rather to make them fit the story, and partly because I have no interest in ancient Chinese kingdoms, or using R to plan military campaigns.

I understand that not all readers will agree with me here, and that putting a story like this behind the scary process of learning new statistics software may help people get to grips with it. From my point of view I would have preferred to see a range of datasets used from the examples provided with R (all of the datasets listed here are built in to R), as this would (a) mean that the datasets are always available from within R and (b) provide interest for a wide range of readers.
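As a rough sketch of what using the built-in datasets looks like, the snippet below lists what ships with the `datasets` package and inspects one of them. The choice of `mtcars` is an assumption made purely for illustration; any built-in dataset would do.

```r
# List the names of the datasets shipped with the 'datasets' package,
# which is attached by default in a standard R session
available <- data(package = "datasets")$results[, "Item"]
head(available)

# mtcars is one of them; no import step is needed before using it
str(mtcars)
summary(mtcars$mpg)
```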

So, apart from my personal views of the underlying story, the actual content of the book is quite good. The chapters cover the whole process of statistical analysis from data import (Chapter 4), through summary statistics (Chapter 5) and modelling (Chapters 5-7) to graphical output (Chapter 10). The final chapter of the book gives good pointers for more help on R, from the inbuilt help through to recommended blogs and websites. It also covers installing packages, although more could have been made of how useful packages can be when performing analysis in R. The book assumes some statistical knowledge, but briefly explains concepts the reader may not have experienced before (such as correlation coefficients and AIC). Each chapter takes the form of some instructions (‘Time for action’), followed by an explanation (‘What just happened?’), a few questions (‘Pop quiz’) and a suggestion for the reader to try (‘Have a go hero’), and this approach seems to suit the material quite well. Although it can get frustrating at times (I tend to automatically skip over quizzes in books), I think the structure would help less confident readers.
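To give a flavour of the two concepts mentioned above, here is a minimal sketch of a correlation coefficient and an AIC comparison. It uses the built-in `mtcars` data rather than the book's data, and the two candidate models are assumptions chosen only for illustration.

```r
# Correlation coefficient between car weight and fuel economy
r <- cor(mtcars$wt, mtcars$mpg)

# Two candidate linear models for fuel economy
fit_wt    <- lm(mpg ~ wt, data = mtcars)
fit_wt_hp <- lm(mpg ~ wt + hp, data = mtcars)

# AIC() accepts several fitted models and returns one row per model;
# a lower AIC suggests a better trade-off between fit and complexity
aic_table <- AIC(fit_wt, fit_wt_hp)

print(r)
print(aic_table)
```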

The range of content that the book covers is impressive, as it goes from installing R to comparing models using AIC and customising graphs, although at times the explanations seem a bit verbose. More worryingly, the code examples, although completely correct, are written in a programming style that I suspect no real-world R programmer uses. The arguments for each command are stored as variables before the command is run (not that unusual for complex arguments, but a bit strange to do for every argument) and these variables have incredibly long names. The code snippet below (from page 172) is a good example:

> #create a box plot that compares the number of soldiers required across the battle methods
> #get the data formula to be used in the plot
> boxplotAllMethodsShuSoldiersData <- battleHistory$ShuSoldiers ~ battleHistory$Method
> #customize the plot
> boxPlotAllMethodsShuSoldiersLabelMain <- "Number of Soldiers Required by Battle Method"
> boxPlotAllMethodsShuSoldiersLabelX <- "Battle Method"
> boxPlotAllMethodsShuSoldiersLabelY <- "Number of Soldiers"
> #use boxplot(...) to create and display the box plot
> boxplot(formula = boxplotAllMethodsShuSoldiersData, main =
      boxPlotAllMethodsShuSoldiersLabelMain, xlab = boxPlotAllMethodsShuSoldiersLabelX,
      ylab = boxPlotAllMethodsShuSoldiersLabelY)

John’s code samples on the internet (for example here) do not have this verbosity, so I assume he put it in to try to make the code samples more easily readable. The problem is that, particularly when there is no colour syntax highlighting in the book, it actually makes the code far more difficult to read and understand.
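For comparison, the snippet from page 172 could plausibly be written more compactly by passing the arguments directly to `boxplot()`. This is only a sketch: the `battleHistory` data frame below is a hypothetical stand-in with invented values and invented method labels, since the book's data is not reproduced here. Only the column names `ShuSoldiers` and `Method` come from the original snippet.

```r
# Hypothetical stand-in for the book's battleHistory data;
# the values and method labels are invented for illustration only
battleHistory <- data.frame(
  Method      = factor(rep(c("MethodA", "MethodB", "MethodC"), each = 4)),
  ShuSoldiers = c(10, 12, 9, 11, 20, 25, 22, 18, 40, 45, 38, 50) * 1000
)

# The same plot with the formula and labels passed directly;
# the data argument avoids repeating battleHistory$ in the formula
bp <- boxplot(ShuSoldiers ~ Method, data = battleHistory,
              main = "Number of Soldiers Required by Battle Method",
              xlab = "Battle Method",
              ylab = "Number of Soldiers")
```

The result is assigned to `bp` only so that the computed statistics can be inspected afterwards; calling `boxplot()` without the assignment draws the same plot.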

In conclusion, this book provides a good overview of using R and is correctly pitched for an audience of beginners. The underlying story frustrates me, but this is likely to be a matter of personal taste (try looking at the sample chapter linked at the top of this post to see how you feel about the story). Apart from the verbosity of the code examples, the information is accurate and up-to-date. I would recommend this book for someone who has absolutely no experience with R or other programming languages and is somewhat scared of trying to learn R, as the underlying story and structure will provide a safe and comfortable environment for learning. For those who feel more confident, another book may be more suitable.

(Disclaimer: I was given a free review copy of this book by PacktPub)
