Practical statistics books for software engineers

[This article was first published on The Shape of Code » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

So you have read my (draft) book on evidence-based software engineering and want to learn more about the statistical techniques used, but are not interested lots of detailed mathematics. What books do I suggest?

All the following books are sitting on the shelf next to where I write (not that they get read that much these days).

Before I took the training wheels off my R usage, my general go to book was (I still look at it from time to time): “The R Book” by Crawley, second edition; “R in Action” by Kabacoff is a good general read.

In alphabetical subject order:

Categorical data: “Categorical Data Analysis” by Agresti, the third edition is a weighty tomb (in content and heaviness). Plenty of maths+example; more of a reference.

Compositional data: “Analyzing compositional data with R” by van den Boogaart and Tolosana-Delgado, is more or less the only book of its kind. Thankfully, it is quite good.

Count data: “Modeling count data” by Hilbe, may be more than you want to know about count data. Readable.

Circular data: “Circular statistics in R” by Pewsey, Neuhauser and Ruxton, is the only non-pure theory book available. The material seems to be there, but is brief.

Experiments: “Design and analysis of experiments” by Montgomery.

General: “Applied linear statistical models” by Kutner, Nachtsheim, Neter and Li, covers a wide range of topics (including experiments) using a basic level of mathematics.

Mixed-effects models: “Mixed-effects models in S and S-plus” by Pinheiro and Bates, is probably the book I prefer; “Mixed effects models and extensions in ecology with R” by Zuur, Ieno, Walker, Saveliev and Smith, is another view on an involved topic (plus lots of ecological examples).

Modeling: “Statistical rethinking” by McElreath, is full of interesting modeling ideas, using R and Stan. I wish I had some data to try out some of these ideas.

Regression analysis: “Applied Regression Analysis and Generalized Linear Models” by Fox, now in its third edition (I also have the second edition). I found this the most useful book, of those available, for a more detailed discussion of regression analysis. Some people like “Regression modeling strategies” by Harrell, but this does not appeal to me.

Survival analysis: “Introducing survival and event history analysis” by Mills, is a readable introduction covering everything; “Survival analysis” by Kleinbaum and Klein, is full of insights but more of a book to dip into.

Time series: The two ok books are: “Time series analysis and its application: with R examples” by Shumway and Stoffler, contains more theory, while “Time series analysis: with applications in R” by Cryer and Chan, contains more R code.

There are lots of other R/statistics books on my shelves (just found out I have 31 of Springer’s R books), some ok, some not so. I have a few ‘programming in R’ style books; if you are a software developer, R the language is trivial to learn (its library is another matter).

Suggestions for books covering topics I have missed welcome, or your own preferences (as a software developer).

To leave a comment for the author, please follow the link and comment on their blog: The Shape of Code » R. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)