This is a gem of a book.
From the introduction:
We intend this work to be a practitioner’s guide to the predictive modeling process and a place where one can come to learn about the approach and to gain intuition about the many commonly used and modern, powerful models.
…it was our goal to be as hands-on as possible, enabling the readers to reproduce the results within reasonable precision as well as being able to naturally extend the predictive modeling approach to their own data.
The book is structured into four main sections. First is General Strategies, which provides an introduction and discusses topics such as pre-processing and tuning.
The next two sections cover regression and classification, each with chapters on linear and non-linear methods, as well as tree- and rule-based methods, plus one or two chapters on practical issues such as measuring performance.
The final section covers feature selection, predictor importance and a discussion around model performance.
There are a few things I really like:
It is not an academic or mathematical treatise; the emphasis is on practice, discussing the issues that commonly arise and how they can be approached. Plenty of references are provided for those wanting to dig deeper.
Every example has its data set and code available so one can work through the examples as presented. In most cases they are real-world data sets, and there is great discussion of the real-world issues that arise, what should be considered, and the various tradeoffs that can be made.
Discussion and code are separate. Aside from the excellent content, this is probably what I appreciate the most. Each chapter presents its content, with charts where appropriate, while the actual walk-through of the code and raw output is in a separate section of the chapter.
This makes it much easier to focus on the material being presented. It is always difficult to present source code along with discussion. This is not a book about programming per se; it is about using existing tools to make intelligent and reasoned decisions about the task at hand, so it makes a lot of sense to have the code presented separately.
Also, as far as I have read, each chart is at most one page away from the text discussing it. This is a small thing, but I feel serious consideration has been given to the presentation of the material, and it has been done very well.
It is not a book about caret, the R package written by author Max Kuhn. To be honest, I would be pretty happy even if it were about caret, which certainly does get some use in the code, but the book is relatively package agnostic.
This is a great book, providing both the trees and the forest so to speak. I am unaware of any other book with similar content, and I wish I had something like this when I was first getting interested in machine learning.
There are books that are very introductory, books that cover the details of the algorithms, and books that provide rigorous coverage of the theory, but the last of these are not really accessible to those without a serious amount of mathematics. Here, a few equations are presented where appropriate, but they are certainly not the focus of the book.
There are no real shortcomings, though if there were ever a second edition, coverage of time series methods and deep learning would be welcome. I appreciate they are both book-worthy topics by themselves, and the latter is still very much a moving target.
In summary: great content, well written and well presented. This book would be my top recommendation to anyone looking to get started with, or already working with, predictive modeling. Well worth checking out.
To leave a comment for the author, please comment on their blog: Shifting sands.