(This article was first published on **R – Win-Vector Blog**, and kindly contributed to R-bloggers)

*Practical Data Science with R*, Zumel, Mount; Manning 2014 is a book Nina Zumel and I are very proud of.

I have written before how I think this book stands out and why you should consider studying from it.

Please read on for some additional comments on the intent of different sections of the book.

With *Practical Data Science with R* we wanted to help new data scientists and analysts get their bearings. We wanted to help them know what was expected of them, and to show them some tools and techniques that would help them in their tasks. We are trying to teach through “data scientists’ block” or “analysts’ blank page syndrome.” We chose R because it is an excellent analysis platform, and sufficiently self-contained that one can work on any step of the data science process without already being a mystical data science unicorn. It is a book trying to teach you what to do, with examples of it being done.

We worked very hard on each chapter. Some chapters were opportunities to redo, with the benefit of editors, things we had already written about. The book was also a chance to not always be lost in the technical details. Some of the chapters take special advantage of that, and I’d like to call out these particular chapters.

The core of the book includes:

- Chapter 1 The data science process

This chapter tells you a lot about the nature of the work. Not a lot of books cover this (one notable exception being *Doing Data Science: Straight Talk from the Frontline*, O’Neil, Schutt; O’Reilly 2014). A lot of analyst tasks are being taken over as “data science tasks” so necessarily a lot of people will have to be recognized as data scientists. It makes sense to see some description of the roles and expectations to see if the job (not just the job title) appeals to you.

- Chapter 3 Exploring data
- Chapter 4 Managing data
- Chapter 5 Choosing and evaluating models
- Chapter 6 Memorization methods

This sequence of chapters forms the heart of the book. It starts with data and moves through the concept of modeling. Discussion of particular statistical and machine learning methods (such as linear regression, logistic regression, random forests, and support vector machines) is held off until after this core sequence.

We spend a lot of time on the neglected topic of data preparation because there are many more opportunities for model performance improvement at the “intake end” (variables) than at the “outtake end” (re-processing modeling results). Some of the ideas from this sequence have since been further refined (and documented) in our open source vtreat package.

- Chapter 10 Documentation and deployment
- Chapter 11 Producing effective presentations

These chapters are the epilogue of the book; they emphasize how to collaborate with others.

The remaining chapters are the nuts and bolts:

- Chapter 2 Loading data into R
- Chapter 7 Linear and logistic regression
- Chapter 8 Unsupervised methods
- Chapter 9 Exploring advanced methods

These chapters concentrate on how the tools that allow you to pursue the goals and tasks of the other chapters actually work. For instance, an unstated goal of Chapter 7 was to enable the reader to read almost every scrap of summary that R reports for `lm` and `glm` models. We even included how to calculate the (oddly missing) overall model significance for `glm` (a feature now supplied in our sigr package). Every scrap of data and code needed to reproduce the results in these chapters is shared in our book GitHub repository (including re-runs of all steps as R Markdown worksheets).

We could have written a book that was only these chapters expanded, but we felt the core material was so under-taught that spending a bit more time on it would be of higher value to the reader.
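To make the `glm` significance point above concrete, here is a minimal sketch using only base R. It is the standard chi-squared test on the drop in deviance between the null model and the fitted model. It is not necessarily the exact presentation used in the book or in the sigr package. The data frame `d`, its columns `x` and `y`, and the simulation coefficient are all made up for illustration.

```r
# Overall significance of a logistic regression, computed by hand.
# The data below is synthetic and exists only to make the example runnable.
set.seed(2014)
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, size = 1, prob = plogis(0.8 * d$x))  # 0/1 outcome

model <- glm(y ~ x, data = d, family = binomial(link = "logit"))

# summary() reports the null and residual deviance (and their degrees of
# freedom), but no overall p-value for the model as a whole.
print(summary(model))

# Chi-squared test on the deviance explained by the model:
# the drop in deviance, on as many degrees of freedom as fitted coefficients
# beyond the intercept.
delta_deviance <- model$null.deviance - model$deviance
delta_df <- model$df.null - model$df.residual
p_value <- pchisq(delta_deviance, df = delta_df, lower.tail = FALSE)

cat("chi-squared =", delta_deviance, "on", delta_df, "df; p =", p_value, "\n")
```

The four quantities used here (`null.deviance`, `deviance`, `df.null`, `df.residual`) are the same numbers that appear at the bottom of the `summary()` printout, so the calculation can also be checked by hand against that output.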

And that is my rough outline of *Practical Data Science with R*.
