Blog Archives

Practical Data Science with R, 2nd Edition: Introduction Video

November 28, 2019

Nina and I have prepared a quick introduction video for Practical Data Science with R, 2nd Edition. We are really proud of both editions of the book. This book can help an R user directly experience the data science style of working with data and machine learning techniques. The book is available now at: Directly …


Practical Data Science with R, 2nd Edition, IS OUT!!!!!!!

November 15, 2019

Practical Data Science with R, 2nd Edition author Dr. Nina Zumel, with a fresh author’s copy of her book!


New Introduction to rquery

October 27, 2019

Introduction: rquery is a data wrangling system designed to express complex data manipulation as a series of simple data transforms. This is in the spirit of R’s base::transform() or dplyr’s dplyr::mutate(), and it uses a pipe in the style popularized in R by magrittr. The operators themselves follow the selections in Codd’s relational algebra, with the …
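To make the "series of simple data transforms" idea concrete, here is a minimal sketch in base R only. It does not use rquery's own operators (they are not shown in this excerpt); the data frame and column names are made up for illustration.

```r
# Hypothetical example data (not from the post).
d <- data.frame(x = c(1, 2, 3), g = c("a", "b", "a"))

# Step 1: extend the table with a derived column, using base::transform().
d1 <- transform(d, y = x * 2)

# Step 2: keep only some rows (relational "selection").
d2 <- d1[d1$y > 2, , drop = FALSE]

# Step 3: keep only some columns (relational "projection").
d3 <- d2[, c("g", "y"), drop = FALSE]

print(d3)
```

Each step takes a table and returns a table, which is what lets such steps be chained with a pipe.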


Practical Data Science with R 2nd Edition update

October 17, 2019

We are in the last stages of proofing the galleys/typesetting of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019. So this edition will definitely be out soon! If you ever wanted to see what Nina Zumel and John Mount are like when we have the help of editors, this book is your …


Free R/datascience Extract: Evaluating a Classification Model with a Spam Filter

October 15, 2019

We are excited to share a free extract of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019: Evaluating a Classification Model with a Spam Filter. This section reflects an important design decision in the book: teach model evaluation first, and as a step separate from model construction. It is funny, but it …


vtreat Cross Validation

October 5, 2019

Nina Zumel finished new documentation on how vtreat’s cross validation works, which I want to share here. vtreat is a system that makes data preparation for machine learning a “one-liner” (available in R or available in Python). We have a set of starting points here. These documents describe what vtreat does for you, you …


You Can Override Just About Anything in R

October 2, 2019

To understand computations in R, two slogans are helpful: “Everything that exists is an object. Everything that happens is a function call.” (John Chambers) In R, the “[” array access operator is a function call, and a user can re-bind it to a new effect of their own choosing. Let’s see what sort …
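As a small illustration of the slogan, here is a sketch in base R. It is not code from the post itself, and the replacement function is made up for the demonstration. Square-bracket indexing can be written as an explicit call to the function named "[", and a local re-binding changes what the brackets do:

```r
x <- c(10, 20, 30)

# x[2] is syntactic sugar for calling the function named "[".
same_result <- identical(x[2], `[`(x, 2))

# Re-bind "[" inside local() so the override does not leak into the
# rest of the session. This replacement is hypothetical: it ignores
# the data and just reports which index was asked for.
overridden <- local({
  `[` <- function(x, i) paste("index", i, "requested")
  x[2]
})

print(same_result)
print(overridden)
```

Outside the local() block the usual "[" is still in effect, so x[2] goes back to returning the second element.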


New vtreat Documentation (Starting with Multinomial Classification)

October 1, 2019

Nina Zumel finished some great new documentation showing how to use Python vtreat to prepare data for multinomial classification mode. And I have finally finished porting the documentation to R vtreat. So we now have good introductions on how to use vtreat to prepare data for the common tasks of: Regression: R regression example, Python …


How to Prepare Data

September 26, 2019

Real-world data can present a number of challenges to data science workflows. Even properly structured data (each interesting measurement already landed in a distinct column) can present problems, such as missing values and high-cardinality categorical variables. In this note we describe some great tools for working with such data. For an example: consider the …


Preparing Data for Supervised Classification

September 24, 2019

Nina Zumel has been polishing up new vtreat for Python documentation and tutorials. They are coming out so well that, to be fair to the R community, I find I must start to back-port this new documentation to vtreat for R. vtreat is a package for systematically preparing data for supervised machine learning tasks such …

