More on preparing data

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.]

The Microsoft Data Science User Group just sponsored Nina Zumel's presentation “Preparing Data for Analysis Using R”. Microsoft saw Win-Vector LLC's ODSC West 2015 presentation “Prepping Data for Analysis using R” and generously offered to sponsor improving it and disseminating it to a wider audience.


We feel Nina really hit the ball out of the park with over 400 new live viewers. Read more for links to even more free materials!

Microsoft has generously sponsored the following:

These are really great materials and we will be promoting and distributing them widely.

Nina emphasized teaching the principles of data treatment and cleaning (frankly an under-emphasized task). She also mentioned vtreat, a free R library supplied by Win-Vector LLC that automates a great number of the steps in a principled and statistically sound manner. Because her lecture is likely to attract more interest in the vtreat library, we have tuned up the vtreat documentation a bit and made it available as pre-rendered HTML (in addition to the normal vignette distribution). Of particular interest, we have finally enumerated all the variable types that vtreat uses to re-encode your data.
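For readers who have not tried vtreat yet, here is a minimal sketch of the kind of workflow it supports. The data frames and column names (`d`, `d_new`, `x_cat`, `x_num`, `y`) are made up for illustration and are not from the presentation; the sketch assumes the vtreat package is installed and a numeric outcome. The `code` column of the plan's score frame is where the variable types mentioned above show up.

```r
library(vtreat)

# Toy training data (hypothetical): a categorical column and a numeric
# column, each with missing values, plus a numeric outcome y.
d <- data.frame(
  x_cat = c("a", "a", "b", "b", NA, "a", "b", NA),
  x_num = c(1, 2, NA, 4, 5, 6, 7, 8),
  y     = c(1.0, 1.5, 3.0, 3.5, 2.0, 1.2, 3.2, 2.1),
  stringsAsFactors = FALSE
)

# Design a treatment plan for a numeric outcome.
plan <- designTreatmentsN(
  d,
  varlist     = c("x_cat", "x_num"),
  outcomename = "y",
  verbose     = FALSE
)

# The score frame lists each derived variable, the original column it
# came from, and a short code naming the variable type.
print(plan$scoreFrame[, c("varName", "origName", "code")])

# New data (hypothetical) with a missing value and a level "c" that was
# not seen when the plan was designed.
d_new <- data.frame(
  x_cat = c("a", "c", NA),
  x_num = c(NA, 3, 10),
  stringsAsFactors = FALSE
)

# Apply the plan; the result is an all-numeric frame with no NAs.
d_new_treated <- prepare(plan, d_new, pruneSig = NULL)
print(d_new_treated)
```

With only eight training rows the significance estimates in the score frame mean little; the point is only to show the shape of the plan and of the treated output. For real modeling work, consult the vtreat documentation on how to avoid designing the plan and fitting a model on the same rows.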
