Latest vtreat up on CRAN

January 24, 2018

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

There is a new version of the R package vtreat now up on CRAN.

vtreat is an essential data preparation system for predictive modeling. It helps defend your modeling work against real-world data issues, including:

  • High cardinality categorical variables
  • Rare levels (including new or novel levels during application) in categorical variables
  • Missing data (random or systematic)
  • Irrelevant variables/columns
  • Nested model bias, and other over-fit issues.
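As a rough illustration of the issues listed above, here is a minimal sketch of designing a treatment plan and applying it to new data. The data frames and column names (`d_train`, `d_new`, `x_cat`, `x_num`, `y`) are made up for this example and do not come from the package documentation; the sketch assumes a numeric outcome and uses `designTreatmentsN()` and `prepare()` from vtreat.

```r
library(vtreat)

set.seed(2018)

# Hypothetical training data: a categorical column and a numeric column,
# both with missing values, plus a numeric outcome y.
d_train <- data.frame(
  x_cat = c("a", "a", "a", "a", "b", "b", NA, NA),
  x_num = c(1, 2, 3, 4, NA, 6, 7, 8),
  y     = c(0.0, 0.1, 0.0, 1.2, 0.1, 1.0, 1.1, 0.9),
  stringsAsFactors = FALSE
)

# Design a treatment plan for a numeric outcome.
plan <- designTreatmentsN(
  d_train,
  varlist = c("x_cat", "x_num"),
  outcomename = "y",
  verbose = FALSE
)

# Hypothetical application data containing a level ("z") never seen
# during training and a missing numeric value.
d_new <- data.frame(
  x_cat = c("a", "z"),
  x_num = c(2, NA),
  stringsAsFactors = FALSE
)

# Apply the plan; pruneSig = NULL keeps all derived columns
# (no significance-based variable pruning).
d_new_treated <- prepare(plan, d_new, pruneSig = NULL)
print(d_new_treated)
```

The treated frame should be all numeric with no missing values, so it can be handed to a downstream modeling function. Note that fitting a model on the same rows used to design the plan invites the nested model bias mentioned above; vtreat's cross-frame functions (such as `mkCrossFrameNExperiment()`) are meant for that situation.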

vtreat also includes excellent, citable documentation: vtreat: a data.frame Processor for Predictive Modeling.

For this release I want to thank everybody who generously donated their time to submit an issue or build a Git pull request. In particular:

  • Vadim Khotilovich, who found and fixed a major performance problem in the y-stratified sampling.
  • Lawrence Wu, who has been donating documentation fixes.
  • Peter Hurford, who has been donating documentation fixes.
