Big News: vtreat 1.2.0 is Available on CRAN, and it is now Big Data Capable
[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.]
We here at Win-Vector LLC have some really big news, and we would like the R community's help sharing it.
vtreat version 1.2.0 is now available on CRAN, and this version can run its data cleaning and preparation steps on databases and big data systems.
vtreat is a complete and rigorous tool for preparing messy real-world data for supervised machine-learning tasks. It implements a technique we call “safe y-aware processing” using cross-validation or stacking techniques. It is very easy to use: you show it some data and it designs a data transform for you.
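To make "show it some data" concrete, here is a minimal in-memory sketch. The data frame, its column names (x, cat, y), and the seed are made up for illustration; the sketch only assumes the long-standing vtreat functions mkCrossFrameCExperiment() and prepare().

```r
library(vtreat)

set.seed(2018)

# Hypothetical messy training data: a numeric column and a categorical
# column, both with missing values, plus a logical outcome y.
n <- 100
d <- data.frame(
  x = ifelse(runif(n) < 0.2, NA_real_, rnorm(n)),
  cat = sample(c("a", "b", "c", NA), n, replace = TRUE),
  stringsAsFactors = FALSE
)
d$y <- ifelse(is.na(d$x), 0, d$x) + (d$cat %in% "a") + rnorm(n) > 0.5

# Design the transform. The cross-frame is a cross-validated treated copy
# of the training data, intended for fitting the downstream model.
cfe <- mkCrossFrameCExperiment(
  d,
  varlist = c("x", "cat"),
  outcomename = "y",
  outcometarget = TRUE,
  verbose = FALSE
)
treatment_plan <- cfe$treatments
train_treated <- cfe$crossFrame

# Apply the same plan to new data, including a missing value and a
# categorical level that never appeared in training.
new_d <- data.frame(x = c(NA, 3), cat = c("zzz", "a"), stringsAsFactors = FALSE)
new_treated <- prepare(treatment_plan, new_d)
print(new_treated)
```

The treated frame is all numeric with no missing values, so it can be handed to most modeling functions directly.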
Thanks to the rquery package, this data preparation transform can now be directly applied to databases, or big data systems such as Apache Spark or Google BigQuery. Or, thanks to the rqdatatable package, even fast large in-memory transforms are possible.
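As a rough sketch of what applying a treatment plan inside a database might look like, the following uses an in-memory SQLite database as a stand-in for a larger system. It assumes the as_rquery_plan() and rquery_prepare() interfaces that vtreat exposed around this release, and that DBI, RSQLite, and rquery are installed; the table names (d_raw, d_treated) and the data are hypothetical.

```r
library(vtreat)

# Hypothetical small training frame with missing values and a row id.
d <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  z = c(NA, 0, 1, 1, 1, NA),
  y = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)
d$id <- seq_len(nrow(d))

# Design the treatment plan in memory, as usual.
treatment_plan <- designTreatmentsC(d, c("x", "z"), "y", TRUE, verbose = FALSE)

# Convert the plan into a form rquery can translate to SQL.
rqplan <- as_rquery_plan(list(treatment_plan))

# Stand-in database; a Spark or BigQuery connection would take its place.
db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
source_data <- rquery::rq_copy_to(db, "d_raw", d,
                                  overwrite = TRUE, temporary = TRUE)

# Run the preparation in the database, carrying the id column through.
res <- rquery_prepare(db, rqplan, source_data, "d_treated",
                      extracols = "id")
treated <- DBI::dbReadTable(db, res$table_name)
print(treated)

DBI::dbDisconnect(db)
```

The point of the design is that the raw data stays in the database: only the small treatment plan is built in R, and the transform itself runs as generated SQL.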
We have some basic examples of the new vtreat capabilities here and here.