Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
We’ve been experimenting with this for a while, and the next release of the R vtreat package will include a back-port of the Python vtreat package’s sklearn pipeline-step interface (in addition to the standard R interface).
This means the user can easily express modeling intent by choosing between coder$fit_transform(train_data), coder$fit(train_data_cal)$transform(train_data_model), and coder$transform(application_data).
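A minimal sketch of these usage patterns, assuming vtreat's NumericOutcomeTreatment() step (the numeric-outcome coder used in the regression examples) and its $fit(), $transform(), $fit_transform() and $score_frame() methods. The toy data, the make_data() helper and the column names are hypothetical, made up only for illustration.

```r
library(vtreat)

set.seed(2019)

# Hypothetical toy data: one categorical and one numeric input, numeric outcome y.
make_data <- function(n) {
  d <- data.frame(
    x_cat = sample(c("a", "b", "c"), n, replace = TRUE),
    x_num = rnorm(n),
    stringsAsFactors = FALSE
  )
  d$y <- ifelse(d$x_cat == "a", 1, 0) + d$x_num + rnorm(n)
  d
}
train_data <- make_data(500)
application_data <- make_data(100)
vars <- c("x_cat", "x_num")

# Intent 1: learn the treatment from train_data and get treated training rows
# in one call, then apply the stored treatment to new application data.
coder <- NumericOutcomeTreatment(var_list = vars, outcome_name = "y")
train_treated <- coder$fit_transform(train_data)
application_treated <- coder$transform(application_data)

# Intent 2: fit on a calibration subset, then transform the disjoint
# modeling subset (fit() returns the step, so the calls can be chained).
is_cal <- runif(nrow(train_data)) < 0.3
coder2 <- NumericOutcomeTreatment(var_list = vars, outcome_name = "y")
model_treated <- coder2$fit(train_data[is_cal, , drop = FALSE])$transform(
  train_data[!is_cal, , drop = FALSE]
)

# Per-variable statistics learned during fitting.
head(coder$score_frame())
```

As we understand it, calling coder$transform(train_data) on the same rows that coder$fit_transform(train_data) just used is the situation the nested bias warning is meant to flag; for those rows, use the fit_transform() result instead.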
We have also regenerated the current task-oriented vtreat documentation to demonstrate the new nested bias warning feature:
- Regression: R regression example, Python regression example.
- Classification: R classification example, Python classification example.
- Unsupervised data preparation: R unsupervised example, Python unsupervised example.
- Multinomial classification: R multinomial classification example, Python multinomial classification example.
And we now have new versions of these documents showing the sklearn $fit_transform() style notation in R.
- Regression: R $fit_transform() regression example.
- Classification: R $fit_transform() classification example.
- Unsupervised data preparation: R $fit_transform() unsupervised example.
- Multinomial classification: R $fit_transform() multinomial classification example.
The original R interface is going to remain the standard interface for vtreat. It is more idiomatic R, and is taught in chapter 8 of Zumel, Mount; Practical Data Science with R, 2nd Edition, Manning 2019.
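For readers who know the standard interface, here is a rough correspondence between the two notations, using vtreat's classic numeric-outcome functions mkCrossFrameNExperiment(), designTreatmentsN() and prepare(). This is our conceptual reading of how the calls line up, not a description of the adaptor's internals; the toy data and column names are hypothetical.

```r
library(vtreat)

set.seed(2019)

# Hypothetical toy data: categorical x_cat, numeric x_num, numeric outcome y.
d <- data.frame(
  x_cat = sample(c("a", "b", "c"), 300, replace = TRUE),
  x_num = rnorm(300),
  stringsAsFactors = FALSE
)
d$y <- ifelse(d$x_cat == "a", 1, 0) + d$x_num + rnorm(300)

# Hypothetical new rows to treat later (no outcome column needed).
d_new <- data.frame(
  x_cat = sample(c("a", "b", "c"), 50, replace = TRUE),
  x_num = rnorm(50),
  stringsAsFactors = FALSE
)
vars <- c("x_cat", "x_num")

# Closest classic counterpart of $fit_transform(): design a treatment plan
# and return cross-validated ("cross frame") treated training data.
cfe <- mkCrossFrameNExperiment(d, varlist = vars, outcomename = "y")
plan <- cfe$treatments
d_treated <- cfe$crossFrame

# Closest classic counterpart of $transform(): apply a stored plan to new rows.
d_new_treated <- prepare(plan, d_new)

# Closest classic counterpart of $fit() alone: design the plan only.
plan_only <- designTreatmentsN(d, varlist = vars, outcomename = "y", verbose = FALSE)
```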
In contrast, the $fit_transform() notation will always be just an adaptor over the primary R interface. However, there is a lot to be learned from sklearn’s organization and ideas, so we decided to make its naming convention available as a way of showing appreciation and giving credit. Some more of my notes on the grace of the sklearn interface as a good way to manage state and generative effects (see Brendan Fong, David I. Spivak; An Invitation to Applied Category Theory, Cambridge University Press, 2019) can be found here.
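To make the adaptor idea concrete, here is a toy base-R sketch (not vtreat's actual code) of wrapping a functional design/prepare pair in an object that exposes fit(), transform() and fit_transform(). The names design_centering(), prepare_centering() and make_pipe_step() are hypothetical, and the "treatment" is just mean-centering. The fitted plan is the only mutable state, kept in the closure environment, which is the kind of explicit state management the sklearn interface encourages.

```r
# Hypothetical "primary" functional interface: design_centering() learns a
# plan (column means); prepare_centering() applies a plan to a data frame.
design_centering <- function(dframe, varlist) {
  list(
    varlist = varlist,
    means = vapply(dframe[varlist], mean, numeric(1), na.rm = TRUE)
  )
}

prepare_centering <- function(plan, dframe) {
  for (v in plan$varlist) {
    dframe[[v]] <- dframe[[v]] - plan$means[[v]]
  }
  dframe
}

# Hypothetical adaptor: wraps any design/prepare pair in an object with
# sklearn-style methods. The fitted plan lives in the closure environment.
make_pipe_step <- function(design_fn, prepare_fn, ...) {
  settings <- list(...)
  plan <- NULL
  step <- list()
  step$fit <- function(dframe) {
    plan <<- do.call(design_fn, c(list(dframe), settings))
    invisible(step)  # return the step so calls can be chained
  }
  step$transform <- function(dframe) {
    if (is.null(plan)) stop("fit() must be called before transform()")
    prepare_fn(plan, dframe)
  }
  step$fit_transform <- function(dframe) {
    step$fit(dframe)
    step$transform(dframe)
  }
  step
}

train <- data.frame(x = c(1, 2, 3), z = c("a", "b", "c"))
new_data <- data.frame(x = c(10, 20), z = c("d", "e"))

centerer <- make_pipe_step(design_centering, prepare_centering, varlist = "x")
print(centerer$fit_transform(train)$x)  # -1 0 1
print(centerer$transform(new_data)$x)   # 8 18
```

Because fit() returns the step, make_pipe_step(design_centering, prepare_centering, varlist = "x")$fit(train)$transform(new_data) also works. For a fixed transformation like mean-centering, fit_transform() can simply be fit() followed by transform(); vtreat's supervised fit_transform() deliberately differs, returning cross-validated encodings of the training rows rather than re-applying the plan to them.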
