Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Derek Jones recently discussed a possible future for the R ecosystem in “StatsModels: the first nail in R’s coffin”.

This got me thinking on the future of CRAN (which I consider vital to R, and vital in distributing our work) in the era of super-popular meta-packages. Meta-packages are convenient, but they have a profoundly negative impact on the packages they exclude.

For example: tidyverse advertises a popular R universe where the vital package data.table never existed.

And now tidymodels is shaping up to be a popular universe where our own package vtreat never existed, except possibly as a footnote to embed.

Users currently (with some luck) discover packages like ours and then (because they trust CRAN) feel able to try them. With popular walled gardens that becomes much less likely. It is one thing for a standard package to duplicate another package (it is actually hard to avoid, and how work legitimately competes), it is quite another for a big-brand meta-package to pre-pick winners (and losers).

All I can say is: please give vtreat a chance and a try. It is a package for preparing messy real-world data for predictive modeling. In addition to re-coding high cardinality categorical variables (into what we call effect-codes after Cohen, or impact-codes), it deals with missing values, can be parallelized, can be run on databases, and has years of production experience baked in.

Some places to start with vtreat: