Announcing the caretEnsemble R package

[This article was first published on Data, Evidence, and Policy - Jared Knowles, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


Last week version 1.0 of the caretEnsemble package was released to CRAN. I have co-authored this package with Zach Mayer, who had the original idea of allowing for ensembles of train objects in the caret package. The package is designed to make it easy for the user to optimally combine models of various types together to produce a meta-model with superior fit than the sub-models. 

From the vignette:

"caretEnsemble has 3 primary functions: caretList, caretEnsemble and caretStack. caretList is used to build lists of caret models on the same training data, with the same re-sampling parameters. caretEnsemble andcaretStack are used to create ensemble models from such lists of caret models. caretEnsemble uses greedy optimization to create a simple linear blend of models and caretStack uses a caret model to combine the outputs from several component caret models."

I am excited about this package because the ensembling features in caretEnsemble are used to provide additional predictive power in the Wisconsin Dropout Early Warning System (DEWS). I’ve written about this system before, but it is a large-scale machine learning system used to provide schools with a prediction on the likely graduation of their middle grade students. It is easy to implement and provides additional predictive power for the cost of some CPU cycles. 

Additionally, Zach and I have worked hard to make ensembling models *easy*. For example, you can automatically build lists of models — a library of models — for ensembling using the caretList function. This caretList can then be used directly in either the caretEnsemble or caretStack mode, depending on how you want to combine the predictions from the submodels. These new caret objects also come with their own S3 methods (adding more in future releases) to allow you to interact with them and explore the results of ensembling — including summary, print, plot, and variable importance calculations. They also include the all important predict method allowing you to generate predictions for use elsewhere. 

Zach has written a great vignette that should give you a feel for how caretEnsemble works. And, we are actively improving caretEnsemble over on GitHub. Drop by and let us know if you find a bug, have a goal

To leave a comment for the author, please follow the link and comment on their blog: Data, Evidence, and Policy - Jared Knowles.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)