New package for ensembling R models

March 13, 2013
By

(This article was first published on Modern Toolmaking, and kindly contributed to R-bloggers)


I've written a new R package called caretEnsemble for creating ensembles of caret models in R.  It currently works well for regression models, and I've written some preliminary support for binary classification models.


At this point, I've got 2 different algorithms for combining models:

1. Greedy stepwise ensembles (returns a weight for each model)
2. Stacks of caret models

(You can also manually specify weights for a greedy ensemble)

The greedy algorithm is based on the work of Caruana et al., 2004, and inspired by the medley package here on github.  The stacking algorithm simply builds a second caret model on top of the existing models (using their predictions as input), and employs all of the flexibility of the caret package.

All the models in the ensemble must use the same training/test folds.  Both algorithms use the out-of-sample predictions to find the weights and train the stack. Here's a brief script demonstrating how to use the package:


Please feel free to submit any comments here or on github.  I'd also be happy to include any patches you feel like submitting.  In particular, I could use some help writing support for multi-class models, writing more tests, and fixing bugs.

To leave a comment for the author, please follow the link and comment on his blog: Modern Toolmaking.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.