Predicting the optimal number of iterations and completion time for GBM

November 20, 2013

(This article was first published on Heuristic Andrew » r-project, and kindly contributed to R-bloggers)

When choosing the hyperparameters for Generalized Boosted Regression Models, two important choices are shrinkage and the number of trees. Generally, a smaller shrinkage with more trees produces a better model, but it significantly increases the modeling time. Building a model with too many trees, most of which are then cut back by cross-validation, wastes time. Building a model with too few trees may require starting over with a larger number of trees, which is also a waste of time. So here I present a simple way to estimate the optimal number of trees and the modeling time for GBM as implemented in the R package gbm.

