by Joseph Rickert
In a recent post, where I presented some R-related highlights of November's H2O World conference, I singled out and described talks by Trevor Hastie and John Chambers and remarked that it would be nice if the videos were made available. Well, thanks to the generosity of the folks at H2O, I got my wish.
Here is the video of Professor Hastie's talk.
This video is a master class on machine learning: in 40 minutes or so, Professor Hastie conducts a tour that starts with basic decision trees and goes all the way to building learning ensembles with the Lasso. Along the way, he presents the salient ideas on bagging, random forests and boosting. The treatment of boosting is succinct and elegant, covering some remarkable features of the family of boosting algorithms. For example, Professor Hastie describes how training error in AdaBoost can reach zero and stay there while testing error continues to improve, how superior performance can be achieved with boosting algorithms by using only tree stumps, and how stagewise additive modeling “slows down the rate of overfitting”. The really deep insight comes in the discussion about viewing AdaBoost as an algorithm that fits additive logistic regression models with an exponential loss function. This, in turn, leads to a discussion of Jerome Friedman's Gradient Boosting Machine and more general boosting algorithms that can accommodate multiple kinds of loss functions. These are the models implemented in R's gbm package.
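To make the stagewise additive view a little more concrete, here is a minimal sketch of AdaBoost with tree stumps in plain Python. It is not code from the talk or from the gbm package; the function names (`stump_predict`, `fit_stump`, `adaboost`) and the ten-point toy dataset are assumptions chosen purely for illustration. Each round adds one weighted stump to the running score F(x) and reweights the data, and the sketch tracks both the training error and the average exponential loss exp(-y F(x)).

```python
import math


def stump_predict(x, threshold, polarity):
    """A tree stump: one split, predicting `polarity` left of the threshold."""
    return polarity if x < threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    points = sorted(set(xs))
    # Candidate splits: below all points, between neighbours, above all points.
    thresholds = [points[0] - 1.0]
    thresholds += [(a + b) / 2.0 for a, b in zip(points, points[1:])]
    thresholds.append(points[-1] + 1.0)
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Run AdaBoost; return a list of (training_error, exponential_loss) per round."""
    n = len(xs)
    weights = [1.0 / n] * n
    scores = [0.0] * n  # running additive model F(x_i)
    history = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        preds = [stump_predict(x, threshold, polarity) for x in xs]
        # Stagewise step: add the new stump without refitting earlier ones.
        scores = [s + alpha * p for s, p in zip(scores, preds)]
        # Upweight the points the new stump got wrong, then renormalize.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
        train_err = sum(1 for s, y in zip(scores, ys) if s * y <= 0) / n
        exp_loss = sum(math.exp(-y * s) for s, y in zip(scores, ys)) / n
        history.append((train_err, exp_loss))
    return history


# Toy data (an assumption for illustration): no single stump separates it.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for m, (train_err, exp_loss) in enumerate(adaboost(xs, ys, 3), start=1):
    print(f"round {m}: training error = {train_err:.2f}, exp loss = {exp_loss:.3f}")
```

On this toy data the first stump cannot classify every point, but the weighted vote of three stumps does, and the exponential loss falls at every round. Replacing the exponential loss with other differentiable losses is, roughly, the generalization that gradient boosting makes.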
I think this video of John Chambers reminiscing about his time at Bell Labs working with John Tukey is destined to become an important part of the historical record for Statistics. There are many remembrances of Tukey to be found online, but I don't know of any other visual record by someone of John Chambers' stature who interacted with Tukey as a colleague and professional statistician.
In just a few minutes, Chambers paints a balanced and revealing portrait that humanizes this icon of modern statistics and captures some of his complexity. I especially like the story in the Q & A portion of the talk where John describes Tukey's propensity for “mischief” and his delight in inventing new words (like boxplot “hinges”) that rankled many of his statistician colleagues and apparently upset the British statisticians in particular.
There are a few more videos on the H2O site that are worth a look.