XGBoost is the flavour of the moment for serious competitors on Kaggle. It was developed by Tianqi Chen and provides a particularly efficient implementation of the Gradient Boosting algorithm. Although there is a CLI implementation of XGBoost, you’ll probably be more interested in using it from either R or Python. Below are instructions for installing it for each of these languages. It’s pretty painless.
Installing XGBoost in R
Installation in R is extremely simple.
> install.packages('xgboost')
> library(xgboost)
XGBoost is also supported as a model in caret, which is especially handy for feature selection and model parameter tuning.
Installing XGBoost in Python
Assuming you’ve downloaded and unpacked the XGBoost source (the master branch from GitHub unpacks to a folder called xgboost-master), build the library and then install the Python package:

cd xgboost-master
make
cd python-package/
python setup.py install --user
And you’re ready to roll:
Enjoy building great models with this absurdly powerful tool. I’ve found that it effortlessly consumes vast data sets that grind other algorithms to a halt. Get started by looking at some code examples. Also worth looking at are
- an Introduction to Boosted Trees;
- a tutorial showing how XGBoost was applied to the Otto Group Product Classification Challenge;
- Understanding Gradient Boosting (Part 1); and
- a presentation by Alexander Ihler.