eXtreme Gradient Boosting (XGBoost) is a machine learning algorithm that became very popular a few years ago after models built with it won several Kaggle competitions. It is a powerful algorithm that combines an ensemble of weak learners (shallow decision trees) into a strong learner. Its R implementation is available in the xgboost package, and it is well worth including in anyone's machine learning portfolio.
This is the first part of the eXtremely Boost your machine learning series. For the other parts, follow the xgboost tag.
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Load the xgboost library and download the German Credit dataset. Your goal in this tutorial will be to predict Creditability (the first column in the dataset).
Convert columns c(2,4,5,7,8,9,10,11,12,13,15,16,17,18,19,20) to factors and then encode them as dummy variables. HINT: use
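The function named in the hint did not survive here, so the sketch below shows one possible approach with base R's factor() and model.matrix(). It may not be the function the exercise has in mind. The helper encode_credit() and the small toy data frame are hypothetical, and the sketch assumes Creditability is stored as a numeric 0/1 column.

```r
# Hypothetical helper: convert the given columns to factors, then dummy-encode
# every factor with base R's model.matrix(). Creditability is the label.
encode_credit <- function(df, factor_cols) {
  df[factor_cols] <- lapply(df[factor_cols], factor)
  # "Creditability ~ ." uses all other columns as predictors; [, -1] drops the intercept
  X <- model.matrix(Creditability ~ ., data = df)[, -1, drop = FALSE]
  list(X = X, y = as.numeric(df$Creditability))
}

# Hypothetical toy data frame with the label in the first column
toy <- data.frame(
  Creditability = c(1, 0, 1, 1, 0, 1),
  Account       = c(1, 2, 3, 1, 2, 3),
  Duration      = c(12, 24, 6, 36, 18, 9),
  Purpose       = c(0, 1, 0, 2, 1, 2)
)
enc <- encode_credit(toy, c(2, 4))
print(colnames(enc$X))  # "Account2" "Account3" "Duration" "Purpose1" "Purpose2"

# With the real data frame (here called credit) this would be:
# encode_credit(credit, c(2,4,5,7,8,9,10,11,12,13,15,16,17,18,19,20))
```

model.matrix() uses treatment contrasts by default, so each factor with k levels becomes k - 1 indicator columns, while numeric columns such as Duration pass through unchanged.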
Split the data into training and test sets in a 700:300 ratio. Create an xgb.DMatrix for both sets with Creditability as the label.
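A minimal sketch of the 700:300 split and the two xgb.DMatrix objects. No download location is given here, so it generates a hypothetical 1000-row stand-in (predictor matrix X, 0/1 label vector y) from random numbers; with the real data, X and y would be the dummy-encoded predictors and Creditability.

```r
library(xgboost)

set.seed(42)
# Hypothetical stand-in for the encoded German Credit data: 1000 rows, 5 predictors
X <- matrix(rnorm(1000 * 5), ncol = 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- as.numeric(X[, 1] - X[, 2] + rnorm(1000) > 0)

# 700:300 split: sample 700 row indices for training, the rest form the test set
train_idx <- sample(nrow(X), 700)
dtrain <- xgb.DMatrix(X[train_idx, ], label = y[train_idx])
dtest  <- xgb.DMatrix(X[-train_idx, ], label = y[-train_idx])
```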
Train xgboost with a logistic objective, 30 rounds of training and a maximum depth of 2.
To check model performance, calculate the test set classification error.
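The sketch below covers the training step and the test set classification error, computed by thresholding the predicted probabilities at 0.5. It assumes the classic 1.x interface of the R xgboost package, in which xgboost() accepts an xgb.DMatrix and passes extra arguments such as max_depth on as parameters; newer releases changed this function's signature. The data are a hypothetical random stand-in for the encoded German Credit split.

```r
library(xgboost)

set.seed(42)
# Hypothetical stand-in data and 700:300 split (replace with the real encoded data)
X <- matrix(rnorm(1000 * 5), ncol = 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- as.numeric(X[, 1] - X[, 2] + rnorm(1000) > 0)
train_idx <- sample(nrow(X), 700)
dtrain <- xgb.DMatrix(X[train_idx, ], label = y[train_idx])
dtest  <- xgb.DMatrix(X[-train_idx, ], label = y[-train_idx])

# Logistic objective, 30 boosting rounds, trees of maximum depth 2 (1.x interface)
bst <- xgboost(data = dtrain, objective = "binary:logistic",
               nrounds = 30, max_depth = 2, verbose = 0)

# For binary:logistic, predict() returns probabilities; 0.5 turns them into classes
pred <- predict(bst, dtest)
test_error <- mean(as.numeric(pred > 0.5) != getinfo(dtest, "label"))
print(test_error)
```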
Plot predictor importance.
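A sketch of the importance plot using xgb.importance() and xgb.plot.importance(). It assumes the classic 1.x interface of the R xgboost package and uses a hypothetical random stand-in for the German Credit data, so the feature names x1 to x5 are placeholders.

```r
library(xgboost)

set.seed(42)
# Hypothetical stand-in data and 700:300 split (replace with the real encoded data)
X <- matrix(rnorm(1000 * 5), ncol = 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- as.numeric(X[, 1] - X[, 2] + rnorm(1000) > 0)
train_idx <- sample(nrow(X), 700)
dtrain <- xgb.DMatrix(X[train_idx, ], label = y[train_idx])

bst <- xgboost(data = dtrain, objective = "binary:logistic",
               nrounds = 30, max_depth = 2, verbose = 0)

# One row per feature used in the trees, with Gain, Cover and Frequency columns
imp <- xgb.importance(feature_names = colnames(X), model = bst)
print(imp)
xgb.plot.importance(importance_matrix = imp)
```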
Use xgb.train() instead of xgboost() to add both the train and test sets as a watchlist. Train the model with the same parameters, but for 100 rounds, to see how it performs during training.
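A sketch of the watchlist run, assuming the 1.x interface where the argument is called watchlist (newer releases of the R package name it evals). print_every_n = 10 only thins the console output; every round is still recorded in bst$evaluation_log. The data are a hypothetical random stand-in.

```r
library(xgboost)

set.seed(42)
# Hypothetical stand-in data and 700:300 split (replace with the real encoded data)
X <- matrix(rnorm(1000 * 5), ncol = 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- as.numeric(X[, 1] - X[, 2] + rnorm(1000) > 0)
train_idx <- sample(nrow(X), 700)
dtrain <- xgb.DMatrix(X[train_idx, ], label = y[train_idx])
dtest  <- xgb.DMatrix(X[-train_idx, ], label = y[-train_idx])

params <- list(objective = "binary:logistic", max_depth = 2)

# The watchlist makes xgb.train() evaluate both sets after every boosting round
bst <- xgb.train(params = params, data = dtrain, nrounds = 100,
                 watchlist = list(train = dtrain, test = dtest),
                 print_every_n = 10)
head(bst$evaluation_log)
```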
Train the model again, adding AUC and Log Loss as evaluation metrics.
Plot how AUC and Log Loss on the train and test sets changed during the training process. Use the plotting function or library of your choice.
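A sketch of both steps with base R graphics. It assumes the 1.x behaviour in which eval_metric can be repeated in the params list to request several metrics, and the evaluation log names its columns <set>_<metric> (train_auc, test_logloss and so on). The data are a hypothetical random stand-in.

```r
library(xgboost)

set.seed(42)
# Hypothetical stand-in data and 700:300 split (replace with the real encoded data)
X <- matrix(rnorm(1000 * 5), ncol = 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- as.numeric(X[, 1] - X[, 2] + rnorm(1000) > 0)
train_idx <- sample(nrow(X), 700)
dtrain <- xgb.DMatrix(X[train_idx, ], label = y[train_idx])
dtest  <- xgb.DMatrix(X[-train_idx, ], label = y[-train_idx])

# Repeating eval_metric requests both metrics for every set in the watchlist
params <- list(objective = "binary:logistic", max_depth = 2,
               eval_metric = "auc", eval_metric = "logloss")
bst <- xgb.train(params = params, data = dtrain, nrounds = 100,
                 watchlist = list(train = dtrain, test = dtest), verbose = 0)

# One row per round
eval_log <- bst$evaluation_log

op <- par(mfrow = c(1, 2))
plot(eval_log$iter, eval_log$train_auc, type = "l", col = "blue",
     xlab = "round", ylab = "AUC",
     ylim = range(eval_log$train_auc, eval_log$test_auc))
lines(eval_log$iter, eval_log$test_auc, col = "red")
legend("bottomright", legend = c("train", "test"), col = c("blue", "red"), lty = 1)

plot(eval_log$iter, eval_log$train_logloss, type = "l", col = "blue",
     xlab = "round", ylab = "Log Loss",
     ylim = range(eval_log$train_logloss, eval_log$test_logloss))
lines(eval_log$iter, eval_log$test_logloss, col = "red")
legend("topright", legend = c("train", "test"), col = c("blue", "red"), lty = 1)
par(op)
```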
Check how setting the parameter eta to 0.01 influences the AUC and Log Loss curves.
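One way to run the comparison is to train the same model twice, once with eta = 0.3 (xgboost's default) and once with eta = 0.01, and overlay the test curves. The helper fit_log() is hypothetical. The sketch assumes the 1.x interface (the watchlist argument and repeated eval_metric entries) and uses hypothetical random stand-in data.

```r
library(xgboost)

set.seed(42)
# Hypothetical stand-in data and 700:300 split (replace with the real encoded data)
X <- matrix(rnorm(1000 * 5), ncol = 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- as.numeric(X[, 1] - X[, 2] + rnorm(1000) > 0)
train_idx <- sample(nrow(X), 700)
dtrain <- xgb.DMatrix(X[train_idx, ], label = y[train_idx])
dtest  <- xgb.DMatrix(X[-train_idx, ], label = y[-train_idx])

# Hypothetical helper: train with a given eta and return the per-round evaluation log
fit_log <- function(eta) {
  params <- list(objective = "binary:logistic", max_depth = 2, eta = eta,
                 eval_metric = "auc", eval_metric = "logloss")
  bst <- xgb.train(params = params, data = dtrain, nrounds = 100,
                   watchlist = list(train = dtrain, test = dtest), verbose = 0)
  bst$evaluation_log
}
log_default <- fit_log(0.3)   # 0.3 is xgboost's default eta
log_slow    <- fit_log(0.01)

op <- par(mfrow = c(1, 2))
plot(log_default$iter, log_default$test_auc, type = "l", col = "blue",
     xlab = "round", ylab = "test AUC",
     ylim = range(log_default$test_auc, log_slow$test_auc))
lines(log_slow$iter, log_slow$test_auc, col = "red")
legend("bottomright", legend = c("eta = 0.3", "eta = 0.01"),
       col = c("blue", "red"), lty = 1)

plot(log_default$iter, log_default$test_logloss, type = "l", col = "blue",
     xlab = "round", ylab = "test Log Loss",
     ylim = range(log_default$test_logloss, log_slow$test_logloss))
lines(log_slow$iter, log_slow$test_logloss, col = "red")
legend("topright", legend = c("eta = 0.3", "eta = 0.01"),
       col = c("blue", "red"), lty = 1)
par(op)
```

Because eta shrinks each tree's contribution, the eta = 0.01 curves typically move much more slowly, so 100 rounds may not be enough for them to level off.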