# eXtremely Boost your machine learning Exercises (Part-1)

[This article was first published on **R-exercises**, and kindly contributed to R-bloggers].

eXtreme Gradient Boosting is a machine learning model that became very popular a few years ago after winning several Kaggle competitions. It is a very powerful algorithm that uses an ensemble of weak learners to obtain a strong learner. Its R implementation is available in the `xgboost` package, and it is well worth including in anyone's machine learning portfolio.

This is the first part of the eXtremely Boost your machine learning series. For other parts, follow the tag xgboost.

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

**Exercise 1**

Load the `xgboost` library and download the German Credit dataset. Your goal in this tutorial will be to predict `Creditability` (the first column in the dataset).

**Exercise 2**

Convert columns `c(2,4,5,7,8,9,10,11,12,13,15,16,17,18,19,20)` to factors and then encode them as dummy variables. HINT: use `model.matrix()`.
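As a warm-up for the hint, here is a minimal sketch of factor conversion followed by dummy encoding with `model.matrix()`. The data frame `toy` and its column names are made up for illustration and are not taken from the German Credit dataset.

```r
# Toy data frame; column names are illustrative, not the real dataset's.
toy <- data.frame(
  Creditability = c(1, 0, 1, 0),
  Purpose       = c(1, 2, 3, 1),            # integer-coded category
  Amount        = c(1000, 2500, 1800, 3200) # numeric predictor
)

# Convert the categorical column(s) to factors, selected by position.
cat_cols <- c(2)
toy[cat_cols] <- lapply(toy[cat_cols], as.factor)

# Dummy-encode the predictors; "- 1" drops the intercept column.
X <- model.matrix(Creditability ~ . - 1, data = toy)
colnames(X)
```

Note that with `- 1` only the first factor in the formula keeps a column for every level; any further factors are encoded with one level dropped as the reference.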

**Exercise 3**

Split the data into training and test sets, 700:300. Create an `xgb.DMatrix` for both sets with `Creditability` as the label.
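The mechanics of a 700:300 split and of building `xgb.DMatrix` objects can be sketched on synthetic data. The matrix `X`, the label `y`, and the seed below are assumptions for illustration only; in the exercise they would be replaced by the encoded predictors and the `Creditability` column.

```r
library(xgboost)

set.seed(42)  # arbitrary seed, only for reproducibility

# Synthetic stand-in for the encoded predictors and the 0/1 label.
n <- 1000
X <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0("x", 1:5)))
y <- rbinom(n, size = 1, prob = 0.7)

# Draw 700 row indices for training; the remaining 300 form the test set.
train_idx <- sample(n, 700)

dtrain <- xgb.DMatrix(data = X[train_idx, ], label = y[train_idx])
dtest  <- xgb.DMatrix(data = X[-train_idx, ], label = y[-train_idx])

dim(dtrain)
dim(dtest)
```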

**Exercise 4**

Train `xgboost` with a logistic objective, 30 rounds of training, and a maximal depth of 2.
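The following is a sketch of such a training call on synthetic data, not the reference solution. It uses `xgb.train()` with a `params` list because the argument names of the `xgboost()` wrapper may differ between package versions; the data and the seed are assumptions.

```r
library(xgboost)

set.seed(1)  # arbitrary seed

# Synthetic binary classification problem (illustrative only).
n <- 200
X <- matrix(rnorm(n * 4), nrow = n, dimnames = list(NULL, paste0("x", 1:4)))
y <- as.numeric(X[, 1] + rnorm(n) > 0)
dtrain <- xgb.DMatrix(data = X, label = y)

# Logistic objective, trees of depth at most 2, 30 boosting rounds.
params <- list(objective = "binary:logistic", max_depth = 2)
bst <- xgb.train(params = params, data = dtrain, nrounds = 30)

# With this objective, predictions are probabilities of class 1.
p <- predict(bst, dtrain)
range(p)
```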

**Exercise 5**

To check model performance, calculate the test set classification error.
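Classification error is simply the share of misclassified observations. A small sketch with hypothetical predicted probabilities and labels, assuming a 0.5 cutoff:

```r
# Hypothetical predicted probabilities and true 0/1 labels.
pred  <- c(0.91, 0.35, 0.62, 0.48, 0.80)
label <- c(1,    0,    0,    1,    1)

# Threshold at 0.5 (an assumed cutoff) and compare with the truth.
pred_class <- as.numeric(pred > 0.5)
test_error <- mean(pred_class != label)
test_error  # 2 of 5 observations are misclassified: 0.4
```

In the exercise, `pred` would come from `predict()` on the test `xgb.DMatrix` and `label` from the held-out `Creditability` values.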

**Exercise 6**

Plot predictor importance.
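One possible way to do this is with `xgb.importance()` and `xgb.plot.importance()` from the `xgboost` package. The sketch below trains a small model on synthetic data first so that it is self-contained; the data and the seed are assumptions.

```r
library(xgboost)

set.seed(1)  # arbitrary seed

# Synthetic data in which only x1 carries signal (illustrative only).
n <- 200
X <- matrix(rnorm(n * 4), nrow = n, dimnames = list(NULL, paste0("x", 1:4)))
y <- as.numeric(X[, 1] + rnorm(n) > 0)
dtrain <- xgb.DMatrix(data = X, label = y)

bst <- xgb.train(params = list(objective = "binary:logistic", max_depth = 2),
                 data = dtrain, nrounds = 30)

# Table of per-feature importance measures, then a bar plot of it.
imp <- xgb.importance(model = bst)
print(imp)
xgb.plot.importance(imp)
```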

**Exercise 7**

Use `xgb.train()` instead of `xgboost()` to add both the train and test sets as a watchlist. Train the model with the same parameters, but with 100 rounds, to see how it performs during training.

**Exercise 8**

Train the model again, adding AUC and Log Loss as evaluation metrics.

**Exercise 9**

Plot how AUC and Log Loss for the train and test sets changed during the training process. Use a plotting function/library of your choice.
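As one option, base R's `matplot()` can draw the train and test curves on shared axes. The data frame `eval_log` below is a hypothetical stand-in generated from a formula, not real training output, and its column names are assumptions; the actual values would come from the evaluation log of the trained model.

```r
# Hypothetical evaluation log; real values would come from the trained model.
iter <- 1:100
eval_log <- data.frame(
  iter          = iter,
  train_logloss = 0.45 + 0.25 * exp(-iter / 20),
  test_logloss  = 0.52 + 0.18 * exp(-iter / 20)
)

# One line per data set, sharing a single y-axis.
curves <- as.matrix(eval_log[, c("train_logloss", "test_logloss")])
matplot(eval_log$iter, curves,
        type = "l", lty = 1, col = c("blue", "red"),
        xlab = "Boosting round", ylab = "Log Loss")
legend("topright", legend = c("train", "test"),
       col = c("blue", "red"), lty = 1)
```

The same pattern applies to the AUC columns; only the column selection and the y-axis label change.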

**Exercise 10**

Check how setting the parameter `eta` to 0.01 influences the AUC and Log Loss curves.
