eXtremely Boost your machine learning Exercises (Part-1)

September 24, 2017

(This article was first published on R-exercises, and kindly contributed to R-bloggers)

eXtreme Gradient Boosting is a machine learning model which became really popular few years ago after winning several Kaggle competitions. It is very powerful algorithm that use an ensemble of weak learners to obtain a strong learner. Its R implementation is available in xgboost package and it is really worth including into anyone’s machine learning portfolio.

This is the first part of eXtremely Boost your machine learning series. For other parts follow the tag xgboost.

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

Exercise 1
Load xgboost library and download German Credit dataset. Your goal in this tutorial will be to predict Creditability (the first column in the dataset).

Exercise 2
Convert columns c(2,4,5,7,8,9,10,11,12,13,15,16,17,18,19,20) to factors and then encode them as dummy variables. HINT: use model.matrix()

Exercise 3
Split data into training and test set 700:300. Create xgb.DMatrix for both sets with Creditability as label.

Exercise 4
Train xgboost with logistic objective and 30 rounds of training and maximal depth 2.

Exercise 5
To check model performance calculate test set classification error.

Exercise 6
Plot predictors importance.

Learn more about machine learning in the online course Beginner to Advanced Guide on Machine Learning with R Tool. In this course you will learn how to:

  • Create a machine learning algorithm from a beginner point of view
  • Quickly dive into more advanced methods in an accessible pace and with more explanations
  • And much more

This course shows a complete workflow start to finish. It is a great introduction and fallback when you have some experience.

Exercise 7
Use xgb.train() instead of xgboost() to add both train and test sets as a watchlist. Train model with same parameters, but 100 rounds to see how it performs during training.

Exercise 8
Train model again adding AUC and Log Loss as evaluation metrices.

Exercise 9
Plot how AUC and Log Loss for train and test sets was changing during training process. Use plotting function/library of your choice.

Exercise 10
Check how setting parameter eta to 0.01 influences the AUC and Log Loss curves.

To leave a comment for the author, please follow the link and comment on their blog: R-exercises.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)