XGBoost support added to Rattle

This article was first published on Revolutions, and kindly contributed to R-bloggers.

by Fang Zhou, Data Scientist, and Graham Williams, Director of Data Science, both at Microsoft

Rattle (the R Analytical Tool To Learn Easily) is a popular open-source GUI for data mining using R. It presents statistical and visual summaries of data, transforms data so that it can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets. All of the underlying R code is presented as a script, both for learning R and for running independently of Rattle.

Collaborating with the IGD Data Insight Team, and under the guidance of Rattle's author, Graham Williams, we took on the challenging task of understanding the existing code base and re-engineering it to support the latest machine learning algorithms in Open Source R and Microsoft R Server for model development and evaluation.

The Extreme Gradient Boosting algorithm from the R package xgboost is one of the newly added features, providing an alternative option for building a boosting model. The main effort in integrating xgboost into Rattle lay in three areas:

  1. Define generic functions that provide a formula interface, streamlining the process of fitting an extreme gradient boosting model.
  2. Define the main R script to build, display and evaluate the model.
  3. Update the Rattle GUI to support the choice of xgboost using Glade Interface Designer and interactive R commands.
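
The formula interface in step 1 can be sketched roughly as follows. This is a minimal illustration and not Rattle's actual code: the helper names formula_to_xy and xgb_formula, the synthetic data and the parameter values are all assumptions made for the example. It relies on base R's model.frame and model.matrix to build the numeric matrix that xgboost needs, and only calls xgboost if the package is installed.

```r
# Hypothetical helper (not part of Rattle or xgboost): convert a formula and a
# data frame into the numeric feature matrix and label vector xgboost expects.
formula_to_xy <- function(formula, data) {
  mf <- model.frame(formula, data = data)   # rows with NAs are dropped by default
  y  <- unname(model.response(mf))
  # Expand factors into dummy columns and drop the intercept column.
  x  <- model.matrix(attr(mf, "terms"), data = mf)[, -1, drop = FALSE]
  # Assumption: a factor target is binary, so its two levels map to 0/1.
  if (is.factor(y)) y <- as.integer(y) - 1L
  list(x = x, y = y)
}

# Hypothetical wrapper: fit a binary classifier from a formula.
xgb_formula <- function(formula, data, nrounds = 20, params = list()) {
  xy     <- formula_to_xy(formula, data)
  dtrain <- xgboost::xgb.DMatrix(data = xy$x, label = xy$y)
  params <- modifyList(list(objective = "binary:logistic"), params)
  xgboost::xgb.train(params = params, data = dtrain, nrounds = nrounds)
}

# Synthetic stand-in data (not the Kaggle credit card data).
set.seed(1)
toy <- data.frame(
  Amount = runif(200, 0, 100),
  Kind   = factor(sample(c("a", "b"), 200, replace = TRUE))
)
toy$Class <- factor(as.integer(toy$Amount + rnorm(200, sd = 20) > 60))

# The formula is turned into a matrix plus 0/1 labels.
xy <- formula_to_xy(Class ~ ., toy)
str(xy)

# Fit only when xgboost is available.
if (requireNamespace("xgboost", quietly = TRUE)) {
  fit <- xgb_formula(Class ~ ., toy, params = list(max_depth = 3))
  print(fit)
}
```

Keeping the formula handling in a separate helper means the same conversion can be reused when scoring new data, provided the new data has the same columns and factor levels.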

We now demonstrate Rattle's xgboost support on the credit card data set from the Kaggle competition Credit Card Fraud Detection.

[Figure: Rattle1]

After loading the credit card data from a CSV file in Rattle's Data Tab, we can click on the Model Tab to navigate to Boosting Model. By choosing the Model Builder xgb and a set of hyper-parameters, we can easily build an xgboost model without writing any code.

[Figure: Rattle2]

Clicking the Importance and Errors buttons generates measures and visualizations of feature importance and of the training error.
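
For readers working outside the GUI, a comparable feature importance table and training error can be obtained from xgboost directly. The sketch below is an illustration under stated assumptions, not the code Rattle runs: the data is synthetic, the helper training_error is hypothetical, and the parameter values are arbitrary. It reports only the final training error rather than a per-iteration error curve.

```r
# Hypothetical helper: fraction of misclassified rows at a probability threshold.
training_error <- function(prob, label, threshold = 0.5) {
  mean(as.integer(prob > threshold) != label)
}

if (requireNamespace("xgboost", quietly = TRUE)) {
  set.seed(42)
  n <- 500
  # Synthetic features; only V1 actually drives the label.
  x <- matrix(rnorm(n * 3), ncol = 3,
              dimnames = list(NULL, c("V1", "V2", "Amount")))
  y <- as.integer(x[, "V1"] + rnorm(n, sd = 0.5) > 1)

  dtrain <- xgboost::xgb.DMatrix(data = x, label = y)
  fit <- xgboost::xgb.train(
    params  = list(objective = "binary:logistic", max_depth = 3),
    data    = dtrain,
    nrounds = 20
  )

  # Importance table for the features the trees actually used.
  print(xgboost::xgb.importance(model = fit))

  # Error of the final model on the data it was trained on.
  prob <- predict(fit, dtrain)
  cat("Training error:", training_error(prob, y), "\n")
}
```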

[Figure: Rattle4]

Performance evaluation is also supported. By navigating to the Evaluate Tab, we can calculate the confusion matrix and draw various statistical plots for model evaluation, such as the ROC curve, risk chart and lift chart.
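
To make the evaluation measures concrete, here is a small base R sketch of a confusion matrix and an ROC curve with its area under the curve. It is only an illustration: the function names confusion_matrix, roc_points and auc are hypothetical, the scores are synthetic, and Rattle's own Evaluate Tab may compute these quantities differently.

```r
# Confusion matrix for 0/1 labels at a given probability threshold.
confusion_matrix <- function(actual, prob, threshold = 0.5) {
  predicted <- factor(as.integer(prob > threshold), levels = c(0, 1))
  table(Actual = factor(actual, levels = c(0, 1)), Predicted = predicted)
}

# ROC points: sweep the threshold down through the sorted scores.
roc_points <- function(actual, prob) {
  ord    <- order(prob, decreasing = TRUE)
  actual <- actual[ord]
  tpr    <- cumsum(actual == 1) / sum(actual == 1)
  fpr    <- cumsum(actual == 0) / sum(actual == 0)
  # For tied scores keep only the last row of each tie group.
  keep   <- !duplicated(prob[ord], fromLast = TRUE)
  data.frame(fpr = c(0, fpr[keep]), tpr = c(0, tpr[keep]))
}

# Area under the ROC curve by the trapezoid rule.
auc <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# Synthetic scores: positives tend to score higher than negatives.
set.seed(7)
actual <- rbinom(300, 1, 0.3)
prob   <- plogis(2 * actual - 1 + rnorm(300))

print(confusion_matrix(actual, prob))
roc <- roc_points(actual, prob)
cat("AUC:", auc(roc), "\n")
```

The data frame returned by roc_points can be passed to plot(roc$fpr, roc$tpr, type = "l") to draw the curve.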

[Figure: Rattle3]

Do check the Log Tab to review the R commands that were executed behind the scenes.

Inspired by the work of the IGD Data Insight Team (see the blog post Microsoft R Server support for Rattle) and the latest releases of LightGBM, mxnet, MicrosoftML and others, we could extend Rattle to expose much more functionality in the near future.

The latest release of Rattle (Version 5.0.18) is available on Bitbucket.

You can try this new version out either by using Microsoft R Client on Windows or by firing up an Azure Linux Data Science Virtual Machine, which comes with the developer version of Microsoft R Server installed. Then upgrade the pre-installed Rattle to this new release.

togaware: Rattle: A Graphical User Interface for Data Mining using R
