The Power of Decision Stumps

January 1, 2016
By

(This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers)

A decision stump is the weak classification model with the simple tree structure consisting of one split, which can also be considered a one-level decision tree. Due to its simplicity, the stump often demonstrates a low predictive performance. As shown in the example below, the AUC measure of a stump is even lower than the one of a single attribute in a separate testing dataset.

pkgs <- c('pROC', 'RWeka')
lapply(pkgs, require, character.only = T)
df1 <- read.csv("credit_count.txt")
df2 <- df1[df1$CARDHLDR == 1, ]
set.seed(2016)
n <- nrow(df2)
sample <- sample(seq(n), size = n / 2, replace = FALSE)
train <- df2[sample, ]
test <- df2[-sample, ]
x <- paste("AGE + ACADMOS + ADEPCNT + MAJORDRG + MINORDRG + OWNRENT + INCOME + SELFEMPL + INCPER + EXP_INC")
fml <- as.formula(paste("as.factor(DEFAULT) ~ ", x))

### IDENTIFY THE MOST PREDICTIVE ATTRIBUTE ###
imp <- InfoGainAttributeEval(fml, data = train)
imp_x <- test[, names(imp[imp == max(imp)])]
roc(as.factor(test$DEFAULT), imp_x)
# Area under the curve: 0.6243

### CONSTRUCT A WEAK CLASSIFIER OF DECISION STUMP ###
stump <- DecisionStump(fml, data = train)
print(stump)
roc(as.factor(test$DEFAULT), predict(stump, newdata = test, type = "probability")[, 2])
# Area under the curve: 0.5953

Albeit weak by itself, the decision stump can be used as a base model in many machine learning ensemble methods, such as bagging and boosting. For instance, the bagging classifier with 1,000 stumps combined outperforms the single stump by ~7% in terms of AUC (0.6346 vs. 0.5953). Moreover, AdaBoost with stumps can further improve the predictive performance over the single stump by ~11% (0.6585 vs. 0.5953) and also over the logistic regression benchmark by ~2% (0.6585 vs. 0.6473).

### BUILD A BAGGING CLASSIFIER WITH 1,000 STUMPS IN PARALLEL ###
bagging <- Bagging(fml, data = train, control = Weka_control("num-slots" = 0, I = 1000, W = "DecisionStump", S = 2016, P = 50))
roc(as.factor(test$DEFAULT), predict(bagging, newdata = test, type = "probability")[, 2])
# Area under the curve: 0.6346

### BUILD A BOOSTING CLASSIFIER WITH STUMPS ###
boosting <- AdaBoostM1(fml, data = train, control = Weka_control(I = 100, W = "DecisionStump", S = 2016))
roc(as.factor(test$DEFAULT), predict(boosting, newdata = test, type = "probability")[, 2])
# Area under the curve: 0.6585
 
### DEVELOP A LOGIT MODEL FOR THE BENCHMARK ###
logit <- Logistic(fml, data = train)
roc(as.factor(test$DEFAULT), predict(logit, newdata = test, type = "probability")[, 2])
# Area under the curve: 0.6473

To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)