In Part 1, I built a classification tree with the “party” package. Now I will use the “ipred” package to examine the same data with bagging (bootstrap aggregation).

> library(ipred)
> train_bag = bagging(class ~ ., data=train, coob=T)
> train_bag

Bagging classification trees with 25 bootstrap replications

Call: bagging.data.frame(formula = class ~ ., data = train, coob = T)

Out-of-bag estimate of misclassification error: 0.0424
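To make the idea behind that output concrete, here is a simplified, from-scratch sketch of bagging with an out-of-bag (OOB) error estimate: draw a bootstrap sample, grow a tree on it, let that tree vote only on the rows it never saw, and classify each row by majority vote. This is an illustration, not ipred's implementation. It assumes the `rpart` package for the individual trees and uses `MASS::biopsy` (Wisconsin breast cancer data) purely as a stand-in dataset; the variable names are hypothetical, and because ipred uses its own tree settings its numbers will not match exactly.

```r
library(rpart)

# Stand-in data (assumption): MASS::biopsy, ID column dropped, incomplete rows removed
data(biopsy, package = "MASS")
df <- na.omit(biopsy[, -1])

set.seed(42)
n  <- nrow(df)
B  <- 25                        # same number of replications as reported above
lv <- levels(df$class)          # "benign", "malignant"
votes <- matrix(0, nrow = n, ncol = length(lv), dimnames = list(NULL, lv))

for (b in seq_len(B)) {
  idx  <- sample.int(n, n, replace = TRUE)   # bootstrap sample of row indices
  oob  <- setdiff(seq_len(n), idx)           # rows left out of this sample
  fit  <- rpart(class ~ ., data = df[idx, ], method = "class")
  pred <- predict(fit, newdata = df[oob, ], type = "class")
  cell <- cbind(oob, as.integer(pred))       # (row, predicted class) pairs
  votes[cell] <- votes[cell] + 1             # one out-of-bag vote per tree
}

seen <- rowSums(votes) > 0                   # rows that were out of bag at least once
oob_pred <- factor(lv[max.col(votes[seen, , drop = FALSE], ties.method = "first")],
                   levels = lv)
oob_error <- mean(oob_pred != df$class[seen])  # majority-vote OOB misclassification
oob_error
```

Each row is left out of roughly 37% of bootstrap samples, so with 25 trees essentially every row collects several OOB votes, which is what makes the OOB error a usable stand-in for a held-out estimate.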

> table(predict(train_bag), train$class)

            benign malignant
  benign       290         9
  malignant     11       162
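Because the model was fit with `coob=T`, calling `predict()` without `newdata` appears to return out-of-bag predictions rather than ordinary resubstitution predictions: the off-diagonal cells (9 + 11 = 20 of 472 cases) reproduce the 0.0424 error reported above. A quick base-R check, with the counts typed in by hand from the table (the object names are illustrative):

```r
# Training confusion matrix from the output above
# (rows = predicted class, columns = true class; filled column by column)
cm_train <- matrix(c(290, 11, 9, 162), nrow = 2,
                   dimnames = list(predicted = c("benign", "malignant"),
                                   actual    = c("benign", "malignant")))

# Misclassification rate = off-diagonal count / total count
misclass <- 1 - sum(diag(cm_train)) / sum(cm_train)
round(misclass, 4)
# [1] 0.0424
```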

> testbag = predict(train_bag, newdata=test)
> table(testbag, test$class)

testbag     benign malignant
  benign       137         1
  malignant      6        67
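One way to summarize the test-set table is accuracy, sensitivity and specificity. The sketch below assumes “malignant” is the positive class and again types the counts in by hand (object names are illustrative):

```r
# Test confusion matrix from the output above
# (rows = predicted class, columns = true class; filled column by column)
cm_test <- matrix(c(137, 6, 1, 67), nrow = 2,
                  dimnames = list(predicted = c("benign", "malignant"),
                                  actual    = c("benign", "malignant")))

accuracy    <- sum(diag(cm_test)) / sum(cm_test)                                 # 204 / 211
sensitivity <- cm_test["malignant", "malignant"] / sum(cm_test[, "malignant"])   # 67 / 68
specificity <- cm_test["benign", "benign"] / sum(cm_test[, "benign"])            # 137 / 143

round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity), 3)
# accuracy 0.967, sensitivity 0.985, specificity 0.958
```

In other words, the bagged model misses 1 of 68 malignant cases and flags 6 of 143 benign cases as malignant.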

If you compare these confusion matrices with those from the prior post, what do you think?

Let’s bring back the ROC curve from the prior post and overlay the curve for the bagged model.

#prepare bagged model for curve
> library(ROCR)  # provides prediction() and performance()
> test.bagprob = predict(train_bag, type = "prob", newdata = test)
> bagpred = prediction(test.bagprob[,2], test$class)  # column 2 = P(malignant)
> bagperf = performance(bagpred, "tpr", "fpr")
> plot(perf, main="ROC", colorize=T)  # perf: the ctree ROC object from the prior post
> plot(bagperf, col=2, add=TRUE)
> plot(perf, col=1, add=TRUE)
> legend(0.6, 0.6, c('ctree', 'bagging'), col=1:2, lty=1)
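`performance(pred, "tpr", "fpr")` traces the ROC curve by sweeping a cutoff over the predicted probabilities and recording the true and false positive rates at each cutoff. The sketch below does the same by hand on a few made-up scores; the `scores`, `labels` and `roc_points()` names are hypothetical and not taken from the data above. A case counts as predicted positive when its score is at or above the cutoff.

```r
# Hypothetical toy scores and labels (1 = malignant, 0 = benign)
scores <- c(0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10)
labels <- c(1,    1,    0,    1,    0,    1,    0,    0)

roc_points <- function(scores, labels) {
  # Start above the highest score (nothing predicted positive), then lower the cutoff
  thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))
  tpr <- sapply(thresholds, function(t) mean(scores[labels == 1] >= t))
  fpr <- sapply(thresholds, function(t) mean(scores[labels == 0] >= t))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

roc <- roc_points(scores, labels)
plot(roc$fpr, roc$tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # diagonal = random guessing
```

The curve always runs from (0, 0), where nothing is called malignant, to (1, 1), where everything is; a better model bends further toward the top-left corner.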

As the confusion matrices already suggested, the bagged model outperforms the single conditional inference tree. Finally, let’s look at the AUC: 0.992 with bagging versus 0.985 last time around.

> auc.curve = performance(bagpred, "auc")
> auc.curve
An object of class "performance"
Slot "x.name":
[1] "None"

Slot "y.name":
[1] "Area under the ROC curve"

Slot "alpha.name":
[1] "none"

Slot "x.values":
list()

Slot "y.values":
[[1]]
[1] 0.9918244

Slot "alpha.values":
list()
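The AUC has a handy interpretation: it is the probability that a randomly chosen malignant case gets a higher score than a randomly chosen benign one, with ties counting one half. That means it can be computed from ranks, via the Mann-Whitney U statistic. A base-R sketch on hypothetical toy scores (`auc_rank()` is an illustrative helper, not part of ROCR):

```r
# Hypothetical toy scores and labels (1 = malignant, 0 = benign)
scores <- c(0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10)
labels <- c(1, 1, 0, 1, 0, 1, 0, 0)

# AUC = U / (n1 * n0), where U is the Mann-Whitney statistic for the positives
auc_rank <- function(scores, labels) {
  r  <- rank(scores)            # average ranks, so tied scores count one half
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_rank(scores, labels)
# [1] 0.8125
```

Here 13 of the 16 malignant/benign pairs are ordered correctly, and 13/16 matches the trapezoidal area under the empirical ROC curve for the same scores. In a ROCR session, the number itself can be pulled out of the S4 object with `auc.curve@y.values[[1]]` instead of printing every slot.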

