Stability of classification trees
[This article was first published on R snippets, and kindly contributed to R-bloggers.]
Classification trees are known to be unstable with respect to the training data. I recently read an article on the stability of classification trees by Briand et al. (2009), who propose a quantitative similarity measure between two trees. The method is interesting, and it inspired me to prepare a simple example, based on test data, that shows the instability of classification trees.
I compare the stability of logistic regression and a classification tree on the Participation data set from the Ecdat package. The method works as follows:
1. Divide the data into a training set and a test set.
2. Draw a random subset of the training data and fit a logistic regression and a classification tree to it.
3. Apply both models to the test data to obtain predicted probabilities.
4. Repeat steps 2 and 3 many times.
5. For each observation in the test set, calculate the standard deviation of the predictions obtained from each of the two models.
6. For both models, plot a kernel density estimate of the distribution of these standard deviations over the test set.
The code performing the above steps is as follows:
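library(party)   # ctree, treeresponse
library(Ecdat)   # Participation data set
data(Participation)
shuffle <- Participation[sample(nrow(Participation)),]
test <- shuffle[1:300,]   # assumed: the rows not used for training
train <- shuffle[301:nrow(Participation),]
reps <- 1000   # number of repetitions; assumed value, not given in the text
p.tree <- p.log <- vector("list", reps)
for (i in 1:reps) {
    train.sub <- train[sample(nrow(train))[1:300],]
    mtree <- ctree(lfp ~ ., data = train.sub)
    mlog <- glm(lfp ~ ., data = train.sub, family = binomial)
    # predicted probability of the second level of lfp for each test observation
    p.tree[[i]] <- sapply(treeresponse(mtree, newdata = test),
                          function(x) x[2])
    p.log[[i]] <- predict(mlog, newdata = test, type = "response")
}
plot(density(apply(do.call(rbind, p.log), 2, sd)),
     main = "", xlab = "sd of predictions")
lines(density(apply(do.call(rbind, p.tree), 2, sd)), col = "red")
legend("topright", legend = c("logistic", "tree"),
       col = c("black", "red"), lty = 1)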
Here is the generated comparison. As can be clearly seen, logistic regression gives much more stable predictions than the classification tree.
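The visual impression could also be backed by a simple numeric summary. The following is only a minimal sketch and was not part of the analysis above: the helper name `sd_per_observation` is hypothetical, and the toy prediction vectors are made up purely to show the mechanics. With the real results one would pass `p.log` and `p.tree` instead.

```r
# Hypothetical helper: given a list of prediction vectors (one per
# repetition, all over the same test observations), return the
# standard deviation of the predictions for each test observation.
sd_per_observation <- function(pred_list) {
  pred_matrix <- do.call(rbind, pred_list)  # rows: repetitions, columns: observations
  apply(pred_matrix, 2, sd)
}

# Toy data (made up): three repetitions, two test observations.
stable   <- list(c(0.50, 0.20), c(0.52, 0.21), c(0.48, 0.19))
unstable <- list(c(0.10, 0.20), c(0.90, 0.80), c(0.50, 0.50))

sd_stable   <- sd_per_observation(stable)
sd_unstable <- sd_per_observation(unstable)

# Compare the typical spread of predictions between the two sets.
summary(sd_stable)
summary(sd_unstable)
median(sd_unstable) / median(sd_stable)
```

A ratio of medians well above 1 would point the same way as the density plot, although a single number hides the shape of the two distributions, so it is best read alongside the plot.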