Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

ROC for Decision Trees – where did the data come from?
By Jerry Tuttle
In doing decision tree classification problems, I have often graphed the ROC (Receiver Operating Characteristic) curve. The True Positive Rate (TPR) is on the y-axis, and the False Positive Rate (FPR) is on the x-axis. True Positive is when the lab test predicts you have the disease and you actually do have it. False Positive is when the lab test predicts you have the disease but you actually do not have it.
The following code uses the sample dataset kyphosis from the rpart package, creates a default decision tree, prints the confusion matrix, and plots the ROC curve. (Kyphosis is a type of spinal deformity.)

library(rpart)
df <- kyphosis
set.seed(1)
mytree <- rpart(Kyphosis ~ Age + Number + Start, data = df, method="class")
library(rattle)
library(rpart.plot)
library(RColorBrewer)
fancyRpartPlot(mytree, uniform=TRUE, main=”Kyphosis Tree”)
predicted <- predict(mytree, type="class")
table(df\$Kyphosis,predicted)
library(ROCR)
pred <- prediction(predict(mytree, type="prob")[, 2], df\$Kyphosis)
plot(performance(pred, “tpr”, “fpr”), col=”blue”, main=”ROC Kyphosis, using library ROCR”)
abline(0, 1, lty=2)
auc <- performance(pred, "auc")
[email protected]

dat <- data.frame()
s <- predict(mytree, type="prob")[, 2]
for (i in 1:21){
p <- .05*(i-1)
thresh p, “present”, “absent”)
t <- table(df\$Kyphosis,thresh)
fpr <- ifelse(ncol(t)==1, 0, t[1,2] / (t[1,2] + t[1,1]))
tpr <- ifelse(ncol(t)==1, 0, t[2,2] / (t[2,2] + t[2,1]))
dat[i,1] <- fpr
dat[i,2] <- tpr
}
colnames(dat) <- c("fpr", "tpr")
plot(x=dat\$fpr, y=dat\$tpr, xlab=”FPR”, ylab=”TPR”, xlim=c(0,1),
ylim=c(0,1),
main=”ROC Kyphosis, using indiv threshold calcs”, type=”b”, col=”blue”)
abline(0, 1, lty=2)

ROC for Decision Trees – where did the data come from? was first posted on August 8, 2020 at 1:39 pm.