# ROC for Decision Trees – where did the data come from?

[This article was first published on **R-posts.com**, and kindly contributed to R-bloggers.]


By Jerry Tuttle

When working on decision tree classification problems, I have often graphed the ROC (Receiver Operating Characteristic) curve. The True Positive Rate (TPR) is on the y-axis, and the False Positive Rate (FPR) is on the x-axis. A True Positive is when the lab test predicts you have the disease and you actually do have it; the TPR is the share of people with the disease whom the test catches. A False Positive is when the lab test predicts you have the disease but you actually do not have it; the FPR is the share of people without the disease whom the test wrongly flags.
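To make the two rates concrete, here is a minimal sketch in R that computes them from a 2x2 confusion matrix. The counts are made up for illustration and are not taken from the kyphosis data used in this post.

```r
# Hypothetical counts, invented for illustration only.
# Rows = actual condition, columns = what the test predicted.
cm <- matrix(c(50, 10,
                5, 35),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("absent", "present"),
                             predicted = c("absent", "present")))

# TPR: share of actual positives that the test flags as positive
tpr <- cm["present", "present"] / sum(cm["present", ])

# FPR: share of actual negatives that the test flags as positive
fpr <- cm["absent", "present"] / sum(cm["absent", ])

print(c(tpr = tpr, fpr = fpr))
```

Each point on a ROC curve is one such (FPR, TPR) pair, obtained from one confusion matrix.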

The following code uses the sample dataset kyphosis from the rpart package, creates a default decision tree, prints the confusion matrix, and plots the ROC curve. (Kyphosis is a type of spinal deformity.)

library(rpart)

df <- kyphosis

set.seed(1)

mytree <- rpart(Kyphosis ~ Age + Number + Start, data = df, method="class")

library(rattle)

library(rpart.plot)

library(RColorBrewer)

fancyRpartPlot(mytree, uniform=TRUE, main="Kyphosis Tree")

predicted <- predict(mytree, type="class")

table(df$Kyphosis,predicted)

library(ROCR)

pred <- prediction(predict(mytree, type="prob")[, 2], df$Kyphosis)

plot(performance(pred, "tpr", "fpr"), col="blue", main="ROC Kyphosis, using library ROCR")

abline(0, 1, lty=2)

auc <- performance(pred, "auc")

auc@y.values
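As a cross-check on an AUC figure, the area can also be computed without ROCR from the ranks of the scores (the Mann-Whitney formulation). The sketch below uses a small made-up score vector rather than the kyphosis predictions, and `auc_rank` is a hypothetical helper name, not a ROCR function.

```r
# Hypothetical helper: AUC from ranks (Mann-Whitney U statistic)
auc_rank <- function(scores, is_positive) {
  r  <- rank(scores)        # average ranks, so a tied pair counts as 1/2
  n1 <- sum(is_positive)    # number of positives
  n0 <- sum(!is_positive)   # number of negatives
  (sum(r[is_positive]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Made-up scores and labels, for illustration only
scores      <- c(0.1, 0.1, 0.4, 0.4, 0.8, 0.8)
is_positive <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)

print(auc_rank(scores, is_positive))
```

Applied to the tree, a call such as `auc_rank(predict(mytree, type="prob")[, 2], df$Kyphosis == "present")` should agree with the value ROCR reports, since both treat tied scores the same way.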

dat <- data.frame()

s <- predict(mytree, type="prob")[, 2]

for (i in 1:21){

p <- .05*(i-1)

thresh <- ifelse(s > p, "present", "absent")

t <- table(df$Kyphosis,thresh)

fpr <- ifelse(ncol(t)==1, 0, t[1,2] / (t[1,2] + t[1,1]))

tpr <- ifelse(ncol(t)==1, 0, t[2,2] / (t[2,2] + t[2,1]))

dat[i,1] <- fpr

dat[i,2] <- tpr

}

colnames(dat) <- c("fpr", "tpr")

plot(x=dat$fpr, y=dat$tpr, xlab="FPR", ylab="TPR", xlim=c(0,1),

ylim=c(0,1),

main="ROC Kyphosis, using indiv threshold calcs", type="b", col="blue")

abline(0, 1, lty=2)
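The threshold-by-threshold calculation can also be written without `table()`, which avoids special handling when every case falls into a single predicted class. The following is a minimal sketch on made-up scores, not the kyphosis data; `roc_points` is a hypothetical helper name, and it assumes a case is called positive when its score is strictly greater than the threshold.

```r
# Hypothetical helper: one (FPR, TPR) point per threshold
roc_points <- function(scores, is_positive, thresholds) {
  t(sapply(thresholds, function(p) {
    flagged <- scores > p   # predicted positive at this threshold
    c(threshold = p,
      fpr = sum(flagged & !is_positive) / sum(!is_positive),
      tpr = sum(flagged &  is_positive) / sum(is_positive))
  }))
}

# Made-up scores and labels, for illustration only
scores      <- c(0.1, 0.1, 0.4, 0.4, 0.8, 0.8)
is_positive <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)

pts <- roc_points(scores, is_positive, thresholds = c(0, 0.25, 0.5, 0.9))
print(pts)
```

Because a decision tree assigns one probability per leaf, the scores take only a few distinct values, so many thresholds land on the same (FPR, TPR) point and the curve has only a handful of corners.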

ROC for Decision Trees – where did the data come from? was first posted on August 8, 2020 at 1:39 pm.

