# A Brief Look at ROC Curves

 This post was kindly contributed by 数据科学与R语言 - go there to comment and to read the full post.

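```r
# Fit a logistic regression and get the predicted probabilities
model1 <- glm(y~., data=newdata, family='binomial')
pre <- predict(model1, type='response')
# Put the predicted probability (prob) and the actual outcome (obs) in one data frame
data <- data.frame(prob=pre, obs=newdata$y)
# Sort by predicted probability, from lowest to highest
data <- data[order(data$prob),]
n <- nrow(data)
tpr <- fpr <- rep(0, n)
# Compute TPR and FPR at each threshold, then plot them
for (i in 1:n) {
    threshold <- data$prob[i]
    # Predict positive when prob >= threshold, so that the observation
    # sitting exactly at the threshold is still counted
    tp <- sum(data$prob >= threshold & data$obs == 1)
    fp <- sum(data$prob >= threshold & data$obs == 0)
    tn <- sum(data$prob < threshold & data$obs == 0)
    fn <- sum(data$prob < threshold & data$obs == 1)
    tpr[i] <- tp/(tp+fn) # true positive rate
    fpr[i] <- fp/(tn+fp) # false positive rate
}
plot(fpr, tpr, type='l')
abline(a=0, b=1)
```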
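The `newdata` data frame is not shown in this post, so here is a minimal self-contained sketch on simulated data. The names `sim`, `fit`, `prob` and `obs` are made up for illustration, and the sketch assumes there are no tied predicted probabilities. Instead of looping over thresholds, it sorts the observations from highest to lowest probability and uses cumulative sums, which should trace the same curve.

```r
# Simulated data; all names here are hypothetical, not from the original post.
set.seed(42)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(1.5 * x))
sim <- data.frame(x = x, y = y)

fit  <- glm(y ~ x, data = sim, family = 'binomial')
prob <- predict(fit, type = 'response')

# Sort the observed outcomes by predicted probability, highest first.
ord <- order(prob, decreasing = TRUE)
obs <- sim$y[ord]

# Lowering the threshold one observation at a time, the cumulative sums count
# the true positives and false positives among the cases predicted positive.
# The leading 0 is the point where nothing is predicted positive.
tpr <- c(0, cumsum(obs == 1) / sum(obs == 1))
fpr <- c(0, cumsum(obs == 0) / sum(obs == 0))

plot(fpr, tpr, type = 'l')
abline(a = 0, b = 1)
```

The curve starts at (0, 0) and ends at (1, 1); the diagonal drawn by `abline` is what a random classifier would produce.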
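R also has packages dedicated to plotting ROC curves, such as the widely used ROCR package. Besides plotting, it can compute the area under the ROC curve (AUC), which summarizes the overall performance of a classifier. The value lies between 0 and 1, and larger is better.

```r
library(ROCR)
pred <- prediction(pre, newdata$y)
performance(pred, 'auc')@y.values # AUC value
perf <- performance(pred, 'tpr', 'fpr')
plot(perf)
```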
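To get a feel for what the AUC measures, it can also be computed by hand. The AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties counting as one half), so it can be obtained from the ranks of the scores. The sketch below uses base R only; `auc_rank` is a hypothetical helper written for this illustration, not part of ROCR, and the four scores are toy values.

```r
# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) statistic.
auc_rank <- function(score, label) {
  r  <- rank(score)            # average ranks, so tied scores count as one half
  n1 <- sum(label == 1)        # number of positive cases
  n0 <- sum(label == 0)        # number of negative cases
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy data: 3 of the 4 positive/negative pairs are ranked correctly.
score <- c(0.1, 0.4, 0.35, 0.8)
label <- c(0, 0, 1, 1)
auc_rank(score, label)  # 0.75
```

On real data, the result should agree with the value returned by `performance(pred, 'auc')`, which makes this a handy sanity check.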
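ROCR's plotting function is fairly limited, so I prefer the more powerful pROC package. It makes it easy to compare two classifiers, can automatically mark the optimal threshold, and produces nicer-looking plots.

```r
library(pROC)
modelroc <- roc(newdata$y, pre)
plot(modelroc, print.auc=TRUE, auc.polygon=TRUE, grid=c(0.1, 0.2),
     grid.col=c("green", "red"), max.auc.polygon=TRUE,
     auc.polygon.col="skyblue", print.thres=TRUE)
```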
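One common definition of the "optimal" threshold is the one that maximizes Youden's J statistic, J = TPR - FPR, which to my knowledge is also pROC's default criterion. The base R sketch below illustrates the idea on toy data. `best_threshold` is a hypothetical helper, not a pROC function, and pROC may report a threshold lying between two observed scores, so the exact number can differ slightly.

```r
# Hypothetical helper: pick the threshold that maximises Youden's J = TPR - FPR.
best_threshold <- function(score, label) {
  cand <- sort(unique(score))          # candidate thresholds: the observed scores
  j <- sapply(cand, function(t) {
    pred_pos <- score >= t             # predict positive at or above the threshold
    tpr <- sum(pred_pos & label == 1) / sum(label == 1)
    fpr <- sum(pred_pos & label == 0) / sum(label == 0)
    tpr - fpr
  })
  list(threshold = cand[which.max(j)], youden = max(j))
}

# Toy data for illustration only.
score <- c(0.2, 0.3, 0.6, 0.7, 0.9)
label <- c(0, 0, 1, 0, 1)
best_threshold(score, label)
```

Here the threshold 0.6 catches both positive cases while misclassifying only one of the three negatives, so it gives the largest J (2/3).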