
Unfortunately this was not taught in any of my statistics or data analysis classes at university (wtf it so needs to be :scream_cat:). So it took me some time until I learned that the AUC has a nice probabilistic meaning.

## What’s AUC anyway?

AUC is the area under the ROC curve. The ROC curve is the receiver operating characteristic curve. AUC is simply the area between that curve and the x-axis. So, to understand AUC we need to look at the concept of an ROC curve.

Consider:

1. A dataset $S$: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n) \in \mathbb{R}^p \times \{0, 1\}$, where
• $\mathbf{x}_i$ is a vector of $p$ features collected for the $i$th subject,
• $y_i$ is the $i$th subject’s label (binary outcome variable of interest, like a disease status, class membership, or whatever binary label).
2. A classification algorithm (such as logistic regression, SVM, deep neural net, or whatever you like), trained on $S$, that assigns a score (or probability) $\hat{p}(\mathbf{x}_{\ast})$ to any new observation $\mathbf{x}_{\ast} \in \mathbb{R}^p$ signifying how likely its label is $y_{\ast} = 1$.

Then:

1. A decision threshold (or operating point) can be chosen to assign a class label ($y_{\ast} = 0$ or $1$) to $\mathbf{x}_{\ast}$ based on the value of $\hat{p}(\mathbf{x}_{\ast})$. The chosen threshold determines the balance between how many false positives and false negatives will result from this classification.
2. Plotting the true positive rate (TPR) against the false positive rate (FPR) as the operating point changes from its minimum to its maximum value yields the receiver operating characteristic (ROC) curve. Check the confusion matrix if you are not sure what TPR and FPR refer to.
3. The area under the ROC curve, or AUC, is used as a measure of classifier performance.
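
For reference, the two rates plotted in the ROC curve can be written in terms of the confusion-matrix counts at a fixed threshold. The symbols TP, FN, FP, and TN (true positives, false negatives, false positives, true negatives) are the standard names for those counts and are introduced here only for illustration; they are not used elsewhere in this post:

$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad \mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}.$

So TPR is the fraction of true class-1 subjects that are labelled 1, and FPR is the fraction of true class-0 subjects that are labelled 1. Raising the threshold can only lower (or leave unchanged) both rates.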

Here is some R code for clarification (not even using tidyverse :stuck_out_tongue:):

# load some data, fit a logistic regression classifier
data(iris)
versicolor_virginica <- iris[iris$Species != "setosa", ]
logistic_reg_fit <- glm(Species ~ Sepal.Width + Sepal.Length,
                        data = versicolor_virginica,
                        family = "binomial")
y <- ifelse(versicolor_virginica$Species == "versicolor", 0, 1)
y_pred <- logistic_reg_fit$fitted.values

# get TPR and FPR at different values of the decision threshold
threshold <- seq(0, 1, length = 100)
FPR <- sapply(threshold,
              function(thresh) {
                sum(y_pred >= thresh & y != 1) / sum(y != 1)
              })
TPR <- sapply(threshold,
              function(thresh) {
                sum(y_pred >= thresh & y == 1) / sum(y == 1)
              })

# plot an ROC curve
plot(FPR, TPR)
lines(FPR, TPR)


A rather ugly ROC curve emerges:

The area under the ROC curve, or AUC, seems like a nice heuristic to evaluate and compare the overall performance of classification models independent of the exact decision threshold chosen. $\mathrm{AUC} = 1.0$ signifies perfect classification accuracy, and $\mathrm{AUC} = 0.5$ is the accuracy of making classification decisions via coin toss (or rather a continuous coin that outputs values in $[0,1]$…). Most classification algorithms will result in an AUC in that range. But there’s more to it.
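
To put a number on that area, one option is to integrate the ROC points with the trapezoidal rule. The following is a minimal sketch, not the only way to compute AUC: it refits the logistic regression from the ROC example so that it runs on its own, and the names `fpr_sorted`, `tpr_sorted` and `auc_trapezoid` are made up for this illustration. Because it uses only a grid of 100 thresholds, the result is an approximation.

```r
# refit the classifier from the ROC example so this block is self-contained
data(iris)
versicolor_virginica <- iris[iris$Species != "setosa", ]
logistic_reg_fit <- glm(Species ~ Sepal.Width + Sepal.Length,
                        data = versicolor_virginica,
                        family = "binomial")
y <- ifelse(versicolor_virginica$Species == "versicolor", 0, 1)
y_pred <- logistic_reg_fit$fitted.values

# TPR and FPR on a grid of decision thresholds, as before
threshold <- seq(0, 1, length.out = 100)
FPR <- sapply(threshold, function(thresh) sum(y_pred >= thresh & y != 1) / sum(y != 1))
TPR <- sapply(threshold, function(thresh) sum(y_pred >= thresh & y == 1) / sum(y == 1))

# FPR falls as the threshold rises, so sort the points from left to right
# along the x-axis before integrating
ord <- order(FPR, TPR)
fpr_sorted <- FPR[ord]
tpr_sorted <- TPR[ord]

# trapezoidal rule: (width of each step) * (average height of that step)
auc_trapezoid <- sum(diff(fpr_sorted) *
                       (head(tpr_sorted, -1) + tail(tpr_sorted, -1)) / 2)
auc_trapezoid
```

A finer threshold grid (or using the observed scores themselves as thresholds) would bring this approximation closer to the exact area under the empirical ROC curve.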

## Probabilistic interpretation

As above, assume that we are looking at a dataset where we want to distinguish data points of type 0 from those of type 1. Consider a classification algorithm that assigns to a random observation $\mathbf{x}\in\mathbb{R}^p$ a score (or probability) $\hat{p}(\mathbf{x}) \in [0,1]$ signifying membership in class 1. If the final classification between class 1 and class 0 is determined by a decision threshold $t\in[0, 1]$, then the true positive rate (a.k.a. sensitivity or recall) can be written as a conditional probability

$T(t) := P[\hat{p}(\mathbf{x}) > t \,|\, \mathbf{x}\,\text{belongs to class 1}],$