# Probabilistic interpretation of AUC

From **Alexej's blog**, kindly contributed to R-bloggers.

Unfortunately this was not taught in any of my statistics or data analysis classes at university (wtf it so needs to be :scream_cat:).

So it took me some time until I learned that the AUC has a nice probabilistic meaning.

## What’s AUC anyway?

AUC is the **a**rea **u**nder the ROC **c**urve. The ROC curve is the **r**eceiver **o**perating **c**haracteristic curve. AUC is simply the area between that curve and the x-axis. So, to understand AUC we need to look at the concept of an ROC curve.

Consider:

- A dataset $S$: $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)$, where
  - $\mathbf{x}_i$ is a vector of features collected for the $i$th subject,
  - $y_i \in \{0, 1\}$ is the $i$th subject’s label (binary outcome variable of interest, like a disease status, class membership, or whatever binary label).
- A classification algorithm (such as logistic regression, SVM, deep neural net, or whatever you like), trained on $S$, that assigns a score (or probability) $p$ to any new observation $\mathbf{x}$ signifying how likely its label is $1$.

Then:

- A *decision threshold* (or *operating point*) $t$ can be chosen to assign a class label ($0$ or $1$) to $\mathbf{x}$ based on the value of $p$. The chosen threshold determines the balance between how many *false positives* and *false negatives* will result from this classification.
- Plotting the *true positive rate* (TPR) against the *false positive rate* (FPR) *as the operating point changes from its minimum to its maximum value* yields the *receiver operating characteristic (ROC) curve*. Check the confusion matrix if you are not sure what TPR and FPR refer to.
- The area under the ROC curve, or AUC, is used as a measure of classifier performance.
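To make TPR and FPR concrete, here is a minimal sketch that evaluates both at one single threshold. The toy labels and scores are made up for illustration, and `rates_at` is a hypothetical helper name, not something from any package.

```r
# Toy data (made up): true labels and classifier scores
y_toy     <- c(0, 0, 1, 0, 1, 1)
score_toy <- c(0.1, 0.4, 0.35, 0.8, 0.7, 0.9)

# Hypothetical helper: TPR and FPR at one decision threshold,
# predicting class 1 whenever the score is at least the threshold
rates_at <- function(thresh, y, score) {
  predicted <- as.integer(score >= thresh)
  c(TPR = sum(predicted == 1 & y == 1) / sum(y == 1),
    FPR = sum(predicted == 1 & y == 0) / sum(y == 0))
}

rates_half <- rates_at(0.5, y_toy, score_toy)
print(rates_half)  # TPR = 2/3, FPR = 1/3 for this toy data

# The confusion matrix at the same threshold
print(table(actual = y_toy, predicted = as.integer(score_toy >= 0.5)))
```

Each threshold gives one (FPR, TPR) point. Sweeping the threshold over its whole range traces out the curve.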

Here is some R code for clarification (not even using `tidyverse` :stuck_out_tongue:):

```
# load some data, fit a logistic regression classifier
data(iris)
versicolor_virginica <- iris[iris$Species != "setosa", ]
logistic_reg_fit <- glm(Species ~ Sepal.Width + Sepal.Length,
                        data = versicolor_virginica,
                        family = "binomial")
y <- ifelse(versicolor_virginica$Species == "versicolor", 0, 1)
y_pred <- logistic_reg_fit$fitted.values

# get TPR and FPR at different values of the decision threshold
threshold <- seq(0, 1, length.out = 100)
FPR <- sapply(threshold,
              function(thresh) {
                sum(y_pred >= thresh & y != 1) / sum(y != 1)
              })
TPR <- sapply(threshold,
              function(thresh) {
                sum(y_pred >= thresh & y == 1) / sum(y == 1)
              })

# plot an ROC curve
plot(FPR, TPR)
lines(FPR, TPR)
```

A rather ugly ROC curve emerges:

The area under the ROC curve, or AUC, seems like a nice heuristic to evaluate and compare the overall performance of classification models independent of the exact decision threshold chosen. An AUC of $1$ signifies perfect classification, and an AUC of $0.5$ is what you get when making classification decisions via coin toss (or rather a continuous coin that outputs values in $[0, 1]$…).

Most classification algorithms will result in an AUC between $0.5$ and $1$.
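As a rough sketch of how that area can actually be computed, one option is the trapezoidal rule over the ROC points. The helper name `auc_trapezoid` and all data below are assumptions made for illustration. Using every observed score as a threshold is just one reasonable choice, not the only one.

```r
# Hypothetical helper: area under the ROC curve via the trapezoidal rule
auc_trapezoid <- function(y, score) {
  # every observed score serves as a threshold; Inf makes the curve start at (0, 0)
  thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))
  tpr <- sapply(thresholds, function(t) sum(score >= t & y == 1) / sum(y == 1))
  fpr <- sapply(thresholds, function(t) sum(score >= t & y == 0) / sum(y == 0))
  # sum the trapezoid areas between consecutive ROC points
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# A classifier that separates the classes perfectly (made-up scores)
auc_perfect <- auc_trapezoid(c(0, 0, 1, 1), c(0.1, 0.2, 0.8, 0.9))
print(auc_perfect)  # 1

# The "continuous coin": uniform random scores that ignore the label
set.seed(1)
y_rand     <- rbinom(2000, 1, 0.5)
score_rand <- runif(2000)
auc_coin   <- auc_trapezoid(y_rand, score_rand)
print(auc_coin)  # close to 0.5
```

Sorting the thresholds from high to low makes both FPR and TPR non-decreasing, so the trapezoids add up from the point (0, 0) to the point (1, 1).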

But there’s more to it.

## Probabilistic interpretation

As above, assume that we are looking at a dataset where we want to distinguish data points of *type 0* from those of *type 1*. Consider a classification algorithm that assigns to a random observation $\mathbf{x}$ a score (or probability) $p$ signifying membership in *class 1*. If the final classification between *class 1* and *class 0* is determined by a decision threshold $t$, then the *true positive rate* (a.k.a. *sensitivity* or *recall*) can be written as a conditional probability