# The Double Density Plot Contains a Lot of Useful Information

This article was first published on **R – Win Vector LLC** and kindly contributed to R-bloggers.


The double density plot contains a lot of useful information.

This plot shows the distribution of a continuous model score, conditioned on the binary categorical outcome to be predicted. As with most density plots, the y-axis is an abstract quantity called density, scaled so that the area under each curve is 1.
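To make the "area of 1 per curve" idea concrete, here is a minimal sketch that splits scores by outcome and scales each group separately. It uses a plain histogram rather than the smooth kernel density estimate a plotting package would draw, and the scores, outcomes, and the helper name `histogram_density` are made up for illustration.

```python
def histogram_density(values, n_bins=10, lo=0.0, hi=1.0):
    """Return (bin_width, heights), with heights scaled so the total area is 1.

    Assumes every value lies in [lo, hi].
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so that v == hi lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    heights = [c / (len(values) * width) for c in counts]
    return width, heights


# Toy (made-up) scores and outcomes, for illustration only.
scores = [0.05, 0.10, 0.20, 0.25, 0.30, 0.40, 0.55, 0.60, 0.80, 0.90]
outcomes = [False, False, False, True, False, False, True, False, True, True]

# Condition on the outcome: one group of scores per class.
pos_scores = [s for s, y in zip(scores, outcomes) if y]
neg_scores = [s for s, y in zip(scores, outcomes) if not y]

for label, group in (("positive", pos_scores), ("negative", neg_scores)):
    width, heights = histogram_density(group, n_bins=5)
    area = sum(h * width for h in heights)
    print(label, [round(h, 2) for h in heights], "area =", round(area, 6))
```

The two groups have different sizes (4 and 6 rows here), yet each curve is scaled to area 1, so the plot by itself no longer shows how common each class is.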

An example is given here.

The really cool observation I wanted to share is: if we know this classifier is well calibrated, then we can recover the positive category prevalence from the graph.

A well calibrated probability score is one such that `E[outcome == TRUE] = E[prediction]`. For such a classifier, the unknown positive outcome prevalence `p` must satisfy the following relation:

`p E[prediction | on positive curve] + (1 - p) E[prediction | on negative curve] = p`

This follows as `p` and `1-p` are the relative sizes of the positive and negative classes, prior to being re-scaled to integrate to one as part of the density. The left-hand side is therefore the overall `E[prediction]`, which calibration sets equal to `E[outcome == TRUE] = p`. Solving for `p` gives:

`p = E[prediction | on negative curve] / (1 + E[prediction | on negative curve] - E[prediction | on positive curve])`

The conditional expectations `E[prediction | on positive curve]` and `E[prediction | on negative curve]` are depicted on the double density plot, so from them we can recover the prevalence `p`.

The recovery of the prevalence from the two conditional means is shown in the earlier figure.
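The relation `p E[prediction | on positive curve] + (1 - p) E[prediction | on negative curve] = p` can be checked numerically. The sketch below is a simulation under stated assumptions, not the original post's example. It assumes scores drawn from a Beta(2, 5) distribution and outcomes that are TRUE with probability equal to the score, which makes the score calibrated by construction. The function names, distribution, sample size, and seed are all illustrative choices.

```python
import random


def recover_prevalence(mean_pos, mean_neg):
    """Solve p * mean_pos + (1 - p) * mean_neg = p for p."""
    return mean_neg / (1.0 + mean_neg - mean_pos)


def simulate(n=200_000, seed=2020):
    """Draw scores s ~ Beta(2, 5), then outcomes that are TRUE with probability s.

    The Beta(2, 5) choice is an assumption made only for this sketch.
    """
    rng = random.Random(seed)
    scores = [rng.betavariate(2.0, 5.0) for _ in range(n)]
    outcomes = [rng.random() < s for s in scores]
    return scores, outcomes


scores, outcomes = simulate()
pos = [s for s, y in zip(scores, outcomes) if y]
neg = [s for s, y in zip(scores, outcomes) if not y]

# The two conditional means are the means of the two density curves.
mean_pos = sum(pos) / len(pos)
mean_neg = sum(neg) / len(neg)

p_recovered = recover_prevalence(mean_pos, mean_neg)
p_observed = len(pos) / len(scores)

print(f"mean score on positive curve: {mean_pos:.4f}")
print(f"mean score on negative curve: {mean_neg:.4f}")
print(f"prevalence recovered from the two means: {p_recovered:.4f}")
print(f"prevalence observed in the sample:       {p_observed:.4f}")
```

Under these assumptions the recovered and observed prevalences should agree up to sampling noise. Only the two conditional means are used in the recovery, never the class counts.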

We have some additional results coming out for what I am currently calling “fully calibrated probability scores.” These are scores such that `E[outcome == TRUE | prediction = p] = p` for all `p` in the interval `[0, 1]`. This includes a very interesting special case where it is easy to show that the prevalence is the probability value where the density curves cross.
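The crossing claim can be illustrated with a small sketch under an assumed score distribution. Suppose scores follow a Beta(a, b) distribution and the outcome is TRUE with probability equal to the score, so the score is fully calibrated by construction. Then the prevalence is `a / (a + b)`, the positive curve is a Beta(a + 1, b) density, and the negative curve is a Beta(a, b + 1) density. The Beta family, the parameter values, and the function names below are assumptions for illustration and do not come from the original post.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def crossing_point(a, b, lo=1e-6, hi=1 - 1e-6, iters=100):
    """Bisect for the score at which the positive and negative curves meet."""

    def gap(x):
        # Positive-class density minus negative-class density.
        return beta_pdf(x, a + 1, b) - beta_pdf(x, a, b + 1)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid  # negative curve is still higher: crossing lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


a, b = 2.0, 5.0
prevalence = a / (a + b)
crossing = crossing_point(a, b)

print(f"prevalence a / (a + b):        {prevalence:.6f}")
print(f"score where the curves cross:  {crossing:.6f}")
```

In this setup the difference between the two curves is proportional to `x - prevalence`, so there is a single crossing and bisection is enough to locate it.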
