The Double Density Plot Contains a Lot of Useful Information

Posted on October 27, 2020 by jmount in R bloggers | 0 Comments

[This article was first published on R – Win Vector LLC, and kindly contributed to R-bloggers.]

The double density plot contains a lot of useful information.

This is a plot that shows the distribution of a continuous model score, conditioned on the binary categorical outcome to be predicted. As with most density plots, the y-axis is an abstract quantity called density, chosen so that the area under each curve is 1.

An example is given here.
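To make the "area under each curve is 1" point concrete, here is a minimal sketch in Python using only the standard library. It is not code from the original post: the simulated data (scores drawn from a Beta(2, 5) distribution, outcome TRUE with probability equal to the score) and the helper `histogram_density` are assumptions made purely for illustration, and a histogram stands in for the smooth density curves a plotting package would draw.

```python
import random

def histogram_density(scores, n_bins=20):
    """Histogram density estimate on [0, 1]: bar heights scaled so total area is 1."""
    width = 1.0 / n_bins
    counts = [0] * n_bins
    for s in scores:
        # Clamp a score of exactly 1.0 into the last bin.
        counts[min(int(s / width), n_bins - 1)] += 1
    return [c / (len(scores) * width) for c in counts]

random.seed(2020)
# Hypothetical data: score ~ Beta(2, 5), outcome TRUE with probability equal to the score.
scores = [random.betavariate(2, 5) for _ in range(20000)]
outcomes = [random.random() < s for s in scores]

# One density per outcome class: these are the two curves of a double density plot.
pos_density = histogram_density([s for s, y in zip(scores, outcomes) if y])
neg_density = histogram_density([s for s, y in zip(scores, outcomes) if not y])

bin_width = 1.0 / len(pos_density)
pos_area = sum(h * bin_width for h in pos_density)
neg_area = sum(h * bin_width for h in neg_density)
print(pos_area, neg_area)
```

Both areas come out as 1 (up to floating point) even though the two classes have very different sizes, which is exactly why the class sizes cannot be read directly off the plot.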

The really cool observation I wanted to share is: if we know this classifier is well calibrated, then we can recover the positive category prevalence from the graph.

A well calibrated probability score is one such that E[outcome == TRUE] = E[prediction]. For such a classifier, the unknown positive outcome prevalence p must satisfy

   p = E[prediction | on negative curve] / (1 - E[prediction | on positive curve] + E[prediction | on negative curve])

This is because the following relation holds in this case:

   p E[prediction | on positive curve] + (1 - p) E[prediction | on negative curve] = p

This follows because p and 1 - p are the relative sizes of the positive and negative classes before each curve is re-scaled to integrate to one as part of the density. The left-hand side is therefore E[prediction], which calibration sets equal to E[outcome == TRUE] = p. The conditional expectations E[prediction | on positive curve] and E[prediction | on negative curve] are the means of the two curves depicted on the double density plot, so from them we can recover the prevalence p.
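The recovery can be checked numerically. The following is a minimal sketch in Python, not code from the original post. The function name `recover_prevalence` and the simulated data (scores from Beta(2, 5), outcome TRUE with probability equal to the score, which makes the score well calibrated by construction) are assumptions for illustration. The function simply solves the relation above for p.

```python
import random

def recover_prevalence(mean_pos, mean_neg):
    """Solve p * mean_pos + (1 - p) * mean_neg = p for p.

    The denominator is zero only when mean_pos == 1 and mean_neg == 0
    (perfect separation), in which case p cannot be recovered this way.
    """
    return mean_neg / (1.0 + mean_neg - mean_pos)

random.seed(2020)
# Hypothetical well calibrated score: outcome is TRUE with probability equal to the score.
scores = [random.betavariate(2, 5) for _ in range(50000)]
outcomes = [random.random() < s for s in scores]

pos = [s for s, y in zip(scores, outcomes) if y]
neg = [s for s, y in zip(scores, outcomes) if not y]
mean_pos = sum(pos) / len(pos)  # E[prediction | on positive curve]
mean_neg = sum(neg) / len(neg)  # E[prediction | on negative curve]

p_recovered = recover_prevalence(mean_pos, mean_neg)
p_observed = len(pos) / len(scores)
print(p_recovered, p_observed)
```

On a finite sample the two printed values agree only approximately, since E[outcome == TRUE] = E[prediction] holds in expectation rather than exactly for any given draw.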

The recovery of the prevalence from the two conditional means is shown in the earlier figure.

We have some additional results coming out for what I am currently calling “fully calibrated probability scores.” These are scores such that E[outcome == TRUE | prediction = p] = p for all p in the interval [0, 1]. This includes a very interesting special case where it is easy to show that the prevalence is the probability value where the density curves cross.
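As a sketch of one instance of the crossing-point claim (not code or a result from the original post), suppose the score is fully calibrated and its overall distribution is Beta(2, 5). Both the distribution and the function names below are assumptions chosen for illustration. By Bayes' rule the positive curve is then s f(s) / p and the negative curve is (1 - s) f(s) / (1 - p), where f is the overall score density and p = E[score] = 2/7 is the prevalence. The code locates where the two curves cross by bisection.

```python
def beta_2_5_pdf(s):
    """Density of Beta(2, 5) on [0, 1]: 30 * s * (1 - s)**4."""
    return 30.0 * s * (1.0 - s) ** 4

# Under full calibration the prevalence equals the mean score; Beta(2, 5) has mean 2/7.
PREVALENCE = 2.0 / 7.0

def pos_density(s):
    # Bayes' rule with P(outcome == TRUE | prediction = s) = s.
    return s * beta_2_5_pdf(s) / PREVALENCE

def neg_density(s):
    # Bayes' rule with P(outcome == FALSE | prediction = s) = 1 - s.
    return (1.0 - s) * beta_2_5_pdf(s) / (1.0 - PREVALENCE)

def crossing_point(lo=0.01, hi=0.99, tol=1e-10):
    """Bisection on pos_density - neg_density, negative at lo and positive at hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if pos_density(mid) < neg_density(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

crossing = crossing_point()
print(crossing, PREVALENCE)
```

In this example the curves cross where s / p = (1 - s) / (1 - p), which simplifies to s = p, so the numerically located crossing matches the prevalence.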
