# Plotting conditional densities

April 14, 2012
By

(This article was first published on R snippets, and kindly contributed to R-bloggers)

Recently I have read a post on Comparing all quantiles of two distributions simultaneously on R-bloggers. In the post author plots two conditional density plots on one graph. I often use such a plot to visualize conditional densities of scores in binary prediction. After several times I had a problem with appropriate scaling of the plot to make both densities always fit into the plotting region I have written a small snippet that handles it.

Here is the code of the function. It scales both x and y axes appropriately:

# class: binary explained variable
# score: score obtained from prediction model
# main, xlab, col, lty, lwd: passed to plot function
# lx, ly: passed to legend function as x and y
cdp <- function(class, score,
main = “Conditional density”, xlab = “score”,
col = c(24), lty = c(11), lwd = c(11),
lx = “topleft”, ly NULL) {
class <- factor(class)
if (length(levels(class)) != 2) {
stop(“class must have two levels”)
}
if (!is.numeric(score)) {
stop(“score must be numeric”)
}
cscore <- split(score, class)
cdensity <- lapply(cscore, density)
xlim <- range(cdensity[[1]]\$x, cdensity[[2]]\$x)
ylim <- range(cdensity[[1]]\$y, cdensity[[2]]\$y)
plot(cdensity[[1]], main = main, xlab = xlab, col = col[1],
lty = lty[1], lwd = lwd[1], xlim = xlim, ylim = ylim)
lines(cdensity[[2]], col = col[2], lty = lty[2], lwd = lwd[2])
legend(lx, ly, names(cdensity),
lty = lty, col = col, lwd = lwd)
}

As an example of its application I compare its results to standard cdplot on a simple classification problem:

data(Participation, package = “Ecdat”)
data.set <- Participation
data.set\$age2 <- data.set\$age 2
glm.model <- glm(lfp ., data = data.set, family=binomial(link probit))
par(mfrow = c(1, 2))
cdp(data.set\$lfp, predict(glm.model), main = “cdp”)
cdplot(factor(data.set\$lfp) ~ predict(glm.model),
main = “cdplot”, xlab = “score”, ylab = “lfp”)

Here is the resulting plot:

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...