Plotting conditional densities

April 14, 2012

(This article was first published on R snippets, and kindly contributed to R-bloggers)

Recently I read a post on R-bloggers, Comparing all quantiles of two distributions simultaneously. In that post the author draws two conditional density plots on one graph. I often use this kind of plot to visualize the conditional densities of scores in binary prediction. I kept having trouble scaling such a plot so that both densities fit inside the plotting region, so I wrote a small snippet that handles it.

Here is the code of the function. It sets both the x and y axis limits so that both densities always fit:

# class: binary response variable (must have exactly two distinct values)
# score: numeric score obtained from a prediction model
# main, xlab, col, lty, lwd: passed to plot; col, lty and lwd are also
#   used by lines and legend (first element for the first class)
# lx, ly: passed to legend as x and y
cdp <- function(class, score,
                main = "Conditional density", xlab = "score",
                col = c(2, 4), lty = c(1, 1), lwd = c(1, 1),
                lx = "topleft", ly = NULL) {
    class <- factor(class)
    if (length(levels(class)) != 2) {
        stop("class must have two levels")
    }
    if (!is.numeric(score)) {
        stop("score must be numeric")
    }
    cscore <- split(score, class)
    cdensity <- lapply(cscore, density)
    xlim <- range(cdensity[[1]]$x, cdensity[[2]]$x)
    ylim <- range(cdensity[[1]]$y, cdensity[[2]]$y)
    plot(cdensity[[1]], main = main, xlab = xlab, col = col[1],
         lty = lty[1], lwd = lwd[1], xlim = xlim, ylim = ylim)
    lines(cdensity[[2]], col = col[2], lty = lty[2], lwd = lwd[2])
    legend(lx, ly, names(cdensity),
           lty = lty, col = col, lwd = lwd)
}
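To see why cdp takes the union of both ranges, here is a small sketch on made-up data. The simulated scores and the name `cls` are purely illustrative. One class is narrow and tall, the other wide and flat. `plot()` on the first density alone bases its axes only on that density, so a second curve added with `lines()` can be cut off at the edges of the plotting region.

```r
# Hypothetical data: class "a" is narrow and tall, class "b" is wide and flat
set.seed(1)
score <- c(rnorm(200, mean = 0, sd = 0.5), rnorm(200, mean = 2, sd = 2))
cls <- factor(rep(c("a", "b"), each = 200))

# Same first steps as cdp: one kernel density estimate per class
cdensity <- lapply(split(score, cls), density)

# Data ranges plot(cdensity[[1]]) would base its axes on if used alone
xlim1 <- range(cdensity[[1]]$x)
ylim1 <- range(cdensity[[1]]$y)

# Ranges cdp passes to plot(): the union over both densities
xlim <- range(cdensity[[1]]$x, cdensity[[2]]$x)
ylim <- range(cdensity[[1]]$y, cdensity[[2]]$y)

# x ranges: the wide class extends far beyond the first density,
# so lines(cdensity[[2]]) on the first plot's axes would be clipped
rbind(first_only = xlim1, combined = xlim)
# y ranges: here the first density is the taller one,
# so the upper limit does not change
rbind(first_only = ylim1, combined = ylim)
```

With the class order reversed, the problem moves to the y axis: the flat density would set the limits and the tall curve would be cut off at the top.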

As an example, I compare its output with the standard cdplot function on a simple classification problem:

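# labor force participation data from the Ecdat package (must be installed)
data(Participation, package = "Ecdat")
data.set <- Participation
data.set$age2 <- data.set$age^2  # add a squared age term
# probit model of participation on all other variables
glm.model <- glm(lfp ~ ., data = data.set, family = binomial(link = probit))
par(mfrow = c(1, 2))  # two plots side by side
# left: conditional densities of the score (linear predictor) by class
cdp(data.set$lfp, predict(glm.model), main = "cdp")
# right: conditional distribution of lfp given the score
cdplot(factor(data.set$lfp) ~ predict(glm.model),
       main = "cdplot", xlab = "score", ylab = "lfp")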
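The two panels are closely related. cdp draws the two conditional densities of the score as they are. According to R's documentation, cdplot instead computes the conditional densities of x given the levels of y weighted by the marginal distribution of y, which turns them into P(class | score). Below is a simplified two-class sketch of that step on hypothetical simulated data. It is not cdplot's actual source, and the names `cls`, `f` and `p_yes` are only illustrative.

```r
# Hypothetical simulated data: a numeric score and a two-level outcome
set.seed(42)
n <- 500
score <- rnorm(n)
cls <- factor(ifelse(score + rnorm(n) > 0, "yes", "no"))

# One bandwidth and one grid shared by both classes, so the
# conditional densities can be combined point by point
bw <- bw.nrd0(score)
f <- lapply(split(score, cls), density, bw = bw, n = 512,
            from = min(score), to = max(score))

# Marginal probability of the "yes" class
p_yes <- mean(cls == "yes")

# Bayes' rule: P(yes | score) = p_yes f_yes / (p_no f_no + p_yes f_yes)
num <- p_yes * f$yes$y
posterior <- num / ((1 - p_yes) * f$no$y + num)

plot(f$yes$x, posterior, type = "l", ylim = c(0, 1),
     xlab = "score", ylab = "P(yes | score)")
```

Running `cdplot(cls ~ score)` on the same data should give a similar curve as the boundary between its two shaded regions. cdplot stacks the levels starting from the first one ("no"), so that boundary corresponds to `1 - posterior` rather than `posterior`.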

Here is the resulting plot:

