Kernel Density Estimates & Truncated Normals

June 24, 2011
By

(This article was first published on mickeymousemodels, and kindly contributed to R-bloggers)

Earlier today I read a post about truncated normals, and one plot in particular jumped out at me:


By definition, the truncated normal should have zero density everywhere to the left of the truncation point, but that's not what we see in the plot. What's going on? The catch is that this isn't a plot of the true density, but rather a plot of the kernel density estimate returned by R's density function. The KDE is essentially a mixture of a bunch of normal densities, one for each data point. Wikipedia provides a nice illustration: in the rightmost frame, the red densities are (uniformly) mixed to create the blue KDE.

What happens when this estimate is applied to draws from a truncated normal? Every point is replaced with its own little normal density, and the points / densities at the very edge of the truncation cutoff inevitably spill over into the zero-density region. The KDE returns a smooth function, whereas truncated normals actually have discontinuous densities. Here's some R code to illustrate that point:

dnorm.truncated <- function(x, mean=0, sd=1, left=-1.5, right=0.50) {
# Probability that left <= x <= right
probability <- pnorm(right, mean=mean, sd=sd) -
pnorm(left, mean=mean, sd=sd)
return((x >= left & x <= right) *
dnorm(x, mean=mean, sd=sd) / probability)
}

# Sanity check: does the density integrate to one?
integrate(dnorm.truncated, -Inf, Inf)
integrate(dnorm.truncated, -1.5, 0.50)

dev.new(height=6, width=8)
x.coords <- seq(-3, 3, by=0.005)
plot(x.coords, dnorm(x.coords), ylim=c(0, 0.65),
col="red", type="l", lwd=2,
xlab="x", ylab="density", main="Two Density Functions")
lines(x.coords, dnorm.truncated(x.coords),
col="darkslateblue", lwd=2)
legend("topleft", "Truncated Normal", bty="n",
col="darkslateblue", lty=1, lwd=1.5)
legend("topright", "Standard Normal", bty="n",
col="red", lty=1, lwd=1.5)
savePlot("standard_and_truncated_normal_densities.png")

# Compare kernel density estimate to true truncated density
num.obs <- 1000
x <- rnorm(num.obs)
x <- x[x >= -1.5 & x <= 0.50]
plot(x.coords, dnorm.truncated(x.coords), ylim=c(0, 0.70),
col="darkslateblue", type="l", lwd=2,
xlab="x", ylab="density", main="Two Density Functions")
lines(density(x), col="darkgray", lty=3, lwd=2)
mtext(sprintf("Kernel density estimated from %s observations",
num.obs))
legend("topleft", "Truncated Normal", bty="n",
col="darkslateblue", lty=1, lwd=1.5)
legend("topright", "Kernel Density Estimate", bty="n",
col="darkgray", lty=3, lwd=1.5)
savePlot("truncated_normal_and_kernel_density_estimate.png")

The first plot shows two densities: a standard normal, and a truncated standard normal where we drop any observation that falls outside of [-1.50, 0.50].


The second plot shows the same truncated normal, this time accompanied by its kernel density estimate. Note that the KDE spills over both the left and right truncation points.


That's all for now. If you'd like to read a little more about kernel density estimates, check out this post by Andrew Gelman.

To leave a comment for the author, please follow the link and comment on his blog: mickeymousemodels.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.