# Kernel Density Estimates & Truncated Normals

June 24, 2011
By

(This article was first published on mickeymousemodels, and kindly contributed to R-bloggers)

Earlier today I read a post about truncated normals, and one plot in particular jumped out at me:

By definition, the truncated normal should have zero density everywhere to the left of the truncation point, but that’s not what we see in the plot. What’s going on? The catch is that this isn’t a plot of the true density, but rather a plot of the kernel density estimate returned by R’s density function. The KDE is essentially a mixture of a bunch of normal densities, one for each data point. Wikipedia provides a nice illustration: in the rightmost frame, the red densities are (uniformly) mixed to create the blue KDE.

What happens when this estimate is applied to draws from a truncated normal? Every point is replaced with its own little normal density, and the points / densities at the very edge of the truncation cutoff inevitably spill over into the zero-density region. The KDE returns a smooth function, whereas truncated normals actually have discontinuous densities. Here’s some R code to illustrate that point:

`dnorm.truncated <- function(x, mean=0, sd=1, left=-1.5, right=0.50) {  # Probability that left <= x <= right  probability <- pnorm(right, mean=mean, sd=sd) -                 pnorm(left, mean=mean, sd=sd)  return((x >= left & x <= right) *         dnorm(x, mean=mean, sd=sd) / probability)}# Sanity check: does the density integrate to one?integrate(dnorm.truncated, -Inf, Inf)integrate(dnorm.truncated, -1.5, 0.50)dev.new(height=6, width=8)x.coords <- seq(-3, 3, by=0.005)plot(x.coords, dnorm(x.coords), ylim=c(0, 0.65),     col="red", type="l", lwd=2,     xlab="x", ylab="density", main="Two Density Functions")lines(x.coords, dnorm.truncated(x.coords),      col="darkslateblue", lwd=2)legend("topleft", "Truncated Normal", bty="n",       col="darkslateblue", lty=1, lwd=1.5)legend("topright", "Standard Normal", bty="n",       col="red", lty=1, lwd=1.5)savePlot("standard_and_truncated_normal_densities.png")# Compare kernel density estimate to true truncated densitynum.obs <- 1000x <- rnorm(num.obs)x <- x[x >= -1.5 & x <= 0.50]plot(x.coords, dnorm.truncated(x.coords), ylim=c(0, 0.70),     col="darkslateblue", type="l", lwd=2,     xlab="x", ylab="density", main="Two Density Functions")lines(density(x), col="darkgray", lty=3, lwd=2)mtext(sprintf("Kernel density estimated from %s observations",              num.obs))legend("topleft", "Truncated Normal", bty="n",       col="darkslateblue", lty=1, lwd=1.5)legend("topright", "Kernel Density Estimate", bty="n",       col="darkgray", lty=3, lwd=1.5)savePlot("truncated_normal_and_kernel_density_estimate.png")`

The first plot shows two densities: a standard normal, and a truncated standard normal where we drop any observation that falls outside of [-1.50, 0.50].

The second plot shows the same truncated normal, this time accompanied by its kernel density estimate. Note that the KDE spills over both the left and right truncation points.

That’s all for now. If you’d like to read a little more about kernel density estimates, check out this post by Andrew Gelman.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...