Site icon R-bloggers

The ‘kde1d’ package

[This article was first published on Saturn Elephant, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

It seems to me that the kde1d package (One-Dimensional Kernel Density Estimation) is not very known. I’ve never heard of it on Stack Overflow, except in an answer of mine.

However this is a great package, IMHO. I’m going to show why I like it.

The d/p/q/r family

Estimating a density with the kde1d function returns a kde1d object, and this makes available the density, the distribution function, the quantile function associated to the density estimate, as well as a sampler from the estimated distribution.

Let’s fit a density with kde1d to a simulated Gaussian sample:

library(kde1d)
set.seed(666)
y <- rnorm(100)
fit <- kde1d(y)

Here is the density estimate, in green, along with the true density, in blue:

opar <- par(mar = c(3, 1, 1, 1))
plot(NULL, xlim = c(-3.5, 3.5), ylim = c(0, 0.4), axes = FALSE, xlab = NA)
axis(1, at = seq(-3, 3, by=1))
curve(dkde1d(x, fit), n = 300, add = TRUE, col = "green", lwd = 2)
curve(dnorm(x), n = 300, add = TRUE, col = "blue", lwd = 2)

The density can even be evaluated outside the range of the data:

print(dkde1d(max(y)+1, fit))
## [1] 0.001684873

The corresponding cumulative distribution function:

opar <- par(mar = c(4.5, 5, 1, 1))
plot(NULL, xlim = c(-3.5, 3.5), ylim = c(0, 1), axes = FALSE, 
     xlab = "x", ylab = expression("Pr("<="x)"))
axis(1, at = seq(-3, 3, by=1))
axis(2, at = seq(0, 1, by=0.25))
curve(pkde1d(x, fit), n = 300, add = TRUE, col = "green", lwd = 2)
curve(pnorm(x), n = 300, add = TRUE, col = "blue", lwd = 2)

The corresponding inverse cumulative distribution function is evaluated by qkde1d, and rkde1d simulates from the estimated distribution.

Bounded data

By default, the data supplied to the kde1d function is assumed to be unbounded. For bounded data, use the xmin and/or xmax options.

Estimating monotonic densities

Now, something I use to convince my folks that kde1d is great. Consider a distribution having a monotonic density. The base function density does not correctly estimate the density (at least, with the default settings):

set.seed(666)
y <- rbeta(100, 1, 4)
opar <- par(mar = c(3, 1, 1, 1))
plot(NULL, xlim = c(0, 1), ylim = c(0, 4), axes = FALSE, xlab = NA)
axis(1, at = seq(0, 1, by=0.2))
lines(density(y, from = 0, to = 1), col = "green", lwd = 2)
curve(dbeta(x, 1, 4), n = 300, add = TRUE, col = "blue", lwd = 2)

The monotonic aspect of the density does not occur in the estimated density. With kde1d, it does:

fit <- kde1d(y, xmin = 0, xmax = 1)
opar <- par(mar = c(3, 1, 1, 1))
plot(NULL, xlim = c(0, 1), ylim = c(0, 4), axes = FALSE, xlab = NA)
axis(1, at = seq(0, 1, by=0.2))
curve(dkde1d(x, fit), n = 300, add = TRUE, col = "green", lwd = 2)
curve(dbeta(x, 1, 4), n = 300, add = TRUE, col = "blue", lwd = 2)

To leave a comment for the author, please follow the link and comment on their blog: Saturn Elephant.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.