A Note on Antoniak’s Approximation for Dirichlet Processes

September 21, 2011

(This article was first published on BioStatMatt » R, and kindly contributed to R-bloggers)

Antoniak’s 1974 article titled Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems (Annals of Statistics 2(6):1152-1174) is a fundamental work for most modern developments in this area. The article gives two expressions for the expected number of distinct values in a sample of size n, drawn from a Dirichlet process-distributed probability distribution with precision parameter \alpha. The first expression is exact, and the second is approximate. However, I recently discovered (the ‘hard way’) that the approximation is poor for \alpha < 1. The following R graphic illustrates the disparity for several values of \alpha and n = 1, \ldots, 10.

R Code

approx <- function(alpha, n) alpha * log((1:n + alpha)/alpha)
exact <- function(alpha, n) cumsum(alpha/(alpha + 1:n - 1))
approx_exact_plot <- function(alp, n, ...) {
     amax <- max(1/alp)
     appmax <- approx(amax, n)
     examax <- exact(amax, n)
     par(mar = c(5, 6, 4, 3) + 0.1)
     plot(appmax, examax, type = "l", xlim = c(0, max(appmax)),
         ylim = c(0, max(examax)),
         main = expression(paste(italic(E)(italic(Z)[italic(n)]), " - Antoniak (1974)")),
         xlab = expression(paste("Approximate - ", alpha, "log", ((n + alpha)/alpha))),
         ylab = expression(paste("Exact - ", sum(alpha/(alpha + m - 1), 1, n))), ...)
     text(max(appmax), max(examax), substitute(paste(alpha, " = ",
         amax)), pos = 2)
     abline(a = 0, b = 1, lty = 2)
     if (length(alp) == 1)
     for (a in alp[-1]) {
         app <- approx(1/a, n)
         exa <- exact(1/a, n)
         points(app, exa, type = "l")
         text(max(app), max(exa), substitute(paste(alpha, " = ",
             1/a)), pos = 2)
 approx_exact_plot(2^(0:4), 10)

To leave a comment for the author, please follow the link and comment on their blog: BioStatMatt » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , , ,

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)