Numerical computation of quantiles

October 23, 2013
By

(This article was first published on Thiago G. Martins » R, and kindly contributed to R-bloggers)

Recently I had to define a R function that would compute the {q}-th quantile of a continuous random variable based on an user-defined density function. Since the main objective is to have a general function that computes the quantiles for any user-defined density function it needs be done numerically.

Problem statement

Suppose we are interested in numerically computing the {q}-th quantile of a continuous random variable {X}. Assume for now that {X} is defined on {\Re = (-\infty, \infty)}. So, the problem is to find {x^*} such that {x^* = \{x : F_X(x) = q\}}, where {F_X(x)} is the cumulative distribution function of {X},

\displaystyle F_X(x) = \int _{-\infty} ^ {x} f_X(x) dx

and {f_X(x)} is its density function.

Optimization

This problem can be cast into an optimization problem,

\displaystyle x^* = \underset{x}{\text{arg min}} (F_X(x) - q)^2

That said, all we need is a numerical optimization routine to find {x} that minimize {(F_X(x) - q)^2} and possibly a numerical integration routine to compute {F_X(x)} in case it is not available in closed form.

Below is one possible implementation of this strategy in R, using the integrate function for numerical integration and nlminb for the numerical optimization.

CDF <- function(x, dist){
  integrate(f=dist,
            lower=-Inf,
            upper=x)$value
}
objective <- function(x, quantile, dist){
  (CDF(x, dist) - quantile)^2
}
find_quantile <- function(dist, quantile){
  result = nlminb(start=0, objective=objective,
                  quantile = quantile,
                  dist = dist)$par
  return(result)
}

and we can test if everything is working as it should using the following
simple task of computing the 95th-quantile of the standard Gaussian distribution.

std_gaussian <- function(x){
  dnorm(x, mean = 0, sd = 1)
}
find_quantile(dist = std_gaussian,
              quantile = 0.95)
[1] 1.644854
qnorm(0.95)
[1] 1.644854

Random variables with bounded domain

If {X} has a bounded domain, the optimization problem above becomes more tricky and unstable. To facilitate the numerical computations we can transform {X} using a monotonic and well behaved function {g}, so that {Y = g(X)} is defined on the real line. Then we apply the numerical solution given above to compute the {q}-th quantile of {Y}, say {y^*}, and then transform the quantile back to the original scale {x^* = g^{-1}(y^*)}. Finally, {x^*} is the {q}-th quantile of {X} that we were looking for.

For example, if {X = [0, \infty)} we could use {Y = \log(X)}.

To leave a comment for the author, please follow the link and comment on their blog: Thiago G. Martins » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)