Numerical computation of quantiles

[This article was first published on Thiago G. Martins » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Recently I had to define a R function that would compute the {q}-th quantile of a continuous random variable based on an user-defined density function. Since the main objective is to have a general function that computes the quantiles for any user-defined density function it needs be done numerically.

Problem statement

Suppose we are interested in numerically computing the {q}-th quantile of a continuous random variable {X}. Assume for now that {X} is defined on {\Re = (-\infty, \infty)}. So, the problem is to find {x^*} such that {x^* = \{x : F_X(x) = q\}}, where {F_X(x)} is the cumulative distribution function of {X},

\displaystyle F_X(x) = \int _{-\infty} ^ {x} f_X(x) dx

and {f_X(x)} is its density function.

Optimization

This problem can be cast into an optimization problem,

\displaystyle x^* = \underset{x}{\text{arg min}} (F_X(x) - q)^2

That said, all we need is a numerical optimization routine to find {x} that minimize {(F_X(x) - q)^2} and possibly a numerical integration routine to compute {F_X(x)} in case it is not available in closed form.

Below is one possible implementation of this strategy in R, using the integrate function for numerical integration and nlminb for the numerical optimization.

CDF <- function(x, dist){
  integrate(f=dist,
            lower=-Inf,
            upper=x)$value
}
objective <- function(x, quantile, dist){
  (CDF(x, dist) - quantile)^2
}
find_quantile <- function(dist, quantile){
  result = nlminb(start=0, objective=objective,
                  quantile = quantile,
                  dist = dist)$par
  return(result)
}

and we can test if everything is working as it should using the following
simple task of computing the 95th-quantile of the standard Gaussian distribution.

std_gaussian <- function(x){
  dnorm(x, mean = 0, sd = 1)
}
find_quantile(dist = std_gaussian,
              quantile = 0.95)
[1] 1.644854
qnorm(0.95)
[1] 1.644854

Random variables with bounded domain

If {X} has a bounded domain, the optimization problem above becomes more tricky and unstable. To facilitate the numerical computations we can transform {X} using a monotonic and well behaved function {g}, so that {Y = g(X)} is defined on the real line. Then we apply the numerical solution given above to compute the {q}-th quantile of {Y}, say {y^*}, and then transform the quantile back to the original scale {x^* = g^{-1}(y^*)}. Finally, {x^*} is the {q}-th quantile of {X} that we were looking for.

For example, if {X = [0, \infty)} we could use {Y = \log(X)}.

To leave a comment for the author, please follow the link and comment on their blog: Thiago G. Martins » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)