(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers)
A standard idea in extreme value theory (see e.g. here, in French unfortunately) is that to estimate the 99.5% quantile (say), we just need to estimate a quantile of level 95% for observations exceeding the 90% quantile.
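In extreme value theory, we assume that the 90% quantile of the initial distribution can be obtained easily, e.g. as the empirical quantile. Then, for the observations exceeding it, we fit a Pareto distribution (a Generalized Pareto one, to be precise) and read off the parametric quantile of level 95%. I.e., for a threshold $u$ and any $x>u$,
$$\mathbb{P}(X>x)=\mathbb{P}(X>u)\cdot\mathbb{P}(X>x\mid X>u),$$
so that with $\mathbb{P}(X>u)=10\%$, a conditional tail probability of 5% gives $\mathbb{P}(X>x)=10\%\times 5\%=0.5\%$.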
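As a quick sanity check of that two-step logic, here is a minimal sketch in base R. It uses plain empirical quantiles on a simulated standard normal sample rather than a Generalized Pareto fit; the sample size and the names `q_tail` and `q_direct` are illustrative assumptions.

```r
# Simulated sample (assumption: standard normal, 100,000 draws)
set.seed(1)
x <- rnorm(1e5)

# Step 1: the 90% quantile of the whole sample is used as the threshold u
u <- quantile(x, 0.90)

# Step 2: the 95% quantile of the observations exceeding u
q_tail <- unname(quantile(x[x > u], 0.95))

# Direct empirical estimate of the 99.5% quantile, for comparison
q_direct <- unname(quantile(x, 0.995))

# The two estimates, next to the theoretical Gaussian quantile
c(two_step = q_tail, direct = q_direct, theoretical = qnorm(0.995))
```

Both empirical estimates rely on the same upper order statistics, which is why they should agree closely; a parametric tail fit would replace step 2 when the sample is too small for that.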





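The same idea can be tried with quantile regression. Suppose I want the quantile regression of $Y$ given $X$, based on observations $(X_i,Y_i)$'s, but all observations such that $Y_i\in(a,b)$ for some $a<b$ are missing.
More precisely, I have the following sample (here half of the observations are missing): I only observe the points below the quantile of level 25%, and above the quantile of level 75%. Since only half of the sample remains, the 5% and 95% quantiles of the complete sample become the 10% and 90% quantiles of what is observed. If I want to get the 90% quantile regression, and the 10% quantile, the code is simply,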
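library(mnormt)    # rmnorm(): simulate a multivariate Gaussian sample
library(quantreg)  # rq(): quantile regression
library(splines)
set.seed(1)
# simulate 2500 pairs (X,Y) from a bivariate Gaussian with correlation r
mu=c(0,0)
r=0
Sigma <- matrix(c(1,r,r,1), 2, 2)
Z=rmnorm(2500,mu,Sigma)
X=Z[,1]
Y=Z[,2]
base=data.frame(X,Y)
plot(X,Y,col="blue",cex=.7)
# I flags the missing observations: Y between its 25% and 75% quantiles
I=(Y>qnorm(.25))&(Y<qnorm(.75))
baseI=base[!I,]
points(X[I],Y[I],col="lightblue",cex=.7)
abline(h=qnorm(.25),lty=2,col="blue")
abline(h=qnorm(.75),lty=2,col="blue")
u=seq(-5,5,by=.02)
# dashed line: 5% quantile regression on the complete sample
reg=rq(Y~X,data=base,tau=.05)
lines(u,predict(reg,newdata=data.frame(X=u)),lty=2)
# solid line: 10% quantile regression on the truncated sample
reg=rq(Y~X,data=baseI,tau=.05*2)
lines(u,predict(reg,newdata=data.frame(X=u)))
# same comparison for the upper tail: 95% (complete) against 90% (truncated)
reg=rq(Y~X,data=base,tau=.95)
lines(u,predict(reg,newdata=data.frame(X=u)),lty=2)
reg=rq(Y~X,data=baseI,tau=1-.05*2)
lines(u,predict(reg,newdata=data.frame(X=u)))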
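The factor 2 in tau=.05*2 is just one over the share of the sample that survives the truncation (here one half). A minimal sketch of that arithmetic in base R, ignoring the covariate and looking only at the marginal quantiles of Y; the sample size and the names `kept_share`, `tau_obs`, `q_full` and `q_obs` are illustrative assumptions, not part of the code above.

```r
# Simulated complete sample (assumption: standard normal, 100,000 draws)
set.seed(1)
y <- rnorm(1e5)

# Keep only what lies below the 25% quantile or above the 75% quantile
keep <- !(y > qnorm(.25) & y < qnorm(.75))
y_obs <- y[keep]

# Share of observations that survive the truncation (close to one half)
kept_share <- mean(keep)

# A level tau on the complete sample becomes tau / kept_share on the
# truncated one, as long as the quantile lies below the 25% cut-off
tau_full <- 0.05
tau_obs <- tau_full / kept_share

q_full <- unname(quantile(y, tau_full))     # 5% quantile, complete sample
q_obs  <- unname(quantile(y_obs, tau_obs))  # about 10% quantile, truncated

c(kept_share = kept_share, tau_obs = tau_obs, q_full = q_full, q_obs = q_obs)
```

For the upper tail the same reasoning gives a level of 1 - (1 - tau) / kept_share, i.e. 90% on the truncated sample for the 95% quantile of the complete one.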


But what if observations $X$ and $Y$ were correlated? Consider a Gaussian random vector $(X,Y)$ with correlation $r$ (here 0.6).
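A minimal sketch of how that correlated case could be set up, reusing the truncation rule from the independent case. As an assumption, the Gaussian pair is built by hand in base R rather than with rmnorm(), and the names `reg_full` and `reg_trunc` are illustrative. No output is claimed here; the point is only to put the two fits side by side.

```r
library(quantreg)  # rq(): quantile regression

set.seed(1)
n <- 2500
r <- 0.6

# Assumption: correlated standard Gaussian pair built from two independent
# draws, so that both X and Y are N(0,1) with correlation r
X <- rnorm(n)
Y <- r * X + sqrt(1 - r^2) * rnorm(n)
base <- data.frame(X, Y)

# Same truncation rule: drop Y between its 25% and 75% quantiles
I <- (Y > qnorm(.25)) & (Y < qnorm(.75))
baseI <- base[!I, ]

# 5% quantile regression on the complete sample, 10% on the truncated one
reg_full  <- rq(Y ~ X, data = base,  tau = .05)
reg_trunc <- rq(Y ~ X, data = baseI, tau = .05 * 2)

# Intercepts and slopes side by side
rbind(complete = coef(reg_full), truncated = coef(reg_trunc))
```

With correlated components, the share of observations that survive the truncation depends on the value of X, so the factor 2 is only right on average and the two rows need not agree.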
But why could that be interesting? Well, because I wanted to run a quantile regression on marathon results. But I could not get the whole dataset (I had to import observations manually, and I have to admit that it was a bit boring). So I extracted the finish times of the first 10% of athletes, and of the last 10%. And I was wondering whether that was enough to look at the 5% and 95% quantiles, based on the age of the runner... To be continued.