Comparing quantiles for two samples

Posted on March 8, 2013 by arthur charpentier in Uncategorized | 0 Comments

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Recently, for a research paper, I some samples, and I wanted to compare them. Not to compare they means (by construction, all of them were centered) but there dispersion. And not they variance, but more their quantiles. Consider the following boxplot type function, where everything here is quantile related (which is not the case for standard boxplot, see http://freakonometrics.hypotheses.org/4138, in French)

> boxplotqbased=function(x){
+ q=quantile(x[is.na(x)==FALSE],c(.05,.25,.5,.75,.95))
+ plot(1,1,col="white",axes=FALSE,xlab="",ylab="",
+ xlim=range(X),ylim=c(1-.6,1+.6))
+ polygon(c(q[2],q[2],q[4],q[4]),1+c(-.4,.4,.4,-.4))
+ segments(q[1],1-.4,q[1],1+.4)
+ segments(q[5],1,q[4],1)
+ segments(q[5],1-.4,q[5],1+.4)
+ segments(q[1],1,q[2],1)
+ segments(q[3],1-.4,q[3],1+.4,lwd=2)
+ xt=x[(x<q[1])|(x>q[5])]
+ points(xt,rep(1,length(xt)))
+ axis(1)
+ }

(one can easily adapt the code for lists, e.g.). Consider for instance temperature, when the (linear) trend is removed (see http://freakonometrics.hypotheses.org/1016 for a discussion on that series, in Paris),

from January 1st till December 31st. Let us remove now the seasonal cycle, i.e. we do have here the difference with the average seasonal temperature (with here upper and lower quantiles),

Seasonal boxplots are here (with Autumn on top, then Summer, Spring and Winter, below),

If we zoom in, we do have (where upper and lower segments are 95% and 5% quantiles, while classically, boxes are related to the 75% and the 25% quantiles)

Is there a (standard) test to compare quantiles – some of them perhaps ? Can we compare easily quantiles when have two (or more) samples ?

Note that this example on temperature could be related to other old posts (see e.g. http://freakonometrics.hypotheses.org/2190), but the research paper was on a very different topic.

Consider two (i.i.d.) samples $http://latex.codecogs.com/gif.latex?\{x_1,\cdots,x_m\}$ and $http://latex.codecogs.com/gif.latex?\{y_1,\cdots,y_n\}$ , considered as realizations of random variables $http://latex.codecogs.com/gif.latex?X$ and $http://latex.codecogs.com/gif.latex?Y$ . In all statistical courses, tests on the average are always considered, i.e.

$http://latex.codecogs.com/gif.latex?H_0:\mathbb{E}(X)=\mathbb{E}(Y)$

against

$http://latex.codecogs.com/gif.latex?H_1:\mathbb{E}(X)\neq\mathbb{E}(Y)$

Usually, the idea in courses is to start with a one sample test, and to test something like

$http://latex.codecogs.com/gif.latex?H_0:\mathbb{E}(X)=\mu_\star$

against

$http://latex.codecogs.com/gif.latex?H_1:\mathbb{E}(X)\neq\mu_\star$

The idea is to assume that samples are from Gaussian variables,

$http://latex.codecogs.com/gif.latex%20?%20%20%20%20%20T%20=%20\frac{\overline{x}%20-%20\mu_\star}{\widehat{\sigma}/\sqrt{n}}$
Under $http://latex.codecogs.com/gif.latex?H_0$ , $http://latex.codecogs.com/gif.latex%20?%20%20%20%20%20T$ has a Student t distribution. All that can be found in any Statistics 101 course. We can derive $http://latex.codecogs.com/gif.latex%20?%20%20%20%20%20p$ –value, computing probabilities that $http://latex.codecogs.com/gif.latex%20?%20%20%20%20%20T$ exceeds the observed values (for two sided tests, the probability that the absolute value of $http://latex.codecogs.com/gif.latex%20?%20%20%20%20%20T$ exceed the absolute value of the observed statistics). This test is closely related to the construction of confidence intervals for $http://latex.codecogs.com/gif.latex%20?%20%20%20%20%20\mu$ . If $http://latex.codecogs.com/gif.latex%20?%20%20%20%20%20\mu_\star$ belongs to the confidence interval, then it might be a suitable value. The graphical representation of this test is related to the following graph

Here the observed value was 1,96, i.e. the $http://latex.codecogs.com/gif.latex%20?%20%20%20%20%20p$ –value (the area in red above) is exactly 5%.

To compare means, the standard test is based on

$http://latex.codecogs.com/gif.latex?T%20=%20{\overline{x}%20-%20\overline{y}%20\over%20%20\displaystyle\sqrt{{s_x^2%20\over%20m}%20+%20{s_y^2%20\over%20n}}%20}$

which has – under $http://latex.codecogs.com/gif.latex?H_0$ – a Student-t distribution, with $http://latex.codecogs.com/gif.latex%20?%20%20%20%20%20\nu$ degrees of freedom, where

$http://latex.codecogs.com/gif.latex?\nu%20=%20\frac{(s_x^2/m%20+%20s_y^2/n)^2}{(s_x^2/m)^2/(m-1)%20+%20(s_y^2/n)^2/(n-1)}.$

Here, the graphical representation is the following,

But tests on quantiles are rarely considered in statistical courses. In a general setting,define quantiles as

$http://latex.codecogs.com/gif.latex?Q_X(p)=\inf\left\{%20x\in%20\mathbb%20R%20:%20p%20\le%20\mathbb%20P(X\leq%20x)%20\right\}$

one might be interested to test
$http://latex.codecogs.com/gif.latex?H_0:Q_X(p)=Q_Y(p)$
against
$http://latex.codecogs.com/gif.latex?H_1:Q_X(p)\neq%20Q_Y(p)$
for some $http://latex.codecogs.com/gif.latex?p\in(0,1)$ . Note that we might be interested also to test if

$http://latex.codecogs.com/gif.latex?H_0:Q_X(p_k)=%20Q_Y(p_k)$
for all $http://latex.codecogs.com/gif.latex%20?%20%20k$ , for some vector of probabilities $http://latex.codecogs.com/gif.latex?\boldsymbol{p}=(p_1,\cdots,p_d)\in(0,1)^d$ .
One can imagine that this multiple test will be more complex. But more interesting, e.g. a test on boxplots (are the four quantiles equal ?). Let us start with something a bit more simple: a test on quantiles for one sameple, and the derivation of a confidence interval for quantiles.

Quantiles for one sample

The important idea here is that it should be extremely simple to get $http://latex.codecogs.com/gif.latex?p$ –values. Consider the following sample, and let us run a test to assess if the median can be zero.

> set.seed(1)
> X=rnorm(20)
> sort(X)
[1] -2.21469989 -0.83562861 -0.82046838 -0.62645381 -0.62124058 -0.30538839
[7] -0.04493361 -0.01619026  0.18364332  0.32950777  0.38984324  0.48742905
[13]  0.57578135  0.59390132  0.73832471  0.82122120  0.94383621  1.12493092
[19]  1.51178117  1.59528080
> sum(X<=0)
[1] 8

Here, 8 observations (out of 20, i.e. 40%) were below zero. But we do know the distribution of $http://latex.codecogs.com/gif.latex%20?%20%20N$ the number of observation below the target

$http://latex.codecogs.com/gif.latex?N=\sum_{i=1}^n%20\boldsymbol{1}(X_i\leq%20x_\star)$

It is a binomial distribution. Under $http://latex.codecogs.com/gif.latex?H_0$ , it is a binomial distribution $http://latex.codecogs.com/gif.latex?\mathcal{B}(n,p_\star)$ where $http://latex.codecogs.com/gif.latex?p_\star$ is the probability target (here 50% since the test is on the median). Thus, one can easily compute the $http://latex.codecogs.com/gif.latex%20?%20%20%20%20%20p$ -value,

> plot(n,dbinom(n,size=20,prob=0.50),type="s",xlab="",ylab="",col="white")
> abline(v=sum(X<=0),col="red")
> for(i in 1:sum(X<=0)){
+ polygon(c(n[i],n[i],n[i+1],n[i+1]),
+ c(0,rep(dbinom(n[i],size=20,prob=0.50),2),0),col="red",border=NA)
+ polygon(21-c(n[i],n[i],n[i+1],n[i+1]),
+ c(0,rep(dbinom(n[i],size=20,prob=0.50),2),0),col="red",border=NA)
+ }
> lines(n,dbinom(n,size=20,prob=0.50),type="s")

which yields

Here, the $http://latex.codecogs.com/gif.latex%20?%20%20%20%20%20p$ -value is

> 2*pbinom(sum(X<=0),20,.5)
[1] 0.5034447

Here the probability is easy to compute. But one can observe that there is some kind of disymmetry here. Actually, if the observed value was not 8, but 12, some minor changes should be done (to keep some symmetry),

> plot(n,dbinom(n,size=20,prob=0.50),type="s",xlab="",ylab="",col="grey")
> abline(v=20-sum(X<=0),col="red")
> for(i in 1:sum(X<=0)){
+ polygon(c(n[i],n[i],n[i+1],n[i+1])-1,
+ c(0,rep(dbinom(n[i],size=20,prob=0.50),2),0),col="red",border=NA)
+ polygon(21-c(n[i],n[i],n[i+1],n[i+1])-1,
+ c(0,rep(dbinom(n[i],size=20,prob=0.50),2),0),col="red",border=NA)
+ }
> lines(n-1,dbinom(n,size=20,prob=0.50),type="s")

Based on those observations, one can easily write a code to test if the $http://latex.codecogs.com/gif.latex?p_\star$ -quantile of a sample is $http://latex.codecogs.com/gif.latex?x_\star$ . Or not. For a two sided test, consider

> quantile.test=function(x,xstar=0,pstar=.5){
+ n=length(x)
+ T1=sum(x<=xstar)
+ T2=sum(x< xstar)
+ p.value=2*min(1-pbinom(T2-1,n,pstar),pbinom(T1,n,pstar))
+ return(p.value)}

Here, we have

> quantile.test(X)
[1] 0.5034447

Now, based on that idea, due to the duality between confidence intervals and tests, one can easily write a function that computes confidence interval for quantiles,

> quantile.interval=function(x,pstar=.5,conf.level=.95){
+ n=length(x)
+ alpha=1-conf.level
+ r=qbinom(alpha/2,n,pstar)
+ alpha1=pbinom(r-1,n,pstar)
+ s=qbinom(1-alpha/2,n,pstar)+1
+ alpha2=1-pbinom(s-1,n,pstar)
+ c.lower=sort(x)[r]
+ c.upper=sort(x)[s]
+ conf.level=1-alpha1-alpha2
+ return(list(interval=c(c.lower,c.upper),confidence=conf.level))}
> quantile.interval(X,.50,.95)
$interval
[1] -0.3053884  0.7383247

$confidence
[1] 0.9586105

Because of the use of non-asymptotic distributions, we can not get exactly a 95% confidence interval. But it is not that bad, here.

Comparing quantiles for two samples

Now, to compare quantiles for two samples… it is more complicated. Exact tests are discussed in Kosorok (1999) (see http://bios.unc.edu/~kosorok/…) or in Li, Tiwari and Wells (1996) (see http://jstor.org/…). For the computational aspects, as mentioned in a post published almost one year ago on http://nicebread.de/… there is a function to compare quantiles for two samples.

> install.packages("WRS")
> library("WRS")

Some multiple tests on quantiles can be performed here. For instance, on the temperature, if we compare quantiles for Winter and Summer (on only 1,000 observations since it can be long to run that function), i.e. 5%, 25%, 75% and 95%,

> qcomhd(Z1[1:1000],Z2[1:1000],q=c(.05,.25,.75,.95))
q   n1   n2      est.1      est.2 est.1_minus_est.2     ci.low     ci.up     p_crit p.value signif
1 0.05 1000 1000 -6.9414084 -6.3312131       -0.61019530 -1.6061097 0.3599339 0.01250000   0.220     NO
2 0.25 1000 1000 -3.3893867 -3.1629541       -0.22643261 -0.6123292 0.2085305 0.01666667   0.322     NO
3 0.75 1000 1000  0.5832394  0.7324498       -0.14921041 -0.4606231 0.1689775 0.02500000   0.338     NO
4 0.95 1000 1000  3.7026388  3.6669997        0.03563914 -0.5078507 0.6067754 0.05000000   0.881     NO

or if we compare quantiles for Winter and Summer

> qcomhd(Z1[1:1000],Z3[1:1000],q=c(.05,.25,.75,.95))
q   n1  n2      est.1     est.2 est.1_minus_est.2     ci.low       ci.up     p_crit p.value signif
1 0.05 1000 984 -6.9414084 -6.438318        -0.5030906 -1.3748624  0.39391035 0.02500000   0.278     NO
2 0.25 1000 984 -3.3893867 -3.073818        -0.3155683 -0.7359727  0.06766466 0.01666667   0.103     NO
3 0.75 1000 984  0.5832394  1.010454        -0.4272150 -0.7222362 -0.11997409 0.01250000   0.012    YES
4 0.95 1000 984  3.7026388  3.873347        -0.1707078 -0.7726564  0.37160846 0.05000000   0.539     NO

(the following graphs are then plotted)

Those tests are based on the procedure proposed in Wilcox, Erceg-Hurn, Clark and Carlson (2013), online on http://tandfonline.com/…. They rely on the use of bootstrap samples. The idea is quite simple actually (even if, in the paper, they use Harrell–Davis estimator to estimate quantiles, i.e. a weighted sum of ordered statistics – as described in http://freakonometrics.hypotheses.org/1755 – but the idea can be understood with any estimator): we generate several bootstrap samples, and compute the median for all of them (since our interest was initially on the median)

>  Q=rep(NA,10000)
>  for(b in 1:10000){
+  Q[b]=quantile(sample(X,size=20,replace=TRUE),.50)
+  }

Then, to derive a confidence interval (with, say, 95% confidence), we compute quantiles of those median estimates,

> quantile(Q,c(.025,.975))
     2.5%     97.5% 
-0.175161  0.666113

We can actually visualize the distribution of that bootstrap median,

> hist(Q)

Now, if we want to compare medians from two independent samples, the strategy is rather similar: we bootstrap the two samples – independently – then compute the median, and keep in mind the difference. Then, we will look if the difference is significantly different from 0. E.g.

> set.seed(2)
> Y=rnorm(50,.6)
> QX=QY=D=rep(NA,10000)
> for(b in 1:10000){
+ QX[b]=quantile(sample(X,size=length(X),replace=TRUE),.50)
+ QY[b]=quantile(sample(Y,size=length(Y),replace=TRUE),.50)
+ D[b]=QY[b]-QX[b]
+ }

The 95% confidence interval obtained from the bootstrap difference is

> quantile(D,c(.025,.975))
      2.5%      97.5% 
-0.2248471  0.9204888

which is rather close to was can be obtained with the R function

> qcomhd(X,Y,q=.5)
    q n1 n2    est.1     est.2 est.1_minus_est.2    ci.low     ci.up p_crit p.value signif
1 0.5 20 50 0.318022 0.5958735        -0.2778515 -0.923871 0.1843839   0.05    0.27     NO

(where the difference is here the oppositive of mine). And when testing for 2 (or more) quantiles, Bonferroni method can be used to take into account that those tests cannot be considered as independent.

Arthur Charpentier

Arthur Charpentier, professor in Montréal, in Actuarial Science. Former professor-assistant at ENSAE Paristech, associate professor at Ecole Polytechnique and assistant professor in Economics at Université de Rennes 1. Graduated from ENSAE, Master in Mathematical Economics (Paris Dauphine), PhD in Mathematics (KU Leuven), and Fellow of the French Institute of Actuaries.