# Example 2014.6: Comparing medians and the Wilcoxon rank-sum test

June 12, 2014
By

(This article was first published on SAS and R, and kindly contributed to R-bloggers)

A colleague recently contacted us with the following question: "My outcome is skewed-- how can I compare medians across multiple categories?"

What they were asking for was a generalization of the Wilcoxon rank-sum test (also known as the Mann-Whitney-Wilcoxon test, among other monikers) to more than two groups. For the record, the answer is that the Kruskal-Wallis test is the generalization of the Wilcoxon, the one-way ANOVA to the Wilcoxon's t-test.

But this question is based on a false premise: that the the Wilcoxon rank-sum test is used to compare medians. The premise is based on a misunderstanding of the null hypothesis of the test. The actual null hypothesis is that there is a 50% probability that a random value from one population exceeds an random value from the other population. The practical value of this is hard to see, and thus in many places, including textbooks, the null hypothesis is presented as "the two populations have equal medians". The actual null hypothesis can be expressed as the latter median hypothesis, but only under the additional assumption that the shapes of the distributions are identical in each group.

In other words, our interpretation of the test as comparing the medians of the distributions requires the location-shift-only alternative to be the case. Since this is rarely true, and never assessed, we should probably use extreme caution in using, and especially in interpreting, the Wilcoxon rank-sum test.

To demonstrate this issue, we present a simple simulation, showing a case of two samples with equal medians but very different shapes. In one group, the values are exponential with mean = 1 and therefore median log 2, in the other they are normal with mean and median = log 2 and variance = 1. We generate 10,000 observations and show that the Wilcoxon rank-sum test rejects the null hypothesis. If we interpret the null incorrectly as applying to the medians, we will be misled. If our interest actually centered on the medians for some reasons, an appropriate test that would not be sensitive to the shape of the distribution could be found in a quantile regression. Another, of course, would be the median test. We show that this test does not reject the null hypothesis of equal medians, even with the large sample size.

We leave aside the deeper questions of whether comparing medians is a useful substitute for comparing means, or whether means should not be compared when distributions are skewed.

R
In R, we simulate two separate vectors of data, then feed them directly to the wilcox.test() function (section 2.4.2).
y1 = rexp(10000)y2 = rnorm(10000) + log(2)wilcox.test(y1,y2)
This shows a very small p-value, denoting the fact not that the medians are unequal but that one or the other of these distributions generally has larger values. Hint: it might be the one with the long tail and no negative values.
 Wilcoxon rank sum test with continuity correctiondata:  y1 and y2W = 55318328, p-value < 2.2e-16alternative hypothesis: true location shift is not equal to 0
To set up the quantile regression, we put the observations into a single vector and make a group indicator vector. Then we load the quantreg package and use the rq() function (section 4.4.4) to fit the median regression.
y = c(y1,y2)c = rep(1:2, each = 10000)library(quantreg)summary(rq(y~as.factor(c)))
This shows we would fail to reject the null of equal medians.
Call: rq(formula = y ~ as.factor(c))tau: [1] 0.5Coefficients:              Value    Std. Error t value  Pr(>|t|)(Intercept)    0.68840  0.01067   64.53047  0.00000as.factor(c)2 -0.00167  0.01692   -0.09844  0.92159Warning message:In rq.fit.br(c, y, tau = tau, ...) : Solution may be nonunique

SAS
In SAS, we make a data set with a group indicator and use it to generate data conditionally.
data wtest;do i = 1 to 20000;  c = (i gt 10000);  if c eq 0 then y = ranexp(0);    else y = normal(0) + log(2);  output;  end;run;
The Wilcoxon rank-sum test is in proc npar1way (section 2.4.2).
proc npar1way data = wtest wilcoxon;class c;var y;run;
As with R, we find a very small p-value.
                       The NPAR1WAY Procedure             Wilcoxon Scores (Rank Sums) for Variable y                      Classified by Variable c                     Sum of     Expected      Std Dev         Mean  c          N       Scores     Under H0     Under H0        Score  0      10000    106061416    100005000   408258.497   10606.1416  1      10000     93948584    100005000   408258.497    9394.8584                      Wilcoxon Two-Sample Test                Statistic             106061416.0000                Normal Approximation                Z                            14.8348                One-Sided Pr >  Z             <.0001                Two-Sided Pr > |Z|            <.0001                t Approximation                One-Sided Pr >  Z             <.0001                Two-Sided Pr > |Z|            <.0001             Z includes a continuity correction of 0.5.                        Kruskal-Wallis Test                Chi-Square                  220.0700                DF                                 1                Pr > Chi-Square               <.0001
The median regression is in proc quantreg (section 4.4.4). As in R, we fail to reject the null of equal medians.
proc quantreg data =wtest;class c;model y = c;run;
                       The QUANTREG Procedure                  Quantile and Objective Function              Predicted Value at Mean          0.6909                        Parameter Estimates                            Standard    95% Confidence    Parameter   DF Estimate    Error        Limits       t Value    Intercept    1   0.6909   0.0125    0.6663    0.7155   55.06    c         0  1   0.0165   0.0160   -0.0148    0.0479    1.03    c         1  0   0.0000   0.0000    0.0000    0.0000     .                        Parameter Estimates                        Parameter   Pr > |t|                        Intercept     <.0001                        c         0   0.3016                        c         1    .