Wilcoxon-Mann-Whitney rank sum test (or U test)

July 27, 2009
By

(This article was first published on Statistic on aiR, and kindly contributed to R-bloggers)

The Wilcoxon rank sum test compares two independent groups of samples when we cannot assume that the data follow a Gaussian (normal) distribution. It is also known as the Mann-Whitney U test.

Suppose we want to check whether two football teams have conceded a similar number of goals over the years. Below are the goals conceded by each team in 6 games:

Team A: 6, 8, 2, 4, 4, 5
Team B: 7, 10, 4, 3, 5, 6

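The Wilcoxon-Mann-Whitney test (also called the Wilcoxon rank sum test or the Mann-Whitney U test) is used to compare two groups whose data do not follow a normal distribution: it is a non-parametric test, the counterpart of the t test for independent samples. Instead of comparing the means directly, it works on the ranks of the pooled observations and tests whether the values of one group tend to be larger than those of the other.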
Let’s see how to solve the problem with R:


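a = c(6, 8, 2, 4, 4, 5)   # goals conceded by team A
b = c(7, 10, 4, 3, 5, 6)  # goals conceded by team B

wilcox.test(a, b, correct = FALSE)  # correct = FALSE: no continuity correction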

Wilcoxon rank sum test

data: a and b
W = 14, p-value = 0.5174
alternative hypothesis: true location shift is not equal to 0

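The p-value (0.5174) is greater than 0.05, so at the 5% level we cannot reject the null hypothesis H0 that the two groups come from the same distribution (no location shift). Note that failing to reject H0 does not prove that the two teams are equal; the data simply give no evidence of a difference.
If you run wilcox.test(b, a, correct = FALSE) instead, you get the same p-value, as expected, although W changes: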
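Because the pooled sample contains ties (4 appears three times, 5 and 6 twice each), wilcox.test cannot use the exact distribution of W here and, as far as we can tell from its output, falls back on a normal approximation with a tie-corrected variance. The sketch below reproduces the p-value of 0.5174 by hand under that assumption. The variable names (na, nb, mu, sigma, ties) are our own, and W is the rank sum of group a minus its minimum possible value, 6 * 7 / 2 = 21.

```r
# Data from the example
a = c(6, 8, 2, 4, 4, 5)
b = c(7, 10, 4, 3, 5, 6)

na = length(a)
nb = length(b)
n  = na + nb

# W for group a: rank sum of a minus its smallest possible value
W = sum(rank(c(a, b))[1:na]) - na * (na + 1) / 2

# Under H0, W has mean na * nb / 2
mu = na * nb / 2

# Sizes of the groups of tied values in the pooled sample
ties = table(c(a, b))

# Standard deviation of W, with the usual correction for ties
sigma = sqrt(na * nb / 12 * ((n + 1) - sum(ties^3 - ties) / (n * (n - 1))))

# Two-sided p-value without continuity correction (as with correct = FALSE)
z = (W - mu) / sigma
p = 2 * pnorm(-abs(z))
round(p, 4)
```

Here W = 14, the mean is 18 and the standard deviation is about 6.18, so z is about -0.65 and the two-sided p-value rounds to 0.5174.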


a = c(6, 8, 2, 4, 4, 5)
b = c(7, 10, 4, 3, 5, 6)

wilcox.test(b,a, correct=FALSE)

Wilcoxon rank sum test

data: b and a
W = 22, p-value = 0.5174
alternative hypothesis: true location shift is not equal to 0
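A quick way to see why swapping the samples only changes W: the two statistics always add up to the number of pairs, length(a) * length(b) = 36, so W(b, a) is just the mirror image of W(a, b) and the two-sided p-value is unchanged. A minimal check, reading the statistic and p.value components of the returned htest objects (the names ab and ba are our own):

```r
a = c(6, 8, 2, 4, 4, 5)
b = c(7, 10, 4, 3, 5, 6)

# suppressWarnings() hides the warning R may emit about ties
ab = suppressWarnings(wilcox.test(a, b, correct = FALSE))
ba = suppressWarnings(wilcox.test(b, a, correct = FALSE))

# The two W statistics add up to length(a) * length(b)
unname(ab$statistic + ba$statistic)   # 14 + 22 = 36
length(a) * length(b)                 # 36

# The two-sided p-values are identical
all.equal(ab$p.value, ba$p.value)
```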

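W is the rank sum of the first sample minus the smallest value that rank sum can take, n(n + 1)/2, where n is the size of that sample: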


sum.rank.a = sum(rank(c(a,b))[1:6]) # sum of the ranks assigned to group a
W = sum.rank.a - (length(a)*(length(a)+1)) / 2
W
[1] 14

sum.rank.b = sum(rank(c(a,b))[7:12]) # sum of the ranks assigned to group b
W = sum.rank.b - (length(b)*(length(b)+1)) / 2
W
[1] 22
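The same W can also be read as a count over pairs, which is where the name U statistic comes from: compare every goal count of one team with every goal count of the other, score 1 when the first team's value is larger and 1/2 for a tie. A short sketch of that equivalent definition (wins, ties, W.a and W.b are illustrative names):

```r
a = c(6, 8, 2, 4, 4, 5)
b = c(7, 10, 4, 3, 5, 6)

# Compare every value of a with every value of b (6 x 6 = 36 pairs)
wins = sum(outer(a, b, ">"))   # pairs where the value in a is larger
ties = sum(outer(a, b, "=="))  # tied pairs, each counted as one half

W.a = wins + 0.5 * ties                   # 12 + 0.5 * 4 = 14
W.b = sum(outer(b, a, ">")) + 0.5 * ties  # 20 + 0.5 * 4 = 22
c(W.a, W.b)
```

Both values match the rank-based computation, and W.a + W.b = 36, the total number of pairs.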

Finally, we can compare the rank sums with the critical values tabulated for the Wilcoxon rank sum test for independent samples. For two groups of 6 samples each, the tabulated interval is (26, 52), while the rank sums of our two samples are:


sum(rank(c(a,b))[1:6]) #sum of ranks assigned to the group a
[1] 35
sum(rank(c(a,b))[7:12]) #sum of ranks assigned to the group b
[1] 43

Since both rank sums (35 and 43) lie inside the tabulated interval (26, 52), we do not reject H0: the data provide no evidence that the two teams differ in the number of goals conceded.
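Instead of looking up a printed table, the critical rank sums can be derived from the exact null distribution of U in R (pwilcox). The sketch below assumes the table refers to a two-sided test at alpha = 0.05, which is the assumption that reproduces (26, 52). Keep in mind that this exact distribution, like the printed tables, assumes no ties, so with our tied data the comparison is only approximate. The names U.crit, lower, upper and rank.sums are our own.

```r
a = c(6, 8, 2, 4, 4, 5)
b = c(7, 10, 4, 3, 5, 6)
na = length(a)
nb = length(b)
alpha = 0.05  # assumed two-sided significance level

# Exact null distribution of U (no ties): P(U <= u) for u = 0, ..., na * nb
u = 0:(na * nb)
cdf = pwilcox(u, na, nb)

# Largest u whose lower-tail probability does not exceed alpha / 2
U.crit = max(u[cdf <= alpha / 2])

# Convert the critical U into rank sums (the distribution is symmetric)
lower = U.crit + na * (na + 1) / 2
upper = na * (na + nb + 1) - lower
c(lower, upper)   # 26 52

# Rank sums of the two samples; H0 is not rejected if both lie strictly inside
r = rank(c(a, b))
rank.sums = c(sum(r[1:na]), sum(r[(na + 1):(na + nb)]))
all(rank.sums > lower & rank.sums < upper)
```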
