(This article was first published on **Statistic on aiR**, and kindly contributed to R-bloggers.)

The Wilcoxon rank sum test compares the central tendency of two independent groups of samples for which we cannot assume a Gaussian distribution. It is also known as the Mann-Whitney U-test.

Suppose you want to check whether the average number of goals conceded by two football teams over the years is the same. Below are the goals conceded by each team in 6 games per year:

Team A: 6, 8, 2, 4, 4, 5

Team B: 7, 10, 4, 3, 5, 6

The **Wilcoxon-Mann-Whitney test** (also called the **Wilcoxon rank sum test** or **Mann-Whitney U-test**) is used to compare two groups whose data do not follow a normal distribution: it is a non-parametric test, the counterpart of the t-test for independent samples. Strictly speaking, it tests for a shift in location between the two distributions (often read as a difference in medians) rather than a difference in means.
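To make concrete what the test measures (a sketch added for illustration, not part of the original post): the W statistic that R reports equals the number of pairs, one game from each team, in which team A conceded more goals than team B, with ties counted as one half. The vectors `a` and `b` hold the data of this example; `wins`, `ties` and `U` are illustrative names.

```r
# Goals conceded, taken from the example
a <- c(6, 8, 2, 4, 4, 5)
b <- c(7, 10, 4, 3, 5, 6)

# Compare every value of a with every value of b:
# a pair where a is larger counts 1, a tied pair counts 1/2
wins <- outer(a, b, ">")
ties <- outer(a, b, "==")
U <- sum(wins) + 0.5 * sum(ties)
U
## [1] 14
```

The result matches the W = 14 returned by `wilcox.test(a, b)` in this post.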

Let’s see how to solve the problem with R:

```r
a <- c(6, 8, 2, 4, 4, 5)
b <- c(7, 10, 4, 3, 5, 6)
wilcox.test(a, b, correct = FALSE)
##
##  Wilcoxon rank sum test
##
## data:  a and b
## W = 14, p-value = 0.5174
## alternative hypothesis: true location shift is not equal to 0
```

Since the p-value is greater than 0.05, we cannot reject the null hypothesis H0 that the two groups do not differ: the data give no evidence of a difference between the two teams.
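To act on the p-value in a script instead of reading the printed output, you can store the result: `wilcox.test` returns an object of class `htest` whose `statistic` and `p.value` fields hold W and the p-value. A minimal sketch; passing `exact = FALSE` is an addition of mine. Because the data contain ties, R cannot compute the exact p-value anyway: by default it warns and falls back to the normal approximation, so the p-value does not change. The 0.05 threshold is the one used in the text.

```r
a <- c(6, 8, 2, 4, 4, 5)
b <- c(7, 10, 4, 3, 5, 6)

# Normal approximation requested explicitly, so no ties warning is raised
res <- wilcox.test(a, b, correct = FALSE, exact = FALSE)

alpha <- 0.05
res$statistic  # W
res$p.value

# Decision rule: reject H0 only if the p-value does not exceed alpha
if (res$p.value > alpha) {
  message("p > alpha: H0 is not rejected")
} else {
  message("p <= alpha: H0 is rejected")
}
```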

If you swap the two groups and run `wilcox.test(b, a, correct = FALSE)`, the p-value is, as expected, the same; only the statistic W changes:

```r
a <- c(6, 8, 2, 4, 4, 5)
b <- c(7, 10, 4, 3, 5, 6)
wilcox.test(b, a, correct = FALSE)
##
##  Wilcoxon rank sum test
##
## data:  b and a
## W = 22, p-value = 0.5174
## alternative hypothesis: true location shift is not equal to 0
```
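The two W values are linked: swapping the groups mirrors W around its expected value, so W(a, b) + W(b, a) always equals the product of the group sizes (here 14 + 22 = 6 × 6), and the two-sided p-value is unchanged. A short check, using the same `exact = FALSE` setting as an assumption to silence the ties warning; `w_ab` and `w_ba` are illustrative names.

```r
a <- c(6, 8, 2, 4, 4, 5)
b <- c(7, 10, 4, 3, 5, 6)

# exact = FALSE only avoids the ties warning; the p-values are unchanged
w_ab <- wilcox.test(a, b, correct = FALSE, exact = FALSE)
w_ba <- wilcox.test(b, a, correct = FALSE, exact = FALSE)

# The two statistics add up to n_a * n_b
unname(w_ab$statistic + w_ba$statistic) == length(a) * length(b)
## [1] TRUE

# The two-sided p-value is the same in both orders
all.equal(w_ab$p.value, w_ba$p.value)
## [1] TRUE
```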

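The statistic W is the sum of the ranks of a group in the pooled data minus the smallest value that sum can take, n(n+1)/2, where n is the size of that group: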

```r
sum.rank.a <- sum(rank(c(a, b))[1:6])  # sum of ranks assigned to group a
W <- sum.rank.a - (length(a) * (length(a) + 1)) / 2
W
## [1] 14

sum.rank.b <- sum(rank(c(a, b))[7:12])  # sum of ranks assigned to group b
W <- sum.rank.b - (length(b) * (length(b) + 1)) / 2
W
## [1] 22
```
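The p-value printed by R (0.5174) comes from a normal approximation to the distribution of W, since the data contain ties. A minimal sketch that reproduces it by hand, assuming the standard tie-corrected variance of W and no continuity correction (which corresponds to `correct = FALSE`); the names `tie_sizes`, `mu_W` and `var_W` are illustrative.

```r
a <- c(6, 8, 2, 4, 4, 5)
b <- c(7, 10, 4, 3, 5, 6)
n1 <- length(a)
n2 <- length(b)
N <- n1 + n2

# W as computed above: rank sum of group a minus its minimum value
W <- sum(rank(c(a, b))[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Mean and tie-corrected variance of W under H0
tie_sizes <- table(c(a, b))  # how many times each distinct value occurs
mu_W <- n1 * n2 / 2
var_W <- n1 * n2 / 12 * ((N + 1) - sum(tie_sizes^3 - tie_sizes) / (N * (N - 1)))

# Standardise without continuity correction, then take the two-sided p-value
z <- (W - mu_W) / sqrt(var_W)
p <- 2 * pnorm(-abs(z))
round(p, 4)
## [1] 0.5174
```

Values that occur only once contribute zero to the tie correction, so only the repeated 4s, 5s and 6s shrink the variance.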

Finally, we can compare the rank sums with the critical values in the Wilcoxon tables for independent samples. For two groups of 6 observations each (two-sided test at the 5% level), the tabulated interval is (26, 52), while the rank sums of our samples are:

```r
sum(rank(c(a, b))[1:6])   # sum of ranks assigned to group a
## [1] 35
sum(rank(c(a, b))[7:12])  # sum of ranks assigned to group b
## [1] 43
```

Since both rank sums (35 and 43) lie inside the tabulated interval (26, 52), we again cannot reject H0: there is no evidence of a difference between the two groups.
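The tabulated interval can also be derived in R from the exact null distribution of W via `pwilcox`, instead of being looked up. A sketch under two assumptions: a two-sided test at the 5% level, and no ties (the tables assume this too, so with our tied data the comparison is only approximate). The names `u_crit`, `lower` and `upper` are illustrative.

```r
n1 <- 6
n2 <- 6
alpha <- 0.05

# Exact cumulative distribution of W (the U statistic) under H0, no ties
u <- 0:(n1 * n2)
cdf <- pwilcox(u, n1, n2)

# Largest W in the lower rejection tail: P(W <= u_crit) <= alpha / 2
u_crit <- max(u[cdf <= alpha / 2])

# Convert W to a rank sum for group a; the upper bound mirrors the lower one
lower <- u_crit + n1 * (n1 + 1) / 2
upper <- n1 * (n1 + n2 + 1) - lower
c(lower, upper)
## [1] 26 52
```

Rank sums equal to or beyond these bounds fall in the rejection region; 35 and 43 are well inside.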
