(This article was first published on **Statistic on aiR**, and kindly contributed to R-bloggers)

Comparison of the means of two sets of paired samples, taken from two populations with unknown variance.

A school athletics team has hired a new coach and wants to test the effectiveness of the proposed training method by comparing the mean 100-metre times of 10 runners. Below are the times, in seconds, of each athlete before and after the training.

Before training: 12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3

After training: 12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1

In this case we have two sets of paired samples, since the measurements were taken on the same athletes before and after the training. To see whether there was an improvement, a deterioration, or whether the mean times stayed essentially the same (null hypothesis H0), we run a **Student's t-test for paired samples**:

a = c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)  # times before training (s)

b = c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)  # times after training (s)

t.test(a, b, paired = TRUE)  # two-sided paired t-test

Paired t-test

data: a and b

t = -0.2133, df = 9, p-value = 0.8358

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-0.5802549 0.4802549

sample estimates:

mean of the differences

-0.05

The p-value (0.8358) is greater than 0.05, so we cannot reject the null hypothesis H0 that the two means are equal. In conclusion, the new training has not produced any significant improvement (or deterioration) in the team's times.
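To make explicit what `t.test(..., paired = TRUE)` does internally, here is a minimal sketch that recomputes the statistic from the per-athlete differences, assuming the standard paired formula t = mean(d) / (s_d / sqrt(n)) with n - 1 degrees of freedom. The names `d`, `se`, `t_stat`, `dof` and `p_two` are illustrative and not part of the original post.

```r
# First data set: times (seconds) before and after the first coach's training
a <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
b <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)

d      <- a - b             # per-athlete differences (before - after)
n      <- length(d)         # 10 athletes
se     <- sd(d) / sqrt(n)   # standard error of the mean difference
t_stat <- mean(d) / se      # paired t statistic: mean(d) / (s_d / sqrt(n))
dof    <- n - 1             # 9 degrees of freedom

# Two-sided p-value: probability of a |T| at least as large as |t_stat|
p_two <- 2 * pt(-abs(t_stat), dof)

# Should reproduce t = -0.2133, df = 9, p-value = 0.8358 from t.test() above
round(c(t = t_stat, df = dof, p = p_two), 4)

# A paired test is the same as a one-sample t-test on the differences
all.equal(t.test(a, b, paired = TRUE)$statistic, t.test(d, mu = 0)$statistic)
```

Working on the differences is what makes the test "paired": each athlete acts as their own control, so the variability between athletes drops out.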

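Equivalently, we can compare the computed t statistic with the tabulated critical value for a two-sided test at the 5% level with 9 degrees of freedom: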

qt(0.975, 9)

[1] 2.262157

*|t-computed| < t-tabulated* (0.2133 < 2.262), so we cannot reject the null hypothesis H0.
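The critical-value rule and the confidence interval lead to the same decision. A minimal sketch, assuming a two-sided test at alpha = 0.05 on the first data set; the comparison uses |t| because the computed statistic is negative. The names `t_crit`, `reject_H0`, `ci` and `ci_excludes_zero` are illustrative.

```r
# First data set: times (seconds) before and after the first coach's training
a <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
b <- c(12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1)

alpha  <- 0.05
d      <- a - b
n      <- length(d)
se     <- sd(d) / sqrt(n)
t_stat <- mean(d) / se
t_crit <- qt(1 - alpha / 2, df = n - 1)   # same as qt(0.975, 9)

# Decision rule: reject H0 only when |t| exceeds the critical value
reject_H0 <- abs(t_stat) > t_crit

# Equivalent check: H0 is rejected exactly when the 95% CI excludes 0
ci <- mean(d) + c(-1, 1) * t_crit * se
ci_excludes_zero <- ci[1] > 0 || ci[2] < 0

reject_H0 == ci_excludes_zero
```

The interval `ci` is the same "95 percent confidence interval" printed by `t.test()` above, and since it contains 0 the conclusion is again that H0 cannot be rejected.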

Suppose now that the team manager, given these results, fires the coach who produced no improvement and hires another, more promising one. Here are the athletes' times after the second training programme:

Before training: 12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3

After the second training: 12.0, 12.2, 11.2, 13.0, 15.0, 15.8, 12.2, 13.4, 12.9, 11.0

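Now we check whether the times actually changed in a specific direction, by running a one-sided **t-test for paired data**. In R this means specifying the **alternative hypothesis H1** through the `alternative` argument (abbreviated here as `alt`) when calling `t.test`. Since R works with the differences `a - b` (before minus after), adding

`alt = "less"`

sets H1 to "the mean of `a` is less than the mean of `b`", i.e. that the times got worse after training: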

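a = c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)  # times before training (s)

b = c(12.0, 12.2, 11.2, 13.0, 15.0, 15.8, 12.2, 13.4, 12.9, 11.0)  # times after the second training (s)

t.test(a, b, paired = TRUE, alt = "less")  # H1: mean of a - b is less than 0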

Paired t-test

data: a and b

t = 5.2671, df = 9, p-value = 0.9997

alternative hypothesis: true difference in means is less than 0

95 percent confidence interval:

-Inf 2.170325

sample estimates:

mean of the differences

1.61
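For a one-sided test the p-value is a single tail of the t distribution. A minimal sketch, assuming the same `a` (before) and `b` (after the second training) vectors, that reproduces the `alt = "less"` result with `pt()`: because the differences `a - b` are mostly positive (faster after training), the lower tail gives a p-value close to 1. The names `d`, `t_stat` and `p_less` are illustrative.

```r
# Second data set: times (seconds) before and after the second training
a <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
b <- c(12.0, 12.2, 11.2, 13.0, 15.0, 15.8, 12.2, 13.4, 12.9, 11.0)

d      <- a - b                          # positive values = faster after training
n      <- length(d)
t_stat <- mean(d) / (sd(d) / sqrt(n))    # about 5.267, as in the output above

# alt = "less": H1 is mean(a - b) < 0, so the p-value is the LOWER tail P(T <= t)
p_less <- pt(t_stat, df = n - 1)

# Should agree with the p-value reported by t.test()
all.equal(p_less, t.test(a, b, paired = TRUE, alternative = "less")$p.value)
```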

With this syntax we asked R to check whether the mean of the values in vector `a` is less than the mean of the values in vector `b`. The p-value we obtained (0.9997) is far above 0.05, so we cannot reject H0 in favour of this H1: there is no evidence that the times got worse. This test, however, looks in the wrong direction to detect an improvement, which would mean lower times after training. If we had written:

`t.test(a, b, paired = TRUE, alt = "greater")`

we would have asked R to check whether the mean of the values in vector `a` is greater than the mean of the values in vector `b`, i.e. whether the times went down after training, which is exactly the improvement we are looking for. Since the two one-sided p-values add up to 1, the previous result suggests that this p-value will be much smaller than 0.05, and in fact:

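a = c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)  # times before training (s)

b = c(12.0, 12.2, 11.2, 13.0, 15.0, 15.8, 12.2, 13.4, 12.9, 11.0)  # times after the second training (s)

t.test(a, b, paired = TRUE, alt = "greater")  # H1: mean of a - b is greater than 0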

Paired t-test

data: a and b

t = 5.2671, df = 9, p-value = 0.0002579

alternative hypothesis: true difference in means is greater than 0

95 percent confidence interval:

1.049675 Inf

sample estimates:

mean of the differences

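1.61

The p-value (0.0002579) is well below 0.05, so we reject H0 in favour of H1: the new training has significantly improved the team's times, with a mean reduction of 1.61 seconds and a one-sided 95% lower confidence bound of about 1.05 seconds.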
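The one-sided p-value and the lower confidence bound in this output can both be reproduced from the t distribution. A minimal sketch, assuming the second data set and a 95% confidence level; it also checks that the "less" and "greater" p-values are complementary. The names `p_greater`, `p_less` and `lower_bound` are illustrative.

```r
# Second data set: times (seconds) before and after the second training
a <- c(12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3)
b <- c(12.0, 12.2, 11.2, 13.0, 15.0, 15.8, 12.2, 13.4, 12.9, 11.0)

d      <- a - b
n      <- length(d)
se     <- sd(d) / sqrt(n)
t_stat <- mean(d) / se

# alt = "greater": H1 is mean(a - b) > 0, so the p-value is the UPPER tail P(T >= t)
p_greater <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# The two one-sided p-values are complementary: they add up to 1
p_less <- pt(t_stat, df = n - 1)
p_greater + p_less

# One-sided 95% confidence interval: [mean(d) - qt(0.95, n - 1) * se, Inf)
lower_bound <- mean(d) - qt(0.95, df = n - 1) * se
lower_bound
```

The upper end of the interval is `Inf` because a one-sided "greater" test only bounds the mean difference from below.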
