Parametric method for the study of the correlation: the Pearson r-test

August 3, 2009

(This article was first published on Statistic on aiR, and kindly contributed to R-bloggers)

Suppose you want to study whether there is a correlation between two sets of data. To do this we compute the Pearson product-moment correlation coefficient r, which measures the linear dependence between two variables X and Y; we then compute a t-statistic to test whether r differs significantly from zero. This test is appropriate when the data follow a Gaussian distribution.
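
For reference, the two quantities described above can be written out as follows. These are the standard textbook definitions rather than formulas quoted from the original post; the symbols x_i, y_i (the paired observations) and n (the number of pairs) are notation introduced here.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
t = r \sqrt{\frac{n - 2}{1 - r^2}},
\qquad
\mathrm{df} = n - 2
```

Under the null hypothesis that the true correlation is zero, t follows a Student's t distribution with n - 2 degrees of freedom.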

A new test to measure IQ is administered to 10 volunteers. You want to see whether there is a correlation between the new experimental test and the classical test, in order to replace the old test with the new one. These are the values:
Old test: 15, 21, 25, 26, 30, 30, 22, 29, 19, 16
New test: 55, 56, 89, 67, 84, 89, 99, 62, 83, 88
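
Since the test assumes Gaussian data, one might first check that assumption. The sketch below uses the Shapiro-Wilk test via shapiro.test() from base R; this choice of check is an assumption made here, not something the original post does, and with only 10 observations such a test has little power. The names sw_old and sw_new are illustrative.

```r
# Scores from the example
a <- c(15, 21, 25, 26, 30, 30, 22, 29, 19, 16)  # old test
b <- c(55, 56, 89, 67, 84, 89, 99, 62, 83, 88)  # new test

# Shapiro-Wilk normality test on each sample
# (null hypothesis: the sample comes from a Gaussian distribution)
sw_old <- shapiro.test(a)
sw_new <- shapiro.test(b)

# A small p-value would cast doubt on the Gaussian assumption
sw_old$p.value
sw_new$p.value
```

A quantile plot such as qqnorm(a) followed by qqline(a) is a common visual complement to this check.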

R has a single function, cor.test(), which directly returns both the Pearson coefficient and the t-test for the significance of the coefficient:

a = c(15, 21, 25, 26, 30, 30, 22, 29, 19, 16)
b = c(55, 56, 89, 67, 84, 89, 99, 62, 83, 88)

cor.test(a, b)

        Pearson's product-moment correlation

data:  a and b
t = 0.4772, df = 8, p-value = 0.646
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.5174766  0.7205107
sample estimates:
     cor
0.166349
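
To see where these numbers come from, the coefficient and the t-statistic can be recomputed by hand from their usual definitions. This is a minimal sketch meant only as a cross-check; the variable names (r_manual, t_manual, p_manual, res) are illustrative and not part of the original post.

```r
# Scores from the example
a <- c(15, 21, 25, 26, 30, 30, 22, 29, 19, 16)
b <- c(55, 56, 89, 67, 84, 89, 99, 62, 83, 88)
n <- length(a)

# Pearson r: sum of cross-deviations divided by the
# square root of the product of the two sums of squares
r_manual <- sum((a - mean(a)) * (b - mean(b))) /
  sqrt(sum((a - mean(a))^2) * sum((b - mean(b))^2))

# t-statistic with n - 2 degrees of freedom
t_manual <- r_manual * sqrt(n - 2) / sqrt(1 - r_manual^2)

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

# Compare with cor.test(); unname() drops the "cor" and "t" labels
res <- cor.test(a, b)
all.equal(r_manual, unname(res$estimate))
all.equal(t_manual, unname(res$statistic))
all.equal(p_manual, unname(res$p.value))
```

The hand-computed values should agree with the cor.test() output shown above (r of about 0.166, t of about 0.477 on 8 degrees of freedom).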

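The Pearson coefficient is 0.166: a very low value, which indicates a weak linear correlation between the variables.
Furthermore, the p-value (0.646) is greater than 0.05, so we cannot reject the null hypothesis that the true correlation is zero: the Pearson coefficient is not statistically significant. The 95 percent confidence interval, which runs from -0.52 to 0.72, includes zero, consistent with this.
So we can say that the data provide no evidence of a linear correlation between the results of the two tests.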
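
The decision rule can also be applied programmatically to the object returned by cor.test(). The sketch below assumes the conventional significance level of 0.05; the names alpha, decision and ci_contains_zero are illustrative.

```r
# Scores from the example
a <- c(15, 21, 25, 26, 30, 30, 22, 29, 19, 16)
b <- c(55, 56, 89, 67, 84, 89, 99, 62, 83, 88)

res <- cor.test(a, b)
alpha <- 0.05  # conventional significance level, assumed here

# Reject H0 (true correlation equal to 0) only when the p-value is below alpha
if (res$p.value < alpha) {
  decision <- "reject H0: correlation is significant"
} else {
  decision <- "fail to reject H0: correlation is not significant"
}
decision

# Equivalent check: does the 95 percent confidence interval contain zero?
ci_contains_zero <- res$conf.int[1] < 0 && res$conf.int[2] > 0
ci_contains_zero
```

For these data the p-value is 0.646, so the rule gives "fail to reject H0", and the confidence interval contains zero.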