This post provides links to a range of resources related to the use and interpretation of correlations. I wanted to provide a page with links to a number of additional resources that would be useful both for those of my students who might be keen to learn more and for anyone else who might be interested. Specifically, this post provides links to: (a) introductory book-style chapters on correlation, (b) resources related to assorted issues in correlation (i.e., discussion of causal inference, correlation with various variable types, range restriction, statistical power, correlation interpretation, and significance testing), (c) tutorials on computing correlations using SPSS and R, and (d) tips for reporting correlations in APA Style.
Introductions to correlation
The following provide general textbook style overviews of correlation:
- David Kenny's Chapter 16 Testing Measures of Association provides a textbook overview of correlation designed for psychology undergraduate students. It also includes several practice questions. David Kenny has kindly made his entire textbook 'Statistics for the Social and Behavioral Sciences' available online for free as either an overall pdf or individual chapters.
- David Stockburger's Introductory Statistics chapter on Correlation
- My own slides and notes on correlation
Correlation and Causation
Knowing how to reason about causality in the behavioural and social sciences is a really important skill.
- Check out this earlier post on correlation and causation, which includes links to PDFs of important journal articles on the topic.
- Joy of Stats on Correlation provides a 4-minute video with a few entertaining examples of correlations and their connection with causal inference.
Types of variables
The prototypical correlation example is based on two continuous, normally distributed variables. However, in practice there are many other types of variables that you might wish to correlate. The following pages provide suggestions for how to analyse some other common scenarios:
- What to do when one of the variables is non-normal?
- What to do when one of the variables is a Likert item?
- What to do if you want to treat a variable as ordinal?
Range restriction
- HyperStat has a general discussion of range restriction
- See this simulation on connexions showing the effect of range restriction
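For readers who would rather run a simulation than watch one, the following is a minimal sketch of range restriction in plain Python (standard library only). It is not taken from either resource above: the population correlation of .60, the sample size, and the cut-off of keeping only above-average X are all assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation computed from first principles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the run is repeatable
rho = 0.60               # assumed population correlation
n = 5000                 # assumed sample size

# Build Y from X plus independent noise so that corr(X, Y) is rho.
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]

r_full = pearson_r(xs, ys)

# Restrict the range: keep only the cases with above-average X.
kept = [(x, y) for x, y in zip(xs, ys) if x > 0]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(f"full sample:       r = {r_full:.2f} (n = {len(xs)})")
print(f"restricted sample: r = {r_restricted:.2f} (n = {len(kept)})")
```

The restricted correlation should come out noticeably smaller than the full-sample correlation, even though the underlying relationship between X and Y has not changed.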
Statistical power
Statistical power within the context of correlation is the probability of obtaining a statistically significant correlation in a study, given that a non-zero correlation of a given size exists in the population.
- This earlier post provides (a) some simple rules of thumb for power analysis for correlations, (b) instructions for calculating statistical power using the free software G*Power, and (c) links to additional reading on the important topic of statistical power.
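To make the definition of power concrete, here is a rough sketch in plain Python. It uses the Fisher z approximation (the transformed correlation is treated as normal with standard error 1 / sqrt(n - 3)), which is an assumption of this sketch and may differ slightly from the exact calculation a dedicated power program performs. The function name `approx_power` and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(rho, n, alpha=0.05):
    """Approximate two-tailed power for testing H0: rho = 0.

    Relies on the Fisher z approximation: atanh(r) is treated as
    normally distributed with standard error 1 / sqrt(n - 3).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(math.atanh(rho)) * math.sqrt(n - 3)
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Assumed population correlation of .30 at several sample sizes.
for n in (30, 50, 84, 150):
    print(f"rho = .30, n = {n:3d}: power ~ {approx_power(0.30, n):.2f}")
```

Under this approximation, a population correlation of .30 needs a sample in the mid-80s to reach roughly 80% power, which shows how quickly small studies become underpowered for modest correlations.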
Interpreting correlations
When I first learnt about the correlation coefficient, I found it challenging to truly grok what a particular value meant. Learning the standard interpretation was easy. The challenging part was understanding the practical and theoretical implications for a correlation of a given size.
The following are some of the standard interpretations of a correlation:
- Pearson's correlation is an index of the direction and strength of linear association between two variables.
- The square of the correlation between X and Y is the proportion of variance shared between X and Y (e.g., if r = .50, then the two variables share .50 * .50 = .25, or 25%, of their variance).
- If X and Y were standardised (i.e., rescaled so that both variables had a mean of zero and a standard deviation of one), then the correlation would be the same as the regression coefficient of X predicting Y or Y predicting X. Thus, for example, if r = .25 you could say that "a value one standard deviation greater on X predicts a .25 standard deviation greater value on Y".
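The shared-variance and standardised-regression interpretations above can be checked numerically. The following is a small sketch in plain Python; the data values are made up purely for illustration, and the helper names (`pearson_r`, `ols_slope`, `standardise`) are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def standardise(values):
    """Rescale to mean zero and standard deviation one."""
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]


# Made-up example data.
x = [2, 4, 5, 7, 8, 10, 11, 13]
y = [50, 55, 53, 62, 60, 70, 66, 75]

r = pearson_r(x, y)
zx, zy = standardise(x), standardise(y)

print(f"r                         = {r:.3f}")
print(f"r squared (shared var.)   = {r ** 2:.3f}")
print(f"slope, zx predicting zy   = {ols_slope(zx, zy):.3f}")
print(f"slope, zy predicting zx   = {ols_slope(zy, zx):.3f}")
```

Both standardised slopes match r, whereas the slopes on the raw scales would generally differ from each other and from r.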
Strategies for building an intuition of what a correlation means:
- Play with the Regression by Eye simulation. The simulation generates a scatterplot, and you are asked to indicate which of a set of correlations corresponds to the scatterplot. It helps to build a mapping between the graphical intuitiveness of a scatterplot and the numeric summary of the linear association in the scatterplot (i.e., the correlation coefficient).
- Memorise some of the rules of thumb for describing correlation effect sizes (see this discussion by Andy Field), but don't take the rules of thumb too seriously.
- Try to build up a frame of reference for correlations in different contexts by reading results sections. Meta-analyses can also be particularly useful in this regard.
- Read the article 'Meyer, G. J., et al. (2001). Psychological Testing and Psychological Assessment: A Review of Evidence and Issues. American Psychologist, 56(2), 128-165.' (PDF), which provides large tables of meta-analytic correlations for a wide range of medical and psychological domains sorted by the size of the correlation. Studying these tables can help build an intuition and a context for interpretation of correlations.
As with most statistical techniques, there are various ways of representing the data. The correlation coefficient provides a very brief summary of the association between two variables. However, graphical representations of association are much richer.
The following are some general heuristics that I find useful when plotting data that might also be represented as a correlation:
- Use scatterplots to explore features of the association (e.g., presence of outliers, linearity, distributional properties, spread of data around any trend line, etc.);
- If one of the variables is positively skewed, consider plotting the corresponding axis on a log scale;
- If there are a lot of data points (e.g., n > 1000), adopt a different strategy such as using some form of partial transparency (e.g., see use of the alpha property in ggplot2), or sampling the data;
- If one of the variables takes on a limited number of discrete categories, consider using a jitter or a sunflower plot;
- If there are three or more variables, consider using a scatterplot matrix;
- Fitting some form of trend line is often useful;
- Adjust the size of the plotting character to the sample size (for bigger n, use a smaller plotting character).
Significance tests on correlations
There is a wide range of possible significance tests that can be performed on correlations. The following links provide some suggestions for different scenarios.
- General post on comparing the significance of two correlations under various conditions.
- Significance of correlation using Pearson's table
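As one concrete instance of comparing two correlations, the sketch below covers only the simplest condition: two correlations from independent samples, compared using the Fisher z transformation. It is an illustration in plain Python rather than the procedure from either link, and the function name and example values are hypothetical. Correlations from the same sample (dependent correlations) need a different test.

```python
import math
from statistics import NormalDist


def compare_independent_rs(r1, n1, r2, n2):
    """Two-tailed z test for the difference between two correlations
    obtained from independent samples (Fisher z approximation)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical example: r = .50 in one sample of 100, r = .30 in another.
z, p = compare_independent_rs(0.50, 100, 0.30, 100)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```

With these assumed inputs the difference does not reach the conventional .05 threshold, which illustrates that apparently sizeable differences between correlations can be hard to detect with moderate samples.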
Calculating a correlation coefficient and its associated statistical significance is a standard task that almost any statistical package can perform. Many psychology students are taught to use SPSS. It is a proprietary (i.e., you can't run it at home without a paid licence) data analysis system with a strong emphasis on a GUI and on making it easy to perform various standardised analyses common in the social sciences.
My preferred tool for performing data analysis is R. It is open source (thus, you can run it at home for free) and is often described as the lingua franca of statistics. It generally requires a more sophisticated understanding of statistics and computing to use effectively. Thus, for the interested psychology student or researcher I have this introduction to R for researchers in psychology.
Below I list resources for performing correlation analysis in SPSS and R.
- Andy Field has a chapter on correlation, which discusses correlation using SPSS.
- This video tutorial on running and interpreting a correlation analysis using SPSS goes for about 7 minutes and is elementary.
R makes it easy to perform correlations on datasets. Specifically, the following links provide example syntax:
- Quick-R on correlations
- Quick-R on scatterplots
- More generally, William Revelle has some great resources on R for psychology.
Reporting Correlations in APA Style
- APA Style Manual: When required to report results using APA style, the authoritative source is the Publication Manual of the APA.
- Article Deconstruction: Another general strategy is to find a journal article that (a) reports a statistical test similar to the one you require, and (b) is published in an APA journal or at least in a journal that uses APA style.
- APA journals are listed here
- A quick search on Google Scholar will often be sufficient and quicker, although PsycInfo (a subscription service) is more reliable if you have access to it (many universities do). E.g., a quick search for apa "significant correlation between" psychology revealed several relevant articles, some with immediate PDF access.
- I also have a separate post on this general approach of deconstructing journal articles to discern writing principles.
- Correlation Matrices: Many psychological studies, particularly those based on correlational/observational designs, involve the measurement of a range of numeric variables. It is particularly useful, and common, in such cases to report a correlation matrix between sets of variables. I have a post with instructions on formatting a correlation matrix in APA style using a combination of SPSS, Excel, and Word. The post also includes links to examples of correlation matrices being reported.
- General overview of reporting statistics including correlations using APA style
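As a rough illustration of what an APA-style correlation report looks like, the sketch below formats a result string in plain Python. It assumes the common conventions of reporting degrees of freedom as n - 2, dropping the leading zero for r and p, and writing very small p-values as p < .001; check the Publication Manual for the authoritative rules. The function names are hypothetical, and in a manuscript the symbols r and p would also be italicised.

```python
def strip_leading_zero(text):
    """Drop the zero before the decimal point (0.45 -> .45, -0.45 -> -.45)."""
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text


def apa_correlation(r, n, p):
    """Format a Pearson correlation as, e.g., 'r(98) = .45, p < .001'."""
    df = n - 2  # degrees of freedom for a Pearson correlation
    r_text = strip_leading_zero(f"{r:.2f}")
    if p < 0.001:
        p_text = "p < .001"
    else:
        p_text = "p = " + strip_leading_zero(f"{p:.3f}")
    return f"r({df}) = {r_text}, {p_text}"


# Hypothetical results, not taken from any real study.
print(apa_correlation(0.452, 100, 0.0004))
print(apa_correlation(-0.20, 50, 0.164))
```

A helper like this is mainly useful when many correlations need to be reported consistently; for a single result it is just as easy to write the string by hand.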