R Tutorial Series: Zero-Order Correlations

November 6, 2009

(This article was first published on R Tutorial Series, and kindly contributed to R-bloggers)

One of the most common and basic techniques for analyzing the relationships between variables is zero-order correlation. This tutorial will explore the ways in which R can be used to employ this method.

Tutorial Files

Before we start, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains pretest and posttest scores for 66 subjects on a series of reading comprehension tests (Moore & McCabe, 1989). Note that all code samples in this tutorial assume that the data have already been read into an R variable and that the variable has been attached.
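As a minimal sketch of that setup step, the following reads a CSV file into a data frame and attaches it. The file written here is a tiny invented stand-in (its values are not the real reading scores), and the names csv_path and datavar are assumptions made for illustration; with the real download you would pass its file name to read.csv() instead.

```r
# Write a tiny stand-in CSV to a temporary file.
# The values below are invented for illustration only.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("PRE1,POST1",
             "4,5",
             "6,9",
             "9,5",
             "12,8"), csv_path)

# Read the file into a data frame (hypothetical variable name: datavar)
datavar <- read.csv(csv_path)

# Attach the data frame so its columns can be referenced by name
attach(datavar)

# PRE1 now refers to the same values as datavar$PRE1
PRE1
```

Attaching is a convenience; the same columns are always reachable as datavar$PRE1 and datavar$POST1 without it.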

Correlation Between Two Variables

The most fundamental way to calculate correlations is to directly operate on two variables. In R, this can be done using the cor() function. The cor() function accepts the following arguments ("Correlation, Variance...", n.d.).

  • x: the first variable to correlate
  • y: the second variable to correlate
  • use (optional): determines how missing values are handled; accepts values such as "all.obs", "complete.obs", or "pairwise.complete.obs"
  • method (optional): determines which correlation coefficient is computed; accepts "pearson" (the default), "kendall", or "spearman"

In most cases, x and y are the only arguments that you will use when running the cor() function. The basic format for calculating a correlation is cor(VAR1, VAR2), where VAR1 and VAR2 are the variables that you would like to correlate.
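For the less common cases where the optional arguments matter, here is a small sketch of use and method in action. The vectors x and y are invented toy data, not values from the reading dataset.

```r
# Invented toy vectors; x has one missing value
x <- c(1, 2, 3, 4, NA)
y <- c(2, 1, 4, 3, 5)

# With the default handling of missing values, the result is NA
r_default <- cor(x, y)

# "complete.obs" drops any pair that contains a missing value
r_complete <- cor(x, y, use = "complete.obs")

# method selects the coefficient; here, Spearman's rank correlation
r_spearman <- cor(x, y, use = "complete.obs", method = "spearman")

r_default
r_complete
r_spearman
```

If your data have no missing values, the use argument has no effect on the result and can be left out.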

cor(VAR1, VAR2) Example

Suppose that our research question is: "How does a subject's pretest 1 score relate to his or her posttest 1 score?" The following example demonstrates how to use the cor() function to calculate the correlation between pretest 1 (PRE1) and posttest 1 (POST1).

  > # use cor(VAR1, VAR2) to calculate the correlation between variable 1 and variable 2
  > cor(PRE1, POST1)
  [1] 0.5659026
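To make clearer what cor() computes by default, the sketch below checks the Pearson coefficient against its definition: the covariance of the two variables divided by the product of their standard deviations. The vectors are invented toy data, so the result will not match the PRE1 and POST1 value above.

```r
# Invented toy vectors for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Pearson correlation from the built-in function
r_builtin <- cor(x, y)

# The same quantity from its definition: cov(x, y) / (sd(x) * sd(y))
r_manual <- cov(x, y) / (sd(x) * sd(y))

# The two agree up to floating-point tolerance
all.equal(r_builtin, r_manual)
```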

Correlations Between Multiple Variables

When beginning to analyze a dataset, researchers often want to get a complete picture of all correlations, rather than just a single one. Conveniently, the cor() function can also be run on an entire set of data. The format for this operation is cor(DATAVAR), where DATAVAR is the name of the R variable containing the data.

cor(DATAVAR) Example

Suppose now that our research question is: "How do all of the test scores in the dataset relate to each other?" The following example demonstrates how to use the cor() function to calculate all of the correlations in a dataset.

  > # use cor(DATAVAR) to get the correlations between all variables
  > cor(datavar)

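The output of the preceding function is a correlation matrix, with one row and one column for each variable in the dataset.

[Figure: correlation matrix returned by cor(datavar)]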
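The shape of a cor(DATAVAR) result can be sketched on a small invented data frame. The data frame scores and its values are assumptions for illustration, not the tutorial's data. Because cor() requires numeric input, the sketch first selects the numeric columns, which matters whenever a dataset also holds a text column such as a group label.

```r
# Invented toy data frame with one non-numeric column
scores <- data.frame(
  GROUP = c("a", "b", "a", "b", "a"),
  PRE1  = c(4, 6, 9, 12, 16),
  POST1 = c(5, 9, 5, 8, 10),
  POST2 = c(3, 7, 6, 9, 12)
)

# cor() needs numeric input, so keep only the numeric columns
numeric_cols <- sapply(scores, is.numeric)

# Correlation matrix: one row and one column per numeric variable
cor_matrix <- cor(scores[, numeric_cols])

# Round for easier reading
round(cor_matrix, 2)
```

The diagonal is always 1 (each variable correlates perfectly with itself), and the matrix is symmetric, so each pair of variables appears twice.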

Complete Correlational Analysis

To see a complete example of how correlational analysis can be conducted in R, please download the correlational analysis example (.txt) file.

References

Correlation, Variance and Covariance (Matrices). (n.d.). Retrieved October 27, 2009, from http://sekhon.berkeley.edu/stats/html/cor.html

Moore, D., & McCabe, G. (1989). Introduction to the practice of statistics [Data File]. Retrieved October 27, 2009, from http://lib.stat.cmu.edu/DASL/Datafiles/ReadingTestScores.html
