R Tutorial Series: Zero-Order Correlations

[This article was first published on R Tutorial Series, and kindly contributed to R-bloggers.]

One of the most common and basic techniques for analyzing the relationships between variables is zero-order correlation, that is, the correlation between two variables without controlling for any others. This tutorial explores how R can be used to compute it.

Tutorial Files

Before we start, you may want to download the sample data (.csv) used in this tutorial. Be sure to right-click and save the file to your R working directory. This dataset contains pretest and posttest scores for 66 subjects on a series of reading comprehension tests (Moore & McCabe, 1989). Note that all code samples in this tutorial assume that the data have already been read into an R variable and that the variable has been attached.
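
The loading step described above could look like the following sketch. The column names PRE1 and POST1 come from this tutorial, but the file name reading_scores.csv and the numbers in it are hypothetical stand-ins, written to a temporary folder only so that the snippet runs on its own. Substitute the path to the file you downloaded.

```r
# Stand-in data: the column names PRE1/POST1 are from the tutorial,
# the values and the file name are made up for illustration only.
csv_path <- file.path(tempdir(), "reading_scores.csv")  # hypothetical file name
write.csv(
  data.frame(PRE1 = c(4, 6, 9, 12, 8), POST1 = c(3, 5, 10, 13, 7)),
  csv_path,
  row.names = FALSE
)

# Read the file into an R variable (a data frame)
datavar <- read.csv(csv_path)

# Attach it so that columns can be referenced by name, e.g. PRE1
attach(datavar)

mean(PRE1)  # works without writing datavar$PRE1
```

attach() is convenient in a short interactive session. In longer scripts, datavar$PRE1 or with(datavar, ...) avoids name clashes with other objects.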

Correlation Between Two Variables

The most fundamental way to calculate correlations is to directly operate on two variables. In R, this can be done using the cor() function. The cor() function accepts the following arguments (“Correlation, Variance…”, n.d.).

  • x: the first variable to correlate
  • y: the second variable to correlate (may be omitted when x is a matrix or data frame)
  • use (optional): determines how missing values are handled; accepts "everything" (the default), "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs"
  • method (optional): determines the statistical method used; accepts "pearson" (the default), "kendall", or "spearman"

In most cases, x and y are the only arguments that you will use when running the cor() function. The basic format for calculating a correlation is cor(VAR1, VAR2), where VAR1 and VAR2 are the variables that you would like to correlate.
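
As an illustration of the optional arguments, the sketch below uses two short made-up vectors (not the tutorial data), one of which contains a missing value.

```r
# Made-up vectors for illustration only; y contains one missing value
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, NA, 6, 5)

# With the default handling of missing values, the result is NA
cor(x, y)

# use = "complete.obs" drops the pair that contains the NA
cor(x, y, use = "complete.obs")

# method = "spearman" computes a rank-based correlation instead of Pearson's r
cor(x, y, use = "complete.obs", method = "spearman")
```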

cor(VAR1, VAR2) Example

Suppose that our research question is: “How does a subject’s pretest 1 score relate to his or her posttest 1 score?” The following example demonstrates how to use the cor() function to calculate the correlation between pretest 1 (PRE1) and posttest 1 (POST1).

  > # use cor(VAR1, VAR2) to calculate the correlation between variable 1 and variable 2
  > cor(PRE1, POST1)
  [1] 0.5659026
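
To see what this number represents: with the default method, cor() returns Pearson's r, which is the covariance of the two variables divided by the product of their standard deviations. The following sketch checks that identity on made-up vectors (hypothetical values, not the tutorial data).

```r
# Made-up scores for illustration only
pre  <- c(4, 6, 9, 12, 8, 10)
post <- c(3, 5, 10, 13, 7, 9)

# Pearson's r computed by hand: covariance / (sd of pre * sd of post)
r_manual <- cov(pre, post) / (sd(pre) * sd(post))

# The same quantity from cor()
r_builtin <- cor(pre, post)

# Compare the two results, allowing for floating-point rounding
all.equal(r_manual, r_builtin)
```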

Correlations Between Multiple Variables

When beginning to analyze a dataset, researchers often want to get a complete picture of all correlations, rather than just a single one. Conveniently, the cor() function can also be run on an entire set of data. The format for this operation is cor(DATAVAR), where DATAVAR is the name of the R variable containing the data.

cor(DATAVAR) Example

Suppose now that our research question is: “How do all of the test scores in the dataset relate to each other?” The following example demonstrates how to use the cor() function to calculate all of the correlations in a dataset.

  > # use cor(DATAVAR) to get the correlations between all variables
  > cor(datavar)

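The preceding function returns a correlation matrix with one row and one column for each variable in the dataset. (The screenshot of this output from the original post is not available here.)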
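
As a rough idea of what such output looks like, the sketch below runs cor() on a small made-up data frame. The column names PRE1 and POST1 follow this tutorial; PRE2 and all of the values are hypothetical and used only for illustration. Note that cor() requires every column to be numeric.

```r
# Made-up data frame for illustration only
scores <- data.frame(
  PRE1  = c(4, 6, 9, 12, 8, 10),
  PRE2  = c(3, 5, 4, 6, 2, 7),
  POST1 = c(3, 5, 10, 13, 7, 9)
)

# Correlation matrix: one row and one column per variable
cor_matrix <- cor(scores)
print(round(cor_matrix, 2))

# A single pairwise correlation can be looked up by row and column name
cor_matrix["PRE1", "POST1"]
```

The diagonal is always 1, since each variable correlates perfectly with itself, and the matrix is symmetric, so each pair of variables appears twice.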

Complete Correlational Analysis

To see a complete example of how correlational analysis can be conducted in R, please download the correlational analysis example (.txt) file.

References

Correlation, Variance and Covariance (Matrices). (n.d.). Retrieved October 27, 2009, from http://sekhon.berkeley.edu/stats/html/cor.html

Moore, D., & McCabe, G. (1989). Introduction to the practice of statistics [Data File]. Retrieved October 27, 2009, from http://lib.stat.cmu.edu/DASL/Datafiles/ReadingTestScores.html
