R Tutorial Series: Applying the Reshape Package to Organize ANOVA Data

March 14, 2011
(This article was first published on R Tutorial Series, and kindly contributed to R-bloggers)

As demonstrated in the preceding ANOVA tutorials, data organization is central to conducting ANOVA in R. In standard ANOVA, we used the tapply() function to generate a table for a single summary function. In repeated measures ANOVA, we used separate datasets for our omnibus ANOVA and follow-up comparisons. This tutorial will demonstrate how the reshape package can be used to simplify the ANOVA data organization process in R.

Tutorial Files

Before we begin, you may want to download the between groups and repeated measures datasets (.csv) used in this tutorial. Be sure to right-click and save the files to your R working directory. The between groups dataset contains a hypothetical sample of 30 cases separated into three groups (a, b, and c). The repeated measures dataset contains a hypothetical sample of 10 cases, each measured three times (a, b, and c). In both datasets, the values are on a scale that ranges from 1 to 5.

Beginning Steps

To begin, we need to read our datasets into R and store their contents in variables.
  # read the datasets into R variables using the read.csv(file) function
  dataBetween <- read.csv("dataset_ANOVA_reshape_1.csv")
  dataRepeated <- read.csv("dataset_ANOVA_reshape_2.csv")

Reshape Package

Next, we need to install and load the reshape package. In this tutorial, we will make use of the package's cast() and melt() functions.
  # install the package
  install.packages("reshape")
  # load the package
  library(reshape)

Using cast() to Derive ANOVA Descriptives

The cast() function can be used to easily derive summary statistics for a between groups ANOVA dataset. The cast() function receives the following primary arguments.
  • data: the dataset
  • formula: in our case, a one-sided formula indicating the grouping variable
  • fun.aggregate: a function or vector of functions for deriving summary statistics, such as mean, var, or sd

  # display the raw between groups data
  dataBetween

The raw between groups data
  # cast the between groups data using cast(data, formula, fun.aggregate) to get the group means
  cast(dataBetween, formula = ~group, fun.aggregate = mean)

The casted data with means

Note that the fun.aggregate argument can also receive a vector of summary statistics functions. This yields all of the requested descriptives from a single cast() call.
  # cast the between groups data using cast(data, formula, fun.aggregate) to get the group means, variances, and standard deviations
  cast(dataBetween, formula = ~group, fun.aggregate = c(mean, var, sd))

The casted data with descriptives
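
As a minimal sketch of the two cast() calls above, the following uses a small made-up data frame in place of the tutorial's .csv file. The toyBetween name, its group and value columns, and its numbers are assumptions for illustration only. The tapply() line is simply a base R cross-check of the group means.

```r
# assumes the reshape package has already been installed
library(reshape)

# hypothetical between groups data: 3 groups of 4 cases, values on a 1 to 5 scale
toyBetween <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 4)),
  value = c(1, 2, 2, 3,  2, 3, 3, 4,  4, 5, 4, 5)
)

# one summary function: the mean of each group
groupMeans <- cast(toyBetween, formula = ~group, fun.aggregate = mean)
print(groupMeans)

# several summary functions: the mean, variance, and standard deviation of each group
groupDescriptives <- cast(toyBetween, formula = ~group, fun.aggregate = c(mean, var, sd))
print(groupDescriptives)

# base R cross-check of the group means
baseMeans <- tapply(toyBetween$value, toyBetween$group, mean)
print(baseMeans)
```

Because the toy data frame already has a column named value, cast() should not need to be told which column to summarize.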

Using melt() to Prepare Repeated Measures Data for Pairwise Comparisons

The melt() function can be used to reshape a repeated measures ANOVA dataset from wide format (one column per measurement) to long format (one row per case and measurement) prior to conducting pairwise comparisons. The melt() function receives the following primary arguments.
  • data: the dataset
  • id.vars: the id variable, or a vector of variables, that identifies each case
  • measure.vars: a vector containing the variables to be melted
  • variable_name: the name to give the new column that holds the names of the melted variables

  # display the repeated measures data
  dataRepeated

The raw repeated measures data
  # melt the repeated measures data using melt(data, id.vars, measure.vars, variable_name) to organize it for pairwise comparisons
  melt(dataRepeated, id.vars = "case", measure.vars = c("valueA", "valueB", "valueC"), variable_name = "abcValues")

The melted repeated measures data
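
To make the effect of melt() concrete, here is a small sketch on made-up data. The toyRepeated and toyMelted names and the numbers are hypothetical; only the column names (case, valueA, valueB, valueC) follow the call shown above.

```r
# assumes the reshape package has already been installed
library(reshape)

# hypothetical repeated measures data: 4 cases, each measured three times
toyRepeated <- data.frame(
  case   = 1:4,
  valueA = c(1, 2, 2, 3),
  valueB = c(2, 3, 3, 3),
  valueC = c(4, 5, 3, 5)
)

# melt from wide to long: the three measurement columns are stacked into one
toyMelted <- melt(toyRepeated, id.vars = "case",
                  measure.vars = c("valueA", "valueB", "valueC"),
                  variable_name = "abcValues")

# one row per case and measurement, in the columns case, abcValues, and value
print(toyMelted)
```

Storing the melted result in a variable, as done here, makes it available to later steps.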

Note that the data are now in long format, with one column of values and one column identifying the measurement, which is the layout that the pairwise.t.test() function expects. See the One-Way ANOVA with Pairwise Comparisons tutorial for details on using the pairwise.t.test() function.
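
As a rough sketch of that next step, the melted columns can be passed to pairwise.t.test() as the values and the grouping factor. The data below are made up, and the choice of a Bonferroni adjustment with paired = TRUE is an assumption for illustration, not necessarily the setting used in the earlier tutorials.

```r
# assumes the reshape package has already been installed
library(reshape)

# hypothetical repeated measures data: 5 cases, each measured three times
toyRepeated <- data.frame(
  case   = 1:5,
  valueA = c(1, 2, 2, 3, 1),
  valueB = c(2, 3, 3, 3, 2),
  valueC = c(4, 5, 3, 5, 4)
)

# melt to long format and store the result
toyMelted <- melt(toyRepeated, id.vars = "case",
                  measure.vars = c("valueA", "valueB", "valueC"),
                  variable_name = "abcValues")

# pairwise comparisons between the three measurements
# (assumed settings: paired tests with a Bonferroni adjustment)
toyComparisons <- pairwise.t.test(toyMelted$value, toyMelted$abcValues,
                                  p.adjust.method = "bonferroni", paired = TRUE)
print(toyComparisons)
```

The paired tests rely on the rows of each measurement appearing in the same case order, which is how melt() stacks them.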

Complete ANOVA Reshape Example

To see a complete example of how ANOVA data can be organized using the reshape package in R, please download the ANOVA reshape example (.txt) file.
