Easy cell statistics for factorial designs

December 2, 2011
By

(This article was first published on statMethods blog, and kindly contributed to R-bloggers)

A common task when analyzing multi-group designs is obtaining descriptive statistics for various cells and cell combinations.

There are many functions that can help you accomplish this, including aggregate() and by() in the base installation, summaryBy() in the doBy package, and describe.by() in the psych package. However, I find it easiest to use the melt() and cast() functions in the reshape package.

As an example, consider the mtcars dataframe (included in the base installation) containing road test information on automobiles assessed in 1974. Suppose that you want to obtain the means, standard deviations, and sample sizes for the variables miles per gallon (mpg), horsepower (hp), and weight (wt). You want these statistics for all cars in the dataset, separately by transmission type (am) and number of gears (gear), and for the cells formed by crossing these two variables.

You can accomplish this with the following code:

options(digits = 3)
library(reshape)

# define and name the statistics of interest
stats <- function(x)(c(N = length(x), Mean = mean(x), SD = sd(x)))

# label the levels of the classification variables (optional)
mtcars$am   <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
mtcars$gear <- factor(mtcars$gear, levels = c(3, 4, 5),
                      labels = c("3-Cyl", "4-Cyl", "5-Cyl"))

# melt the dataset
dfm   <- melt(mtcars,
              # outcome variables
              measure.vars = c("mpg", "hp", "wt"),
              # classification variables
              id.vars = c("am", "gear"))

# statistics for the entire sample
cast(dfm, variable ~ ., stats)

# statistics for cells defined by transmission type
cast(dfm, am + variable ~ ., stats)

# statistics for cells defined by number of gears
cast(dfm, gear + variable ~ ., stats)

# statistics for cells defined by each am x gear combination
cast(dfm, am + gear + variable ~ ., stats)

The output is given below:

  variable  N   Mean     SD
1      mpg 32  20.09  6.027
2       hp 32 146.69 68.563
3       wt 32   3.22  0.978

         am variable  N   Mean     SD
1 Automatic      mpg 19  17.15  3.834
2 Automatic       hp 19 160.26 53.908
3 Automatic       wt 19   3.77  0.777
4    Manual      mpg 13  24.39  6.167
5    Manual       hp 13 126.85 84.062
6    Manual       wt 13   2.41  0.617

  gear  variable  N   Mean      SD

1 3-Cyl      mpg 15  16.11   3.372
2 3-Cyl       hp 15 176.13  47.689
3 3-Cyl       wt 15   3.89   0.833
4 4-Cyl      mpg 12  24.53   5.277
5 4-Cyl       hp 12  89.50  25.893
6 4-Cyl       wt 12   2.62   0.633
7 5-Cyl      mpg  5  21.38   6.659
8 5-Cyl       hp  5 195.60 102.834
9 5-Cyl       wt  5   2.63   0.819

          am  gear variable  N   Mean      SD
1  Automatic 3-Cyl      mpg 15  16.11   3.372
2  Automatic 3-Cyl       hp 15 176.13  47.689
3  Automatic 3-Cyl       wt 15   3.89   0.833
4  Automatic 4-Cyl      mpg  4  21.05   3.070
5  Automatic 4-Cyl       hp  4 100.75  29.010
6  Automatic 4-Cyl       wt  4   3.30   0.157
7     Manual 4-Cyl      mpg  8  26.27   5.414
8     Manual 4-Cyl       hp  8  83.88  24.175
9     Manual 4-Cyl       wt  8   2.27   0.461
10    Manual 5-Cyl      mpg  5  21.38   6.659
11    Manual 5-Cyl       hp  5 195.60 102.834
12    Manual 5-Cyl       wt  5   2.63   0.819

The approach is easily generalized to any number of grouping variables (factors), dependent/outcome variables, and statistics, and gives you a powerful tool for slicing and dicing data.

To leave a comment for the author, please follow the link and comment on their blog: statMethods blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Mango solutions



RStudio homepage



Zero Inflated Models and Generalized Linear Mixed Models with R

Dommino data lab

Quantide: statistical consulting and training



http://www.eoda.de







ODSC

ODSC

CRC R books series





Six Sigma Online Training





Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)