[This article was first published on The Chemical Statistician » R programming, and kindly contributed to R-bloggers].

#### Introduction

I recently introduced how to use the count() function in the “plyr” package to produce 1-way frequency tables in R. Several commenters provided alternative ways of doing so, and I appreciate all of them. Today, I want to extend that tutorial by demonstrating how count() can produce N-way frequency tables in the list format. This will highlight the advantages of count() over functions like table() and xtabs(), which produce cross-tabulations.
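As a quick reminder of the 1-way case, here is a minimal sketch, assuming the “plyr” package is installed, that puts count() next to base R's table() for the “cyl” variable of “mtcars”. The object names `one_way` and `one_way_table` are just illustrative.

```r
library(plyr)  # provides count(); assumed to be installed

# 1-way frequencies in list format: a data frame with one row per value of cyl
one_way <- count(mtcars, "cyl")
print(one_way)

# The same frequencies from base R's table(), returned as a named table object
one_way_table <- table(mtcars$cyl)
print(one_way_table)
```

Both report 11 four-cylinder, 7 six-cylinder and 14 eight-cylinder cars; only the shape of the result differs.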

#### 2-Way Frequencies: The Cross-Tabulated Format vs. The List Format

To get a 2-way frequency table (i.e. a table of how many observations fall into each combination of the levels of 2 categorical variables), you can display it in either a cross-tabulated format or a list format.

In R, the xtabs() function is good for cross-tabulation.  Let’s use the “mtcars” data set again; recall that it is a built-in data set in Base R.

> y = xtabs(~ cyl + gear, mtcars)
> y
   gear
cyl  3  4  5
  4  1  8  2
  6  2  4  1
  8 12  0  2

This is a nice way to visualize the counts in each of the 9 categories defined by the variables “gear” and “cyl”. You can use the row and column indices of this object to extract a particular value. For example, to extract the element in the third row and first column (cyl = 8, gear = 3):

> y[3,1]
[1] 12
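Because xtabs() names the dimensions of its result, you can also index the table by category labels instead of by position. Here is a minimal sketch in base R; note that the labels are character strings, even though “cyl” and “gear” are stored as numbers in mtcars. The names `by_position` and `by_label` are illustrative.

```r
# Cross-tabulate cylinders by gears (base R only)
y <- xtabs(~ cyl + gear, mtcars)

# Index by position: third row (cyl = 8), first column (gear = 3)
by_position <- y[3, 1]

# Index by dimnames: the labels must be given as character strings
by_label <- y["8", "3"]

# Both approaches return the same count
print(c(by_position, by_label))
```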

Alternatively, you can use the count() function in the “plyr” package to get the same frequencies in a list format.

> library(plyr)
> x = count(mtcars, c('cyl', 'gear'))
> x
  cyl gear freq
1   4    3    1
2   4    4    8
3   4    5    2
4   6    3    2
5   6    4    4
6   6    5    1
7   8    3   12
8   8    5    2

Notice that this object is a data frame. The first two columns are named after the grouping variables, and the counts are stored in a column named “freq”.

> class(x)
[1] "data.frame"
> names(x)
[1] "cyl"   "gear"   "freq"
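One difference from the cross-tabulation is worth knowing: count() only returns combinations that actually occur in the data, so the cell with cyl = 8 and gear = 4, which is 0 in the xtabs() output, has no row in x. If you want the zero cells in list format as well, base R's as.data.frame() method for tables is one option. A minimal sketch, assuming plyr is installed (`y_df` is an illustrative name):

```r
library(plyr)  # for count(); assumed to be installed

# List format from plyr: only observed combinations (8 rows)
x <- count(mtcars, c("cyl", "gear"))

# List format from base R: every combination, including zero counts (9 rows).
# cyl and gear become factors, and the count column is named "Freq".
y_df <- as.data.frame(xtabs(~ cyl + gear, mtcars))

print(nrow(x))
print(nrow(y_df))
print(subset(y_df, cyl == "8" & gear == "4"))
```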

You can access any particular element in 2 ways:

1. Use the row and/or column indices.
2. Use particular values of “cyl” and “gear”.

For example, to find the number of cars with cyl = 8 and gear = 3, you can use either method:

> x[7, ]$freq
[1] 12
> subset(x, cyl == 8 & gear == 3)$freq
[1] 12

I prefer the second method because I don’t have to inspect the output table to find which row contains that particular combination of “cyl” and “gear”. This is a key advantage of the list format over the cross-tabulation format.
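Looking up rows by value also scales to several combinations at once. One way to do this (a sketch that goes slightly beyond the original post; `wanted` and `found` are hypothetical names) is to put the combinations you want into their own data frame and merge() it with the frequency table, which matches rows on the shared “cyl” and “gear” columns.

```r
library(plyr)  # for count(); assumed to be installed

x <- count(mtcars, c("cyl", "gear"))

# The combinations to look up: (cyl = 8, gear = 3) and (cyl = 4, gear = 4)
wanted <- data.frame(cyl = c(8, 4), gear = c(3, 4))

# merge() joins on the columns the two data frames share (cyl and gear)
found <- merge(wanted, x)
print(found)
```

Combinations that do not occur in the data simply drop out of `found`, for the same reason that count() omits them.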

#### N-way frequencies: N > 2

Another key advantage of the list format over the cross-tabulation format is in obtaining frequency tables for 3 or more factors.

Cross-tabulations for N-way frequencies are difficult to visualize when N > 2. If N = 3, the best that you can do is to use multiple tables, one for each value of the third factor. For example,

> w = xtabs(~ cyl + gear + vs, mtcars)
> w
, , vs = 0

   gear
cyl  3  4  5
  4  0  0  1
  6  0  2  1
  8 12  0  2

, , vs = 1

   gear
cyl  3  4  5
  4  1  8  1
  6  2  2  0
  8  0  0  0

Moreover, it is now even more cumbersome to access the value of a particular combination of these 3 factors.
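To make this concrete, here is a minimal base-R sketch of extracting the count for cyl = 8, gear = 3, vs = 0 from the 3-way table. The result is a 3-dimensional array, so you need one index per factor, supplied in the order the factors appear in the formula. The names `by_position` and `by_label` are illustrative.

```r
# 3-way cross-tabulation: a 3-dimensional table (cyl x gear x vs)
w <- xtabs(~ cyl + gear + vs, mtcars)
print(dim(w))  # 3 cyl levels, 3 gear levels, 2 vs levels

# Positional access: row 3 (cyl = 8), column 1 (gear = 3), layer 1 (vs = 0)
by_position <- w[3, 1, 1]

# Label-based access also needs all three labels, in dimension order
by_label <- w["8", "3", "0"]

print(c(by_position, by_label))
```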

In contrast, the list format has the same shape for any value of N: one row per observed combination of the factors, one column per factor, and a “freq” column holding the count. This makes it equally easy to read for any N.

> t = count(mtcars, c('cyl', 'gear', 'vs'))
> t
   cyl gear vs freq
1    4    3  1    1
2    4    4  1    8
3    4    5  0    1
4    4    5  1    1
5    6    3  1    2
6    6    4  0    2
7    6    4  1    2
8    6    5  0    1
9    8    3  0   12
10   8    5  0    2
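Looking up a particular combination in the 3-way list format works exactly like the 2-way case; you only add one more condition to subset(). A minimal sketch, assuming plyr is installed. Here the result is stored as `t3` rather than `t`, a naming choice that avoids masking base R's transpose function t().

```r
library(plyr)  # for count(); assumed to be installed

# 3-way frequencies in list format
t3 <- count(mtcars, c("cyl", "gear", "vs"))

# Same lookup as in the 2-way case, with one extra condition for vs
n_8_3_0 <- subset(t3, cyl == 8 & gear == 3 & vs == 0)$freq
print(n_8_3_0)
```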
