Site icon R-bloggers

The advantages of using count() to get N-way frequency tables as data frames in R

[This article was first published on The Chemical Statistician » R programming, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

I recently introduced how to use the count() function in the “plyr” package in R to produce 1-way frequency tables in R.  Several commenters provided alternative ways of doing so, and they are all appreciated.  Today, I want to extend that tutorial by demonstrating how count() can be used to produce N-way frequency tables in the list format – this will magnify the superiority of this function over other functions like table() and xtabs().

 

2-Way Frequencies: The Cross-Tabulated Format vs. The List-Format

To get a 2-way frequency table (i.e. a frequency table of the counts of a data set as divided by 2 categorical variables), you can display it in a cross-tabulated format or in a list format.

In R, the xtabs() function is good for cross-tabulation.  Let’s use the “mtcars” data set again; recall that it is a built-in data set in Base R.

> y = xtabs(~ cyl + gear, mtcars)
> y
          gear
 cyl      3     4     5
 4        1     8     2
 6        2     4     1
 8        12    0     2

This is a nice way to visualize the counts in each of the 9 different categories as divided by the variables “gear” and “cyl”.  You can use the row and column indices of this object to extract a particular value.  For example, to extract the element in the third row and first column,

> y[3,1]
[1] 12

Alternatively, you can use the count() function in the “plyr” package to get the same frequencies in a list format.

> x = count(mtcars, c('cyl', 'gear'))
> x
         cyl     gear      freq
 1       4       3         1
 2       4       4         8
 3       4       5         2
 4       6       3         2
 5       6       4         4
 6       6       5         1
 7       8       3         12
 8       8       5         2

Notice that this object is a data frame.  The column names derive naturally from its origin.

> class(x)
 [1] "data.frame"
> names(x)
 [1] "cyl"   "gear"   "freq"

You can access any particular element by 2 methods

  1. Use the row and/or column indices.
  2. Use particular values of “cyl” and “gear”.

For example, to find the number of cars with cyl = 8 and gear = 3, you can do

> x[7, ]$freq
 [1] 12
> subset(x, cyl == 8 & gear == 3)$freq
 [1] 12

I like the second method, because I don’t have to look at the values of the output table to find which row contains that particular combination of “cyl” and “gear”.  This is a key advantage of the list format over the cross-tabulation format.

 

N-way frequencies: N > 2

Another key advantage of the list format over the cross-tabulation format is in obtaining frequency tables for 3 or more factors.

Cross-tabulations for N-way frequencies are difficult to visualize when N > 2.  If N = 3, the best that you can do is using multiple tables, one for each value of the third factor.  For example,

> w = xtabs(~ cyl + gear + vs, mtcars)
> w
 , , vs = 0
gear
 cyl    3  4  5
 4      0  0  1
 6      0  2  1
 8     12  0  2
, , vs = 1
gear
 cyl    3  4  5
 4      1  8  1
 6      2  2  0
 8      0  0  0

Moreover, it is now even more cumbersome to access the value of a particular combination of these 3 factors.

In contrast, the list format works in the same way, making it equally easy to visualize for any value of N in an N-way frequency table.

> t = count(mtcars, c('cyl', 'gear', 'vs'))
> t
        cyl    gear      vs      freq
 1      4      3         1       1
 2      4      4         1       8
 3      4      5         0       1
 4      4      5         1       1
 5      6      3         1       2
 6      6      4         0       2
 7      6      4         1       2
 8      6      5         0       1
 9      8      3         0       12
 10     8      5         0       2

Filed under: Applied Statistics, Categorical Data Analysis, Data Analysis, Descriptive Statistics, R programming, Statistics, Tutorials Tagged: count, cross-tabulation, data analysis, frequency table, R, R programming, statistics, table(), xtabs()

To leave a comment for the author, please follow the link and comment on their blog: The Chemical Statistician » R programming.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.