The advantages of using count() to get N-way frequency tables as data frames in R

(This article was first published on The Chemical Statistician » R programming, and kindly contributed to R-bloggers)

Introduction

I recently showed how to use the count() function in the “plyr” package to produce 1-way frequency tables in R.  Several commenters provided alternative ways of doing so, and they are all appreciated.  Today, I want to extend that tutorial by demonstrating how count() can produce N-way frequency tables in the list format, which highlights its advantages over functions like table() and xtabs().

 

2-Way Frequencies: The Cross-Tabulated Format vs. The List-Format

To get a 2-way frequency table (i.e. a frequency table of the counts of a data set as divided by 2 categorical variables), you can display it in a cross-tabulated format or in a list format.

In R, the xtabs() function is good for cross-tabulation.  Let’s use the “mtcars” data set again; recall that it is a built-in data set in Base R.

> y = xtabs(~ cyl + gear, mtcars)
> y
          gear
 cyl      3     4     5
 4        1     8     2
 6        2     4     1
 8        12    0     2

This is a nice way to visualize the counts in each of the 9 different categories as divided by the variables “gear” and “cyl”.  You can use the row and column indices of this object to extract a particular value.  For example, to extract the element in the third row and first column,

> y[3,1]
[1] 12

Alternatively, you can use the count() function in the “plyr” package to get the same frequencies in a list format.

> library(plyr)
> x = count(mtcars, c('cyl', 'gear'))
> x
         cyl     gear      freq
 1       4       3         1
 2       4       4         8
 3       4       5         2
 4       6       3         2
 5       6       4         4
 6       6       5         1
 7       8       3         12
 8       8       5         2

Notice that this object is a data frame.  Its column names are the variables passed to count(), plus “freq” for the counts.

> class(x)
 [1] "data.frame"
> names(x)
 [1] "cyl"   "gear"   "freq"
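As a quick sanity check on this data frame, the sketch below (assuming the “plyr” package is installed) confirms that the frequencies add up to the 32 rows of mtcars and that x has only 8 rows, one per observed combination; the empty cell cyl = 8, gear = 4 from the cross-tabulation has no row.  The base R call as.data.frame() on the xtabs() object is shown only for comparison and is not part of the original tutorial; the names total, n_observed, y_long, n_cells and empty are illustrative.

```r
library(plyr)  # assumed installed; provides count()

x <- count(mtcars, c("cyl", "gear"))

# The counts cover every car exactly once.
total <- sum(x$freq)

# count() lists only the combinations that occur in the data.
n_observed <- nrow(x)

# Base R comparison (not from the original post): converting the
# cross-tabulation keeps all 9 cells, including the empty one.
y_long <- as.data.frame(xtabs(~ cyl + gear, mtcars))
n_cells <- nrow(y_long)
empty <- y_long[y_long$Freq == 0, c("cyl", "gear")]

print(c(total = total, n_observed = n_observed, n_cells = n_cells))
print(empty)
```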

You can access any particular element in 2 ways:

  1. Use the row and/or column indices.
  2. Use particular values of “cyl” and “gear”.

For example, to find the number of cars with cyl = 8 and gear = 3, you can do

> x[7, ]$freq
 [1] 12
> subset(x, cyl == 8 & gear == 3)$freq
 [1] 12

I like the second method, because I don’t have to look at the values of the output table to find which row contains that particular combination of “cyl” and “gear”.  This is a key advantage of the list format over the cross-tabulation format.
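Because x is an ordinary data frame, the usual data frame tools apply to it as well.  Here is a minimal sketch, assuming “plyr” is installed; the names by_freq and common are illustrative only.

```r
library(plyr)  # assumed installed; provides count()

x <- count(mtcars, c("cyl", "gear"))

# Sort the combinations from most to least frequent.
by_freq <- x[order(-x$freq), ]

# Keep only the combinations observed at least 4 times.
common <- subset(x, freq >= 4)

print(by_freq)
print(common)
```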

 

N-Way Frequencies: N > 2

Another key advantage of the list format over the cross-tabulation format is in obtaining frequency tables for 3 or more factors.

Cross-tabulations for N-way frequencies are difficult to visualize when N > 2.  If N = 3, the best that you can do is to use multiple tables, one for each value of the third factor.  For example,

> w = xtabs(~ cyl + gear + vs, mtcars)
> w
 , , vs = 0

         gear
 cyl     3   4   5
 4       0   0   1
 6       0   2   1
 8      12   0   2

 , , vs = 1

         gear
 cyl     3   4   5
 4       1   8   1
 6       2   2   0
 8       0   0   0

Moreover, it is now even more cumbersome to access the value of a particular combination of these 3 factors.
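To make this concrete, here is a minimal base R sketch of pulling one cell out of the 3-way table.  Positional indexing requires remembering both the order of the dimensions and the position of each level; indexing by level name (as character strings) avoids the second problem but not the first.  The names by_position and by_name are illustrative.

```r
w <- xtabs(~ cyl + gear + vs, mtcars)

# Positional: third level of cyl (8), first level of gear (3),
# first level of vs (0).
by_position <- w[3, 1, 1]

# By level name: the names are character strings, given in the
# order cyl, gear, vs used in the formula.
by_name <- w["8", "3", "0"]

print(by_position)
print(by_name)
```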

In contrast, the list format works in the same way, making it equally easy to visualize for any value of N in an N-way frequency table.

> t = count(mtcars, c('cyl', 'gear', 'vs'))
> t
        cyl    gear      vs      freq
 1      4      3         1       1
 2      4      4         1       8
 3      4      5         0       1
 4      4      5         1       1
 5      6      3         1       2
 6      6      4         0       2
 7      6      4         1       2
 8      6      5         0       1
 9      8      3         0       12
 10     8      5         0       2
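The value-based lookup shown earlier for two factors carries over unchanged: add one more condition to subset().  Here is a minimal sketch, assuming “plyr” is installed; the name tbl is used instead of t only to avoid masking base R's t() (transpose) function.

```r
library(plyr)  # assumed installed; provides count()

tbl <- count(mtcars, c("cyl", "gear", "vs"))

# Number of cars with cyl = 8, gear = 3 and vs = 0.
n_cars <- subset(tbl, cyl == 8 & gear == 3 & vs == 0)$freq

print(n_cars)
```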
