(This article was first published on **The Chemical Statistician » R programming**, and kindly contributed to R-bloggers)

#### Introduction

I recently introduced how to use the count() function in the “plyr” package to produce 1-way frequency tables in R. Several commenters provided alternative ways of doing so, and they are all appreciated. Today, I want to extend that tutorial by demonstrating how count() can produce N-way frequency tables in the list format. This will highlight the superiority of count() over other functions like table() and xtabs().
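As a quick reminder of the interface, here is a minimal sketch of how the same count() call scales from 1-way to N-way frequencies. It assumes only that “plyr” is installed. The second argument is a character vector of variable names, and each result is a data frame with one column per variable plus a “freq” column. The object names one_way, two_way and three_way are just illustrative.

```r
library(plyr)

# count(data, vars): vars is a character vector of column names;
# adding names to that vector is all it takes to go from 1-way to N-way counts
one_way   <- count(mtcars, "cyl")
two_way   <- count(mtcars, c("cyl", "gear"))
three_way <- count(mtcars, c("cyl", "gear", "vs"))

one_way
#   cyl freq
# 1   4   11
# 2   6    7
# 3   8   14

# every result has the same shape: the grouping columns followed by freq
names(three_way)
```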

#### 2-Way Frequencies: The Cross-Tabulated Format vs. The List-Format

To get a 2-way frequency table (i.e., a table of the counts in a data set broken down by 2 categorical variables), you can display it in a **cross-tabulated format** or in a **list format**.

In R, the xtabs() function is good for cross-tabulation. Let’s use the “mtcars” data set again; recall that it is a built-in data set in Base R.

```
> y = xtabs(~ cyl + gear, mtcars)
> y
   gear
cyl  3  4  5
  4  1  8  2
  6  2  4  1
  8 12  0  2
```

This is a nice way to visualize the counts in each of the 9 combinations of the variables “cyl” and “gear”. You can use the row and column indices of this object to extract a particular value. For example, to extract the element in the third row and first column (cyl = 8, gear = 3):

```
> y[3,1]
[1] 12
```

Alternatively, you can use the count() function in the “plyr” package to get the same frequencies in a list format.

```
> library(plyr)
> x = count(mtcars, c('cyl', 'gear'))
> x
  cyl gear freq
1   4    3    1
2   4    4    8
3   4    5    2
4   6    3    2
5   6    4    4
6   6    5    1
7   8    3   12
8   8    5    2
```

Notice that this object is a data frame. Its column names come directly from the variables being counted, plus a “freq” column that holds the counts.

```
> class(x)
[1] "data.frame"
> names(x)
[1] "cyl"  "gear" "freq"
```
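One difference between the two outputs is worth checking explicitly: the cross-tabulation has 9 cells, but the list has only 8 rows, because count() returns only the combinations that actually occur in the data (no car has cyl = 8 and gear = 4). The sketch below, assuming “plyr” is loaded, confirms that both formats account for all 32 cars and that summing “freq” with xtabs() rebuilds the original cross-tabulation, with the missing combination filled in as 0. The name y2 is illustrative.

```r
library(plyr)

y <- xtabs(~ cyl + gear, mtcars)        # cross-tabulated format
x <- count(mtcars, c("cyl", "gear"))    # list format

# both formats account for every car in mtcars
stopifnot(sum(y) == nrow(mtcars), sum(x$freq) == nrow(mtcars))

nrow(x)    # 8 rows: only combinations that occur
length(y)  # 9 cells: every combination, including the empty cyl = 8, gear = 4

# a formula with freq on the left-hand side sums the counts,
# turning the list format back into the cross-tabulated format
y2 <- xtabs(freq ~ cyl + gear, data = x)
stopifnot(all(y2 == y))
```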

You can access any particular element in 2 ways:

- Use the row and/or column indices.
- Use particular values of “cyl” and “gear”.

For example, to find the number of cars with cyl = 8 and gear = 3, you can use either method:

```
> x[7, ]$freq
[1] 12
> subset(x, cyl == 8 & gear == 3)$freq
[1] 12
```

I like the second method, because I don’t have to look at the values of the output table to find which row contains that particular combination of “cyl” and “gear”. **This is a key advantage of the list format over the cross-tabulation format.**
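The value-based lookup can be wrapped in a small function. The sketch below uses a hypothetical helper, lookup_freq(), which is not part of “plyr”. It matches rows of a count() result on any named variables and sums their “freq” values, so a combination that never occurs returns 0 (a plain subset() call returns an empty vector in that case), and leaving out a variable gives the total over all of its values.

```r
library(plyr)

# lookup_freq() is a hypothetical helper for illustration; it is not part of plyr.
# It keeps the rows of a count() result that match every named condition
# and returns the sum of their freq column (0 if no row matches).
lookup_freq <- function(counts, ...) {
  conditions <- list(...)
  # guard against misspelled variable names, which would otherwise match nothing silently
  stopifnot(all(names(conditions) %in% names(counts)))
  keep <- rep(TRUE, nrow(counts))
  for (var in names(conditions)) {
    keep <- keep & counts[[var]] == conditions[[var]]
  }
  sum(counts$freq[keep])
}

x <- count(mtcars, c("cyl", "gear"))
lookup_freq(x, cyl = 8, gear = 3)  # 12
lookup_freq(x, cyl = 8, gear = 4)  # 0: this combination is absent from x
lookup_freq(x, cyl = 8)            # 14: summed over every value of gear
```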

#### N-way frequencies: N > 2

**Another key advantage of the list format over the cross-tabulation format is in obtaining frequency tables for 3 or more factors.**

Cross-tabulations for N-way frequencies are difficult to visualize when N > 2. If N = 3, the best you can do is to display multiple tables, one for each value of the third factor. For example,

```
> w = xtabs(~ cyl + gear + vs, mtcars)
> w
, , vs = 0

   gear
cyl  3  4  5
  4  0  0  1
  6  0  2  1
  8 12  0  2

, , vs = 1

   gear
cyl  3  4  5
  4  1  8  1
  6  2  2  0
  8  0  0  0
```

Moreover, it is now even more cumbersome to access the value of a particular combination of these 3 factors.

In contrast, the list format works in the same way, **making it equally easy to visualize for any value of N in an N-way frequency table.**

```
> t = count(mtcars, c('cyl', 'gear', 'vs'))
> t
   cyl gear vs freq
1    4    3  1    1
2    4    4  1    8
3    4    5  0    1
4    4    5  1    1
5    6    3  1    2
6    6    4  0    2
7    6    4  1    2
8    6    5  0    1
9    8    3  0   12
10   8    5  0    2
```
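To make the access comparison concrete for N = 3, the sketch below retrieves the count for cyl = 8, gear = 3 and vs = 0 from both objects, assuming “plyr” is loaded. The 3-way cross-tabulation needs the position of each level in each of its 3 dimensions, while the list format uses the same subset() pattern as the 2-way case with one more condition. The name counts3 is illustrative and simply avoids reusing the name of base R's t() function.

```r
library(plyr)

w <- xtabs(~ cyl + gear + vs, mtcars)
counts3 <- count(mtcars, c("cyl", "gear", "vs"))

# cross-tabulated format: cyl = 8 is the 3rd row, gear = 3 the 1st column,
# and vs = 0 the 1st layer, so you must know all three positions
w[3, 1, 1]                                         # 12

# list format: the same subset() call as before, with one extra condition
subset(counts3, cyl == 8 & gear == 3 & vs == 0)$freq  # 12
```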
