How to Get the Frequency Table of a Categorical Variable as a Data Frame in R

(This article was first published on The Chemical Statistician » R programming, and kindly contributed to R-bloggers)

Introduction

One feature that I like about R is the ability to access and manipulate the outputs of many functions.  For example, you can extract the kernel density estimates from density() and scale them to ensure that the resulting density integrates to 1 over its support set.
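As a minimal sketch of that idea, the object returned by density() can be taken apart and rescaled. The choice of mtcars$mpg as the input, the helper name trapezoid_area and the use of the trapezoidal rule are assumptions made for illustration only.

```r
# Kernel density estimate for an illustrative numeric variable
d <- density(mtcars$mpg)

# The result is a list-like object: x is the evaluation grid, y the estimates
x <- d$x
y <- d$y

# Hypothetical helper: approximate the area under a curve (trapezoidal rule)
trapezoid_area <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

area_before <- trapezoid_area(x, y)

# Rescale the estimates so the numerical integral over the grid equals 1
y_scaled <- y / area_before
area_after <- trapezoid_area(x, y_scaled)
```

The point is only that d$x and d$y are ordinary numeric vectors, so they can be manipulated like any other data.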

I recently needed to get a frequency table of a categorical variable in R, and I wanted the output as a data frame that I can access and manipulate.  This is a fairly simple and common task in statistics and data analysis, so I thought that there must be a function in Base R that can easily generate this.  Sadly, I could not find such a function.  In this post, I will explain why the seemingly obvious table() function does not work, and I will demonstrate how the count() function in the “plyr” package can achieve this goal.

 

The Example Data Set – mtcars

Let’s use the mtcars data set that is built into R as an example.  The categorical variable that I want to explore is “gear” – this denotes the number of forward gears in the car – so let’s view the first 6 observations of just the car model and the gear.  We can use the subset() function to restrict the data set to show just the row names and “gear”.

> head(subset(mtcars, select = 'gear'))
                  gear
Mazda RX4            4
Mazda RX4 Wag        4
Datsun 710           4
Hornet 4 Drive       3
Hornet Sportabout    3
Valiant              3

What are the possible values of “gear”?  Let’s use the factor() function to find out.

> factor(mtcars$gear)
 [1] 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4
Levels: 3 4 5
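If only the distinct values are of interest, a short sketch like the following avoids printing the whole vector. The variable names gear_factor and gear_values are made up for this example.

```r
# Treat the number of gears as a categorical variable
gear_factor <- factor(mtcars$gear)

# The distinct categories, as character strings
gear_levels <- levels(gear_factor)

# How many distinct categories there are
n_gear_levels <- nlevels(gear_factor)

# The distinct values in their original numeric form, sorted
gear_values <- sort(unique(mtcars$gear))

print(gear_levels)
print(n_gear_levels)
print(gear_values)
```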

The cars in this data set have either 3, 4 or 5 forward gears.  How many cars are there for each number of forward gears?

 

Why the table() function does not work well

The table() function in Base R does give the counts of a categorical variable, but the output is not a data frame – it’s an object of class “table”, so you cannot work with it using the usual data frame tools such as column access with $.

> w = table(mtcars$gear)
> w

 3  4  5 
15 12  5 

> class(w)
[1] "table"
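To make the difference concrete, here is a small sketch of how the pieces of a one-dimensional table can be reached. The variable names are illustrative, and the category labels are character strings rather than numbers.

```r
w <- table(mtcars$gear)

# The category labels are stored as names (character strings)
gear_labels <- names(w)

# The counts, stripped of their labels
gear_counts <- as.vector(w)

# The count for one category, looked up by its label
count_4 <- as.vector(w["4"])

# Note: w$`4` would fail, because $ is not defined for this kind of object
print(gear_labels)
print(gear_counts)
print(count_4)
```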

You can convert this to a data frame, but the result does not retain the variable name “gear” in the corresponding column name.

> t = as.data.frame(w)
> t
  Var1 Freq
1    3   15
2    4   12
3    5    5

You can correct this problem with the names() function.

> names(t)[1] = 'gear'
> t
  gear Freq
1    3   15
2    4   12
3    5    5
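For comparison, the same steps can be written more compactly in Base R. This sketch relies on table() accepting a named argument and on the responseName argument of as.data.frame(). The name gear_freq is an assumption for illustration, and this is offered as a variant of the steps above rather than as the approach this post recommends.

```r
# Naming the argument to table() carries "gear" through to the column name,
# and responseName sets the name of the counts column
gear_freq <- as.data.frame(table(gear = mtcars$gear), responseName = "freq")

# The category column comes back as a factor; convert it if numbers are needed
gear_freq$gear <- as.numeric(as.character(gear_freq$gear))

print(gear_freq)
print(class(gear_freq))
```

The extra conversion step is needed because as.data.frame() turns the categories into a factor by default.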

I finally have what I want, but that took several functions to accomplish.  Is there an easier way?

 

count() to the Rescue!  (With Compliments to the “plyr” Package)

Thankfully, there is an easier way – it’s the count() function in the “plyr” package.  If you don’t already have the “plyr” package, install it first by running the following command:

 install.packages('plyr')

Then, load the package with library(), and the count() function will be ready for use.

> library(plyr)
> count(mtcars, 'gear')
  gear freq
1    3   15
2    4   12
3    5    5
> y = count(mtcars, 'gear')
> y
  gear freq
1    3   15
2    4   12
3    5    5
> class(y)
[1] "data.frame"

As the class() function confirms, this output is indeed a data frame!
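Because the result is a data frame, the usual tools apply to it. The following sketch assumes the “plyr” package is installed (the code is skipped otherwise); the column name prop and the variable names gear4 and y2 are made up for this example.

```r
# Run only if plyr is available; plyr::count() avoids attaching the package
if (requireNamespace("plyr", quietly = TRUE)) {
  y <- plyr::count(mtcars, "gear")

  # Look up the count for one category with ordinary data frame indexing
  gear4 <- y$freq[y$gear == 4]

  # Add a column of relative frequencies
  y$prop <- y$freq / sum(y$freq)

  # Passing several variable names counts their combinations
  y2 <- plyr::count(mtcars, c("gear", "cyl"))

  print(y)
  print(y2)
}
```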
