Use unique() instead of levels() to find the possible values of a character variable in R

(This article was first published on R programming – The Chemical Statistician, and kindly contributed to R-bloggers)

When I first encountered R, I learned to use the levels() function to find the possible values of a categorical variable.  However, I recently noticed something very strange about this function.

Consider the built-in data set “iris” and its character variable “Species”.  Here are the possible values of “Species”, as shown by the levels() function.

> levels(iris$Species)

[1] "setosa" "versicolor" "virginica"

Now, let’s remove all rows containing “setosa”.  I will use the table() function to confirm that no rows contain “setosa”, and then I will apply the levels() function to “Species” again.

> iris2 = subset(iris, Species != 'setosa')
> table(iris2$Species)

    setosa versicolor virginica 
         0         50        50 

> levels(iris2$Species)

[1] "setosa" "versicolor" "virginica"

The new data set “iris2” does not have any rows containing “setosa” as a possible value of “Species”, yet the levels() function still shows “setosa” in its output.

According to the user G5W in Stack Overflow, this is a desirable behaviour for the levels() function.  Here is my interpretation of the intent behind the creators of base R: The possible values of a character variable are fundamental attributes of that variable, which should not be altered because of changes in the data.

Obviously, this can cause a lot of confusion and produce wrong information, so here is my solution: From now on, I will use the unique() function to find the possible values of a character variable.  Here is the result.

> unique(iris2$Species)

[1] versicolor virginica 
Levels: setosa versicolor virginica


This is the output that I expect; “setosa” does not appear in the resulting vector.  However, unique() stills hows the original levels, which include “setosa” – that’s a nice feature.


I thank my colleagues Layne Newhouse, Jack Davis, and Dmity Shopin for their valuable discussion about this on LinkedIn.

To leave a comment for the author, please follow the link and comment on their blog: R programming – The Chemical Statistician. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)