
Editing/Adding factor levels in R

[This article was first published on We think therefore we R, and kindly contributed to R-bloggers.]

I was trying to change a few levels in my factor variable by simply assigning character values to it, but it didn't seem to work.

data(iris)
iris$Species[c(50:120)] <- rep("Random", 71)

## Warning: invalid factor level, NAs generated

iris$Species

##   [1] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##   [8] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##  [15] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##  [22] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##  [29] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##  [36] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##  [43] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##  [50] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>     
##  [57] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>     
##  [64] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>     
##  [71] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>     
##  [78] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>     
##  [85] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>     
##  [92] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>     
##  [99] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>     
## [106] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>     
## [113] <NA>      <NA>      <NA>      <NA>      <NA>      <NA>      <NA>     
## [120] <NA>      virginica virginica virginica virginica virginica virginica
## [127] virginica virginica virginica virginica virginica virginica virginica
## [134] virginica virginica virginica virginica virginica virginica virginica
## [141] virginica virginica virginica virginica virginica virginica virginica
## [148] virginica virginica virginica
## Levels: setosa versicolor virginica

Well, I did find a workaround: convert the factor to character, make the change, and convert it back.

iris$Species <- as.character(iris$Species)
iris$Species[c(50:120)] <- rep("Random", 71)
iris$Species <- as.factor(iris$Species)
iris$Species

##   [1] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##   [8] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##  [15] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##  [22] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##  [29] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##  [36] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##  [43] setosa    setosa    setosa    setosa    setosa    setosa    setosa   
##  [50] Random    Random    Random    Random    Random    Random    Random   
##  [57] Random    Random    Random    Random    Random    Random    Random   
##  [64] Random    Random    Random    Random    Random    Random    Random   
##  [71] Random    Random    Random    Random    Random    Random    Random   
##  [78] Random    Random    Random    Random    Random    Random    Random   
##  [85] Random    Random    Random    Random    Random    Random    Random   
##  [92] Random    Random    Random    Random    Random    Random    Random   
##  [99] Random    Random    Random    Random    Random    Random    Random   
## [106] Random    Random    Random    Random    Random    Random    Random   
## [113] Random    Random    Random    Random    Random    Random    Random   
## [120] Random    virginica virginica virginica virginica virginica virginica
## [127] virginica virginica virginica virginica virginica virginica virginica
## [134] virginica virginica virginica virginica virginica virginica virginica
## [141] virginica virginica virginica virginica virginica virginica virginica
## [148] virginica virginica virginica
## Levels: Random setosa virginica
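
For comparison, a factor can also be extended without the round trip through character, as long as the new level is declared before it is used. The following is a minimal sketch, not part of the original workaround; the variable name `species` is only illustrative, and it works on a copy so that `iris` itself is left alone.

```r
# Work on a copy so the built-in iris data is left untouched.
species <- iris$Species

# Declare the new level first; the existing values are unchanged.
levels(species) <- c(levels(species), "Random")

# The assignment is now valid and produces no NAs.
species[50:120] <- "Random"

# Drop levels that no longer occur (versicolor, in this case).
species <- droplevels(species)

levels(species)
table(species)
```

Note that the level order differs from the `as.factor()` version above: here "Random" is appended at the end, whereas `as.factor()` rebuilds the levels in sorted order.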

This problem annoyed me at first (“Why would R not allow me to change/add factor levels!?!@#!@#?”), but then Utkarsh and I had a conversation about it that made me think otherwise.

Excerpts from the conversation:

Utkarsh: It is usually not good to create data on the fly. Besides, when you create a factor variable, you should give the finite set of values it can take. This prevents future mistakes. It is called type checking. Python does not do it. R does it to some extent. C does it to some extent. Haskell does it very very strictly and it prevents about 50% of bugs from appearing. Let's say you misspell one of the levels.

In retrospect, it actually makes sense that we cannot add or edit the levels of a factor just by assigning a new value. The reason is simple: we “might” make a mistake, and a misspelled factor level could cause serious trouble. Lesson learnt!
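
To make the misspelling point concrete, here is a small sketch comparing a factor with a plain character vector. The names `species`, `species_chr` and `misspelled`, and the typo "setosaa", are made up for illustration; the exact wording of the warning may vary with the R version and locale.

```r
species <- iris$Species

# "setosaa" is not a declared level, so R raises a warning instead of
# silently creating a fourth species. tryCatch() captures that warning.
misspelled <- tryCatch(
  {
    species[1] <- "setosaa"
    "accepted"
  },
  warning = function(w) conditionMessage(w)
)
misspelled

# With a plain character vector the same typo goes through unnoticed.
species_chr <- as.character(iris$Species)
species_chr[1] <- "setosaa"
unique(species_chr)
```

With the factor, the typo surfaces immediately as a warning; with the character vector, it only shows up later as an unexpected extra category.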
