Site icon R-bloggers

Creating missing values in factors

[This article was first published on R on Abhijit Dasgupta, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
  • Background

    I was looking at some breast cancer data recently, and was analyzing the ER (estrogen receptor) status variable. It turned out that there were three possible outcomes in the data: Positive, Negative and Indeterminate. I had imported this data as a factor, and wanted to convert the Indeterminate level to a missing value, i.e. NA.

    My usual method for numeric variables created a rather singular result:

    x <- as.factor(c('Positive','Negative','Indeterminate'))
    x1 <- ifelse(x=='Indeterminate', NA, x)
    str(x1)
    ##  int [1:3] 3 2 NA

    This process converted it to an integer!! Not the end of the world, but not ideal by any means.

    Further investigation revealed two other tidyverse strategies.

    dplyr::na_if

    This method changes the values to NA, but keeps the original level in the factor’s levels

    x2 <- dplyr::na_if(x, 'Indeterminate')
    str(x2)
    ##  Factor w/ 3 levels "Indeterminate",..: 3 2 NA
    x2
    ## [1] Positive Negative <NA>    
    ## Levels: Indeterminate Negative Positive

    dplyr::recode

    This method drops the level that I’m deeming to be missing from the factor

    x3 <- dplyr::recode(x, Indeterminate = NA_character_)
    str(x3)
    ##  Factor w/ 2 levels "Negative","Positive": 2 1 NA
    x3
    ## [1] Positive Negative <NA>    
    ## Levels: Negative Positive

    This method can also work more generally to change all values not listed to missing values.

    x4 <- dplyr::recode(x, Positive='Positive', Negative='Negative', 
                        .default=NA_character_)
    x4
    ## [1] Positive Negative <NA>    
    ## Levels: Negative Positive

    Other strategies are welcome in the comments.

    To leave a comment for the author, please follow the link and comment on their blog: R on Abhijit Dasgupta.

    R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.