R Tip: Use stringsAsFactors = FALSE

March 17, 2018

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

R tip: use stringsAsFactors = FALSE.

R often uses factors to re-encode strings; for example, data.frame() converted character columns to factors by default before R 4.0.0. This re-encoding can happen too early and too aggressively. Sometimes a string is just a string.
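To make "re-encode" concrete, here is a minimal sketch of what a factor stores: a vector of integer codes plus a set of distinct string levels. The variable names are illustrative only.

```r
# A character vector: each element is stored as a string.
s <- c("tbd", "north", "tbd")

# Converting to a factor re-encodes the strings as integer codes
# that index into a sorted set of distinct levels.
f <- factor(s)

print(levels(f))      # "north" "tbd"
print(as.integer(f))  # 2 1 2
```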


[Image: Sigmund Freud, photograph by Max Halberstadt]

Sigmund Freud, it is often claimed, said: “Sometimes a cigar is just a cigar.”

To avoid problems, delay the re-encoding of strings by passing stringsAsFactors = FALSE when creating data.frames.
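The same argument is accepted by read.csv() (it is passed through to read.table()), which is where many string columns come from in practice. A minimal sketch, using inline text in place of a real file; the column names are hypothetical:

```r
# Inline CSV text stands in for a file on disk.
csv_text <- "label,count\ntbd,1\nnorth,2"

# Explicitly keep string columns as character vectors.
d_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect the column types: label should be chr, count should be int.
str(d_csv)
```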

Example:

d <- data.frame(label = rep("tbd", 5))

d$label[[2]] <- "north"
#> Warning in `[[<-.factor`(`*tmp*`, 2, value = structure(c(1L, NA, 1L, 1L, :
#> invalid factor level, NA generated

print(d)
#>   label
#> 1   tbd
#> 2  <NA>
#> 3   tbd
#> 4   tbd
#> 5   tbd

Notice that our new value was not copied in! The factor's only level is "tbd", so "north" was replaced by NA.
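A minimal sketch of the mechanism: the column's level set contains only "tbd", so any other string has no integer code to map to. Passing stringsAsFactors = TRUE explicitly reproduces the original behavior on any R version; this matters because R 4.0.0 and later default to stringsAsFactors = FALSE, so the first example above only fails as shown on older versions.

```r
# Force factor conversion so the behavior is reproducible on any R version.
d_fac <- data.frame(label = rep("tbd", 5), stringsAsFactors = TRUE)

# The only level is "tbd"; "north" is not a legal value for this factor.
print(levels(d_fac$label))

# Assigning an unknown level warns and stores NA instead of the string.
d_fac$label[[2]] <- "north"   # warning: invalid factor level, NA generated
print(is.na(d_fac$label[[2]]))
```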

The fix is easy: use stringsAsFactors = FALSE.

d <- data.frame(label = rep("tbd", 5),
                stringsAsFactors = FALSE)

d$label[[2]] <- "north"

print(d)
#>   label
#> 1   tbd
#> 2 north
#> 3   tbd
#> 4   tbd
#> 5   tbd
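If a data.frame already has a factor column (for example, one built before applying the fix), a minimal sketch of recovering plain strings is to convert that column with as.character() before editing it:

```r
# A frame whose label column was created as a factor.
d_old <- data.frame(label = rep("tbd", 5), stringsAsFactors = TRUE)

# Convert the factor back to its string values.
d_old$label <- as.character(d_old$label)

# Now any string can be stored without generating NA.
d_old$label[[2]] <- "north"
print(d_old)
```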

As is often the case, base R works okay with its defaults and works very well if you judiciously change a few of them. There is much less need to replace R functionality whole-hog than some claim.

Note: the above pattern of pre-building a data.frame and then filling in values by addressing row/column index sets is a very effective (and underappreciated) way to build up data. It is often easier and quicker than binding rows or columns.
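A minimal sketch of the pre-build-and-fill pattern, next to the same data built by binding one row at a time. The column names and the "item" labels are purely illustrative.

```r
n <- 5

# Pre-build the full frame once, with placeholder values of the right types.
d_fill <- data.frame(id = integer(n), label = rep("tbd", n),
                     stringsAsFactors = FALSE)

# Fill in cells by addressing row/column indices.
for (i in seq_len(n)) {
  d_fill[i, "id"] <- i
  d_fill[i, "label"] <- paste0("item", i)
}

# The same data built by repeatedly binding single-row frames.
d_bind <- NULL
for (i in seq_len(n)) {
  row <- data.frame(id = i, label = paste0("item", i),
                    stringsAsFactors = FALSE)
  d_bind <- rbind(d_bind, row)
}

# Compare the column contents of the two approaches.
print(identical(d_fill$id, d_bind$id))
print(identical(d_fill$label, d_bind$label))
```

Each rbind() call returns a new frame containing all earlier rows, so the binding loop rebuilds the accumulated data on every iteration.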
