stringsAsFactors = FALSE

[This article was first published on R – kata helion, and kindly contributed to R-bloggers].

R is changing the way it deals with converting strings to factors in functions like data.frame(). There is a detailed post about the plan, but that post was created before version 4.0.0 so I’m not sure if anything has changed.

I’m running R 4.0.5 right now. I know I’m behind, but I’m in the middle of a project and I don’t want to update until it is finished. Anyway, default.stringsAsFactors() now returns FALSE, which is nice. I can also see:

args(data.frame)

# function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, fix.empty.names = TRUE, stringsAsFactors = default.stringsAsFactors())

That’s nice, but look at:

args(expand.grid)

# function (..., KEEP.OUT.ATTRS = TRUE, stringsAsFactors = TRUE)

That’s less nice and generated a rather confusing bug for me recently. Also look at this:

letterframe1 <- data.frame(cbind(LETTERS, 1))
class(letterframe1[, 1])
# character

letterframe2 <- data.frame(table(LETTERS))
class(letterframe2[, 1])
# factor

letterframe3 <- data.frame(table(LETTERS), stringsAsFactors = FALSE)
class(letterframe3[, 1])
# factor

letterframe4 <- as.data.frame(table(LETTERS), stringsAsFactors = FALSE)
class(letterframe4[, 1])
# character

And for the record:

letterframe5 <- as.data.frame(table(LETTERS))
class(letterframe5[, 1])
# factor

By the way, I ran all the above examples in R 3.6.1 and everything returned factor except for class(letterframe4[, 1]).

This was unexpected. I’m sure there’s a sensible reason for all that, but I don’t know enough to guess exactly what it could be.

The character input must be getting converted to a factor somewhere inside table(), but I’m not sure why as.data.frame() and data.frame() behave differently. If the column is truly already a factor, then both functions should return a factor regardless of stringsAsFactors. Even that would not be an ideal outcome to me, because neither the table() output nor the documentation makes it obvious that my character input would be converted.
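
One way to poke at this is to look at what table() actually stores. The sketch below is not part of the original experiments; it assumes a stock base R session, and the variable names (tab, labels, default_arg, col_chr, col_fct) are made up for illustration. It suggests that the labels sit in the table's dimnames as plain character strings, and that the table method of as.data.frame() carries its own stringsAsFactors argument. A plausible but unconfirmed reading is that data.frame() does not forward its stringsAsFactors value to that method for table input, which would match the letterframe3 result.

```r
# Hedged sketch: inspect what table() stores and which default the
# table method of as.data.frame() uses. Names here are illustrative only.
tab <- table(LETTERS)

# The category labels are kept as the dimnames of the table object.
labels <- dimnames(tab)[[1]]
class(labels)

# as.data.frame() dispatches to as.data.frame.table() for table input;
# that method has its own stringsAsFactors argument with its own default.
default_arg <- formals(as.data.frame.table)$stringsAsFactors
default_arg

# Passing the argument straight to the table method changes the column type.
col_chr <- as.data.frame(tab, stringsAsFactors = FALSE)[, 1]
col_fct <- as.data.frame(tab)[, 1]
class(col_chr)
class(col_fct)
```

If that reading holds, the factor would come from the conversion to a data frame rather than from the table object itself.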

If I correctly understood the dev blog linked above, changing default.stringsAsFactors() might be a transition phase to a new system that works in a different way, so maybe this new system will encompass some of these other scenarios when implemented.
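
Until something like that arrives, one defensive habit is to pass stringsAsFactors explicitly wherever a function accepts it. The following is a minimal sketch under that assumption, using toy inputs and hypothetical variable names (grid_default, grid_explicit, counts) rather than anything from my project.

```r
# Hedged sketch: state stringsAsFactors explicitly instead of relying on
# per-function defaults. Inputs and names are illustrative only.
grid_default  <- expand.grid(letter = c("a", "b"), n = 1:2)
grid_explicit <- expand.grid(letter = c("a", "b"), n = 1:2,
                             stringsAsFactors = FALSE)

class(grid_default$letter)   # factor, because expand.grid() defaults to TRUE
class(grid_explicit$letter)  # character

# The same idea applied to the table example from earlier in the post.
counts <- as.data.frame(table(LETTERS), stringsAsFactors = FALSE)
class(counts[, 1])           # character
```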
