
R tip: use stringsAsFactors = FALSE.

R often re-encodes strings as factors: integer codes paired with a fixed set of allowed levels. This re-encoding can happen too early and too aggressively. Sometimes a string is just a string.

Sigmund Freud, it is often claimed, said: “Sometimes a cigar is just a cigar.”

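To avoid problems, delay re-encoding of strings by using stringsAsFactors = FALSE when creating data.frames. (In R 4.0.0 and later this is already the default; earlier versions default to TRUE, which is the behavior the example below relies on.)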

Example:

d <- data.frame(label = rep("tbd", 5))

d$label[[2]] <- "north"
#> Warning in [[<-.factor(*tmp*, 2, value = structure(c(1L, NA, 1L, 1L, :
#> invalid factor level, NA generated

print(d)
#>   label
#> 1   tbd
#> 2  <NA>
#> 3   tbd
#> 4   tbd
#> 5   tbd

Notice our new value was not copied in!

The fix is easy: use stringsAsFactors = FALSE.

d <- data.frame(label = rep("tbd", 5), stringsAsFactors = FALSE)

d$label[[2]] <- "north"

print(d)
#>   label
#> 1   tbd
#> 2 north
#> 3   tbd
#> 4   tbd
#> 5   tbd
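
Delaying the re-encoding does not mean giving it up. The following is a minimal sketch, assuming the final set of labels turns out to be "tbd" and "north" (a choice made here purely for illustration): keep the column as character while values are still changing, then convert with factor() once the levels are settled.

```r
# Keep the column as character while values are still being edited.
d <- data.frame(label = rep("tbd", 5), stringsAsFactors = FALSE)
d$label[[2]] <- "north"

# Convert to a factor only once the set of values is known.
# The level order here is an illustrative assumption.
d$label <- factor(d$label, levels = c("tbd", "north"))

# The factor now stores integer codes that index into the levels.
levels(d$label)      # the levels, in order: "tbd", then "north"
as.integer(d$label)  # the codes: 1 for "tbd", 2 for "north"
```

Because "north" was assigned while the column was still character, it survives the conversion instead of becoming NA.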


As is often the case: base R works okay in default mode and works very well if you judiciously change a few defaults. There is much less need to whole-hog replace R functionality than some claim.

Note: the above pattern of pre-building a data.frame and filling values by addressing row/column index sets is a very effective (and underappreciated) way to build up data. It is often easier and quicker than binding rows or columns.
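
A slightly larger sketch of the pre-build-and-fill pattern might look like the following. The column names (id, label, value) and the values assigned are hypothetical, chosen only to show a few ways of addressing rows and columns.

```r
n <- 3

# Pre-build the frame at full size with typed placeholder values.
d <- data.frame(
  id = seq_len(n),
  label = rep(NA_character_, n),
  value = rep(NA_real_, n),
  stringsAsFactors = FALSE
)

# Fill by a set of row positions within one column.
d$label[c(1, 3)] <- c("north", "south")

# Fill a single cell by row index and column name.
d[2, "label"] <- "east"

# Fill by a logical row selection.
d[d$label %in% c("north", "south"), "value"] <- c(1.5, 2.5)

# Fill a single element, as in the example above.
d$value[[2]] <- 0

print(d)
```

Since label is a plain character column, none of these assignments can fail for lack of a matching factor level.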