Forgotten features of R 4.0.0
R version 4.0.0 was released almost two years ago. The change in the major version, 3.x.y to 4.0.0, represented significant and potentially breaking changes. For an organisation to start using these new features, everyone in the company must have access to that version; otherwise code isn’t shareable. This naturally slows down adoption.
We moved our internal R projects to depend on R 4.0.0 around twelve months ago, a few months after the release date. Over the last year we’ve also helped a number of clients make the move, typically with Shiny applications. This post highlights some of the features we’ve found useful, as well as some of the potential pitfalls.
stringsAsFactors
From the beginning, R converted imported strings to factors by default. For most users, this happened when reading in data with read.csv(). The default made sense for statistical modelling, but was a little tricky for new users, especially as today’s data sets tend to contain messy string data. In R 4.0.0 this default changed: stringsAsFactors is now FALSE by default in data.frame() and in the read.table() family of functions.
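A minimal sketch of the change, assuming R 4.0.0 or later: the same data.frame() and read.csv() calls now return character columns unless you explicitly opt back in. The column names and values are made up for illustration.

```r
# In R >= 4.0.0, strings are kept as character vectors by default
df_new <- data.frame(fruit = c("apple", "pear"))
class(df_new$fruit)
#> [1] "character"

# Opt back in to the pre-4.0.0 behaviour explicitly
df_old <- data.frame(fruit = c("apple", "pear"), stringsAsFactors = TRUE)
class(df_old$fruit)
#> [1] "factor"
levels(df_old$fruit)
#> [1] "apple" "pear"

# read.csv() follows the same default; text = avoids needing a file on disk
csv <- read.csv(text = "fruit,count\napple,3\npear,5")
class(csv$fruit)
#> [1] "character"
```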
For our internal applications this didn’t really cause issues, but we’ve had to help a number of clients “upgrade” their Shiny apps to run on R 4.0.0.
If you are planning on making this move, here’s our standard “gotcha” checklist:
- Are there any calls to read.csv(), read.table() or read.delim()? If so, this could cause issues. You can either set stringsAsFactors = TRUE in these functions, or fix any issues that crop up.
- Are there any data frames saved as rds files? If so, check the columns for factors: they are still factors when read back in, whereas equivalent data frames created under R 4.0.0 have character columns.
- Do you use data.frame() to create data frames? If so, string columns that used to be factors are now character vectors, unless you set stringsAsFactors = TRUE.
- Do packages return data frames that you use? This is the trickiest bug to track down.
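For the rds and package checks above, a quick way to spot factor columns is to scan a data frame’s column classes. The helper factor_columns() is a hypothetical name of our own, not part of base R, and the example data are made up; the rds file is saved with stringsAsFactors = TRUE to mimic one written under R 3.x.

```r
# Hypothetical helper: names of the factor columns in a data frame
factor_columns <- function(df) {
  stopifnot(is.data.frame(df))
  names(df)[vapply(df, is.factor, logical(1))]
}

# Mimic an rds file saved under R 3.x, when strings became factors
old_df <- data.frame(
  id = 1:2,
  fruit = c("apple", "pear"),
  stringsAsFactors = TRUE
)
rds_path <- tempfile(fileext = ".rds")
saveRDS(old_df, rds_path)

# Factors survive the round trip, whichever R version reads the file
restored <- readRDS(rds_path)
factor_columns(restored)
#> [1] "fruit"

# Optionally convert factors to character to match the R 4.0.0 default
restored[] <- lapply(restored, function(col) {
  if (is.factor(col)) as.character(col) else col
})
factor_columns(restored)
#> character(0)
```

Running factor_columns() over data frames returned by packages is a cheap way to check the last item on the list, too.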
Raw Character Strings
Using the syntax r"(some characters)", we can now define raw strings, in which backslashes and quotes are taken literally.
This saves us from painfully escaping special characters with extra backslashes.
We’ve recently started using this regularly when generating PDF documents
that have LaTeX in them. For example,
```r
r"(Avoiding \texttt{backslash} and "speech mark" hell.)"
#> [1] "Avoiding \\texttt{backslash} and \"speech mark\" hell."
```
Other uses are regular expressions and HTML code.
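A few more sketches of raw strings, assuming R 4.0.0 or later. Besides r"(...)", R also accepts [ ] and { } as delimiters, and a matching number of dashes can be placed around the delimiters when the text itself contains )". The pattern, link and strings below are made up for illustration.

```r
# Regular expression: a literal dot followed by one or more digits
pattern_raw <- r"(\.\d+)"
pattern_escaped <- "\\.\\d+"
identical(pattern_raw, pattern_escaped)
#> [1] TRUE
grepl(pattern_raw, c("v1.23", "v123"))
#> [1]  TRUE FALSE

# HTML with embedded double quotes, no escaping needed
link <- r"(<a href="https://www.r-project.org">R</a>)"
writeLines(link)
#> <a href="https://www.r-project.org">R</a>

# Alternative delimiters, and dashes when the text contains )"
r"[f(x)]"
#> [1] "f(x)"
r"-(a string with )" inside)-"
#> [1] "a string with )\" inside"
```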
Caching with R_user_dir()
Buried deep within the changelog was a reference to R_user_dir()
from the {tools} package.
This function provides a nice, cross-platform way of locating
directories for storing user-specific R data, configuration and cache files.
For example,
```r
tools::R_user_dir("my_pkg", which = "cache")
#> [1] "/home/ncsg3/.cache/R/my_pkg"
```
This returns a path as a string; it does not create the directory, but the path can be used to create one. In the {oysteR} package, I use this idea to cache API results. Because R generates the path, I don’t have to worry about which OS the user is on.
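A minimal sketch of that caching idea, assuming R 4.0.0 or later. cached_call() and its arguments are hypothetical names for illustration, not the {oysteR} implementation, and key is assumed to be a safe file name.

```r
# Hypothetical helper: run fun() once and cache its result as an .rds file
# in the user's cache directory for pkg
cached_call <- function(key, fun, pkg = "my_pkg") {
  cache_dir <- tools::R_user_dir(pkg, which = "cache")
  # R_user_dir() only returns a path, so create the directory here
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  cache_file <- file.path(cache_dir, paste0(key, ".rds"))

  # Cache hit: return the stored result without calling fun()
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }

  # Cache miss: compute, store and return the result
  result <- fun()
  saveRDS(result, cache_file)
  result
}
```

Calling cached_call("api_result", slow_api_request) twice would run slow_api_request() only once, where slow_api_request() is a placeholder for whatever expensive call you want to cache. To clear the cache, delete the directory, for example with unlink(tools::R_user_dir("my_pkg", which = "cache"), recursive = TRUE).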
Also of note
I’ve not used the new reference counting directly, but switching to R 4.0.0 has given us a slightly faster, less resource-hungry version of R, as reference counting lets R avoid some unnecessary copies of objects. Likewise, the {grid} package was made faster, and since {ggplot2} draws with {grid}, it is also a little quicker. This is one of the benefits of upgrading R versions: things just get a bit nicer.
References
- R Changelog
- A nice overview of R 4.0.0 by David Smith.