A better way of saving and loading objects in R

Posted on April 1, 2012 by ucfagls in R bloggers

[This article was first published on From the bottom of the heap » R, and kindly contributed to R-bloggers.]

Hadley Wickham (@hadleywickham) mentioned on Twitter this week his preference for saveRDS() over the more familiar save(). As the function was new to me, I thought I’d take a look…

save() and load() will be familiar to many R users. They allow you to save one or more named R objects to a file or other connection and restore them later. When loaded, each object is restored to the current environment (in general use this is the global environment, i.e. the workspace) under the same name it had when saved, silently replacing any existing object of that name. This is annoying when, for example, you have a saved model object from a previous fit and want to compare it with the model object returned when the R code is rerun. Unless you change the name of the model object in your script, you can’t have both the saved object and the newly created one available in the same environment at the same time.

Here’s an example of what I mean.

> require(mgcv)
Loading required package: mgcv
This is mgcv 1.7-13. For overview type 'help("mgcv-package")'.
> mod <- gam(Ozone ~ s(Wind), data = airquality, method = "REML")
> mod

Family: gaussian
Link function: identity

Formula:
Ozone ~ s(Wind)

Estimated degrees of freedom:
3.529  total = 4.529002

REML score: 529.4881
> save(mod, file = "mymodel.rda")
> ls()
[1] "mod"
> load(file = "mymodel.rda")
> ls()
[1] "mod"
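
In the transcript above the saved and current copies of mod are the same, so the overwrite is invisible. It shows up as soon as the object changes between save() and load(). Here is a minimal base-R sketch of that; the object name x, the temporary file, and the environment name saved are purely illustrative, and loading into a separate environment via the envir argument of load() is just one possible workaround, added here for comparison.

```r
# Illustrative names only: 'x', 'tmp' and 'saved' are not from the example above.
x <- 1:3
tmp <- tempfile(fileext = ".rda")
save(x, file = tmp)

# Pretend the script was rerun and 'x' now holds something else.
x <- "new value"
load(tmp)        # no warning: the current 'x' is replaced by the saved one
print(x)         # back to 1 2 3; "new value" is gone

# One possible workaround: load into a separate environment instead.
x <- "new value"
saved <- new.env()
load(tmp, envir = saved)
print(x)         # still "new value"
print(saved$x)   # the saved copy, reachable as saved$x
```

This works, but you have to remember to do it every time, and the saved object is still tied to its original name inside that environment.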

saveRDS() provides a far better solution to this problem, and to the general one of saving and loading objects created with R. saveRDS() serializes a single R object into a format that can be saved. Wikipedia describes serialization thus: “…serialization is the process of converting a data structure or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and ‘resurrected’ later in the same or another computer environment.” save() does the same thing, but with one important difference: saveRDS() doesn’t save both the object and its name; it saves only a representation of the object. As a result, the saved object can be read back with readRDS() and assigned to a name different from the one it had when it was serialized.

We can illustrate this using the model fitted earlier:

> ls()
[1] "mod"
> saveRDS(mod, "mymodel.rds")
> mod2 <- readRDS("mymodel.rds")
> ls()
[1] "mod"  "mod2"
> identical(mod, mod2, ignore.environment = TRUE)
[1] TRUE

(Note that the two model objects have different environments within their representations, so we have to ignore these when testing whether the objects are identical.)
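
To tie this back to the motivating case of comparing a saved fit with a freshly refitted one, here is a self-contained sketch that needs only base R. It swaps in lm() on the built-in cars data for the gam() fit purely so that it runs without mgcv; the names fit, refit, old and path are illustrative.

```r
# Assumption: lm() on the built-in 'cars' data stands in for the gam() model.
fit <- lm(dist ~ speed, data = cars)

# Save the fitted model; only the object is stored, not the name 'fit'.
path <- tempfile(fileext = ".rds")
saveRDS(fit, path)

# Later: rerun the model code and read the old fit back under a new name.
refit <- lm(dist ~ speed, data = cars)
old   <- readRDS(path)

# Both objects now live side by side and can be compared directly.
same_coefs <- all.equal(coef(old), coef(refit))
print(same_coefs)
```

Because readRDS() simply returns the object, the choice of name happens at the point of assignment, so nothing in the script that produced the original fit has to change.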

You’ll notice that in the call to saveRDS() I named the file with the extension .rds. This appears to be the convention for serialized objects of this sort; R itself uses this representation often, for example for package metadata and the databases used by help.search(). In contrast, the extension .rda is often used for objects serialized via save().

So there you have it; saveRDS() and readRDS() are the newest additions to my day-to-day workflow.