Adding metadata to variables
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
There are only really two ways to preserve your statistical analyses. You either save the variables that you create, or you save the code that you used to create them. In general the latter is much preferred because at some point you’ll realise that your model was wrong, or your dataset has changed, and you need to re-run your analysis. If you only stored your variables, you are now stuck rewriting your code in order to create new versions, which is really not fun. On the other hand, if you saved your code, all you have to do is tweak it and run it.
Occasionally, though, just keeping the code and rerunning an analysis isn’t practical. The most obvious case is when the analysis takes a long time to run: if your model takes more than ten minutes, it can be really useful to save its variables as well as the source code.
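The sketch below shows one way to act on this. It is built on some assumptions: it uses a hypothetical helper, cached(), that reruns an expensive expression only when no saved copy exists on disk. The helper relies on R's lazy evaluation of function arguments and on base R's saveRDS() and readRDS(). The file path and the stand-in "slow" computation are illustrative only.

```r
# Hypothetical helper (not from the original post): return a saved result if
# one exists on disk, otherwise evaluate the expensive expression and save it.
cached <- function(path, expr) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- expr          # arguments are lazy, so the slow code only runs here
  saveRDS(value, path)
  value
}

# Usage: sum(sqrt(1:1e6)) stands in for a model that takes ten minutes
cache_file <- tempfile(fileext = ".rds")
n_runs <- 0
first  <- cached(cache_file, { n_runs <- n_runs + 1; sum(sqrt(1:1e6)) })
second <- cached(cache_file, { n_runs <- n_runs + 1; sum(sqrt(1:1e6)) })
n_runs   # the expression was only evaluated on the first call
```

The obvious weakness is the one described above. The cache has no idea whether your code or data have changed since the file was written, so you have to delete it yourself when they do.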
The problem with saving variables is that when you come back and load them six months later, it isn’t always obvious what they are or where they came from. With code, we solve this by using comments to jog our memory, so it would be nice to have an equivalent for variables. In fact, in R, such a facility exists with the (you guessed it) comment function.
library(lattice)
comment(barley) <- "Immer's barley data, 1934. The data from the Morris site may have the wrong years."
comment(barley)
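Because the comment is an ordinary part of the object, it should travel with the variable when you save it. Here is a minimal sketch using base R's saveRDS() and readRDS(). The heights vector and its note are made up for illustration.

```r
# A made-up vector with a note attached
heights <- c(152, 160, 171)
comment(heights) <- "Heights in cm, measured by hand; the third value is doubtful."

# Round-trip through a temporary file, as you would when saving a slow analysis
path <- tempfile(fileext = ".rds")
saveRDS(heights, path)
restored <- readRDS(path)

comment(restored)   # the note survives saving and loading
```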
The comment function simply stores the string as an attribute of the variable, with one special rule: unlike most attributes, the comment is not shown when the variable is printed. Other common attributes that you may be familiar with are names for vectors and lists, and dim and dimnames for matrices.
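The printing rule is easy to see side by side. In this sketch (the object names are arbitrary), an ordinary attribute set with attr() is printed along with the values. A comment is hidden when printed, but it is still stored under the attribute name "comment".

```r
# An ordinary attribute is printed after the values
v <- 1:3
attr(v, "note") <- "an ordinary attribute"
print(v)

# A comment is stored but not printed
w <- 1:3
comment(w) <- "a comment"
print(w)

# It is still there, under the attribute name "comment"
identical(attr(w, "comment"), comment(w))
```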
You can list all the attributes of a variable (names and values) with the attributes function, and get and set individual attributes with attr.
x <- c(apple = 1, banana = 2)
attr(x, "type") <- "fruit"
attributes(x)
attr(x, "names") # same as names(x)
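Before you rely on attributes for metadata, note one caveat: not every operation preserves them. In base R, subsetting with [ keeps names but drops other attributes, while arithmetic copies attributes onto the result. Here is a small check that reuses the fruit example.

```r
x <- c(apple = 1, banana = 2)
attr(x, "type") <- "fruit"

y <- x[1]          # subsetting with [ keeps names...
names(y)
attr(y, "type")    # ...but the "type" metadata is gone (NULL)

z <- x * 2         # arithmetic, by contrast, carries the attributes across
attr(z, "type")
```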
Attributes are really great for storing contextual metadata about a variable. For starters, when you come back to your saved workspace after those six months, you might want to know who created the variable and when. To get this facility, we need an enhanced version of assign.
# Return the current user's login name (the environment variable differs by OS)
get_user <- function() {
  env <- if (.Platform$OS.type == "windows") "USERNAME" else "USER"
  unname(Sys.getenv(env))
}

# Like assign(), but also records who created the value and when,
# plus any extra named attributes passed through ...
assign_with_metadata <- function(x, value, ..., pos = parent.frame(), inherits = FALSE) {
  attr(value, "creator") <- get_user()
  attr(value, "time_created") <- Sys.time()
  more_attr <- list(...)
  attr_names <- names(more_attr)
  for (i in seq_along(more_attr)) {
    attr(value, attr_names[i]) <- more_attr[[i]]
  }
  assign(x, value, pos = pos, inherits = inherits)
}

assign_with_metadata("x", 1:3, monkey = "chimp")
Notice the ... argument, which lets you add arbitrary attributes to the variable.
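Here is a variation on the same idea, offered as a sketch rather than the post's own code. The hypothetical add_metadata() returns the tagged value instead of assigning it. It also rejects unnamed arguments in ... up front, rather than leaving them to whatever attr() does with a missing name.

```r
# Hypothetical helper, not from the original post
add_metadata <- function(value, ...) {
  extra <- list(...)
  nms <- names(extra)
  # Every extra argument must be named, since the name becomes the attribute name
  if (length(extra) > 0 && (is.null(nms) || any(nms == ""))) {
    stop("every metadata item passed through ... must be named")
  }
  for (nm in nms) {
    attr(value, nm) <- extra[[nm]]
  }
  value
}

y <- add_metadata(1:3, source = "survey.csv", units = "kg")
attributes(y)
```

Returning the value rather than assigning it means the helper also works inside pipelines or function arguments. You can then assign the result with plain <- wherever you like.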
While this is great, and solves the problem, typing assign_with_metadata is way too clunky. It would be much easier if we could just use <- to assign variables and get the metadata for free.
Overriding <- itself, though, would slow down every assignment and is likely to introduce errors. Since we don’t want to store metadata for every variable (just the important ones), it is better to define our own operators to do so.
# Assign in the calling environment, attaching metadata
`%<-%` <- function(x, value) {
  xname <- deparse(substitute(x))
  pos <- parent.frame()
  assign_with_metadata(xname, value, pos = pos)
}

# Assign in the global environment, attaching metadata
# (unlike <<-, this always writes to the global environment)
`%<<-%` <- function(x, value) {
  xname <- deparse(substitute(x))
  pos <- globalenv()
  assign_with_metadata(xname, value, pos = pos)
}

m %<-% "foo" # local assignment with metadata
f <- function() {
  n %<<-% "bar" # global assignment with metadata
}
f()
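Both operators rely on deparse(substitute(x)), which turns the unevaluated left-hand side back into text instead of evaluating it. A throwaway operator makes that visible; the name %name_of% is hypothetical.

```r
# Return the left-hand side as text, without ever evaluating it
`%name_of%` <- function(x, value) deparse(substitute(x))

my_model %name_of% 42      # "my_model", even though my_model does not exist
results$col %name_of% 42   # "results$col": a whole expression, not just a name
```

The second line hints at a limitation. The captured text is passed straight to assign(), so something like results$col %<-% 1 would create a new variable literally named results$col rather than modifying results.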
With these functions, if you want to save your variables for later, simply swap <- for %<-%.
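To close the loop, here is roughly what coming back to a saved variable might look like, assuming you save with base R's save() and load(). The object is built with structure() to mimic the attributes that %<-% attaches. The user name, the note and the show_metadata() helper are all hypothetical.

```r
# Stand-in for a slow model result, carrying creator/time metadata
set.seed(1)
slow_result <- structure(
  cumsum(rnorm(100)),
  creator = "alice",                 # hypothetical user name
  time_created = Sys.time(),
  note = "random walk, set.seed(1)"
)

path <- tempfile(fileext = ".RData")
save(slow_result, file = path)

# Months later: load into a fresh environment so nothing in the workspace is overwritten
restored <- new.env()
load(path, envir = restored)

# Hypothetical helper: list metadata attributes, skipping structural ones R uses
show_metadata <- function(value) {
  structural <- c("names", "dim", "dimnames", "class", "row.names", "levels", "tsp")
  meta <- attributes(value)
  meta[setdiff(names(meta), structural)]
}

show_metadata(restored$slow_result)
```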