Units and metadata

August 2, 2012

(This article was first published on StaTEAstics., and kindly contributed to R-bloggers)

Handling metadata does not come naturally in R, or in any traditional rectangular data storage system.

Several tricks and packages attempt to solve this problem: Hmisc uses R's attribute feature, while the IRanges package has its own DataFrame class.

Hmisc allows one to store metadata such as units, labels and comments:

library(Hmisc)

## Create a test data frame
test.df <- data.frame(x = ts(1:12, start = c(2000, 1), frequency = 12),
                      y = ts(1:12, start = c(2001, 1), frequency = 12))

## Assign the units and comment
units(test.df$x) <- "cm"
units(test.df$y) <- "m"
comment(test.df) <- "this is a test data set"

## Summary of the data
describe(test.df)
contents(test.df)

The disadvantage of this approach is that the metadata is lost when functions such as subset are used.

## Select column x (test.df has no column named a)
str(subset(test.df, select = x, drop = FALSE))

This restricts the approach to storage rather than manipulation.
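To see the effect without any packages, here is a minimal sketch in base R. It sets a "units" attribute by hand with attr() rather than through Hmisc, so it only approximates the example above. The helper restore_units is hypothetical (it is not part of Hmisc or base R) and simply copies the attribute back onto the columns that survived the subset.

```r
## Base R only: attach a "units" attribute to each column by hand
df <- data.frame(x = 1:12, y = 13:24)
attr(df$x, "units") <- "cm"
attr(df$y, "units") <- "m"

## subset() indexes every column, and vector indexing drops custom attributes
sub <- subset(df, select = x)
attr(df$x, "units")   # still set on the original
attr(sub$x, "units")  # no longer set on the subset

## Hypothetical helper: copy "units" back onto the columns that survived
restore_units <- function(new, old) {
  for (nm in intersect(names(new), names(old))) {
    attr(new[[nm]], "units") <- attr(old[[nm]], "units")
  }
  new
}

sub <- restore_units(sub, df)
attr(sub$x, "units")
```

Patching the attributes back after every operation works, but it has to be remembered each time, which is exactly the inconvenience described above.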

The second approach, from the IRanges package, creates a whole new S4 class for handling data with metadata; with the corresponding accessor functions, the attributes can be retained.



library(IRanges)
test2.df <- DataFrame(x = 1:10, y = letters[1:10])
## Name the units after the columns they describe (x and y)
metadata(test2.df) <- list(units = list(x = "cm", y = "m"))



str(subset(test2.df, select = x))


In this case the units are still preserved; nevertheless, the subset function does not subset the metadata, which can cause problems.

In short, there is definitely room for improvement. Writing a new class is more natural and gives both the developer and the user more control.
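As a rough illustration of what such a class might look like, here is a minimal S3 sketch in base R. Everything in it is an assumption made for illustration: the class name units_df, its constructor, and the select_cols function are hypothetical and do not come from Hmisc or IRanges. The idea is simply to keep the data and its units together and to subset both at once.

```r
## Hypothetical constructor: a data frame plus a named character vector
## holding one unit per column
units_df <- function(data, units) {
  stopifnot(is.data.frame(data), is.character(units),
            setequal(names(units), names(data)))
  structure(list(data = data, units = units[names(data)]),
            class = "units_df")
}

## Hypothetical column selector: subsets the columns and their units together
select_cols <- function(x, cols) {
  stopifnot(inherits(x, "units_df"), all(cols %in% names(x$data)))
  units_df(x$data[cols], x$units[cols])
}

obj <- units_df(data.frame(x = 1:12, y = 13:24), c(x = "cm", y = "m"))
sub <- select_cols(obj, "x")

names(sub$data)  # only column x is kept
sub$units        # only the unit for x is kept
```

A fuller version would also need methods for row subsetting, printing and merging, but even this small wrapper keeps the units in step with the columns.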

