(This article was first published on factbased, and kindly contributed to R-bloggers)
The workflow for statistical analyses is discussed in several places. Frequent recommendations are:
- never change the raw data; transform it instead,
- keep your analysis reproducible,
- separate functions and data,
- use the R package system as the organizing structure.
In some recent projects I tried an S4 class approach to this workflow, which I want to present and discuss. It uses the datamart package, which I recently submitted to CRAN. Here is a sample session:
> library(datamart)
> library(beeswarm)
> # load one of my datasets
> xp <- expenditures()
> # introspection: what
> # "resources" for this
> # dataset did I once define?
> queries(xp)
Evs#Categories Evs#Elasticities Evs#Elasticity
"Categories" "Elasticities" "Elasticity"
InternalData#Raw
"Raw"
> # get me a resource
> head(query(xp, "Raw"))
coicop2 coicop2de
1 15 Expenditures (exclusive private consumption)
2 15 Expenditures (exclusive private consumption)
3 15 Expenditures (exclusive private consumption)
4 15 Expenditures (exclusive private consumption)
5 15 Expenditures (exclusive private consumption)
6 15 Expenditures (exclusive private consumption)
income hhtype value
1 (all) (all) 2539
2 (all) Single 1462
3 (all) Single woman 1232
4 (all) Single man 1866
5 (all) Single parent 1004
6 (all) Single parent, 1 kid 991
Read on to see how an S4 dataset object is defined and accessed, and what I see in favour of and against this approach.
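The actual class definitions are behind the link, so what follows is only a minimal base-R sketch of the general idea, not datamart's implementation. It assumes a hypothetical class `ToyDataset` and hypothetical generics `listResources()` and `getResource()`, which play roles similar to `queries()` and `query()` in the session above. The few values are copied from the `head()` output.

```r
library(methods)

# Hypothetical S4 class (not datamart's): the raw data lives in one slot and is
# never modified; each named "resource" is a function that derives new data.
setClass("ToyDataset", slots = c(raw = "data.frame", resources = "list"))

# Hypothetical generics, loosely analogous to queries() and query() above.
invisible(setGeneric("listResources", function(x) standardGeneric("listResources")))
invisible(setGeneric("getResource", function(x, name) standardGeneric("getResource")))

# Introspection: which resources does this dataset offer?
setMethod("listResources", "ToyDataset", function(x) {
  c("Raw", names(x@resources))
})

# Return a resource; derived ones are recomputed from the untouched raw data
# on every call, so the transformation stays reproducible.
setMethod("getResource", "ToyDataset", function(x, name) {
  if (identical(name, "Raw")) return(x@raw)
  if (!name %in% names(x@resources)) stop("unknown resource: ", name)
  x@resources[[name]](x@raw)
})

# Hypothetical constructor; values taken from the head() output above.
toyExpenditures <- function() {
  raw <- data.frame(
    hhtype = c("(all)", "Single", "Single woman", "Single man", "Single parent"),
    value  = c(2539, 1462, 1232, 1866, 1004),
    stringsAsFactors = FALSE
  )
  new("ToyDataset", raw = raw, resources = list(
    # Illustrative transformation: household types ordered by expenditure.
    ByValue = function(d) d[order(d$value, decreasing = TRUE), ]
  ))
}

xp <- toyExpenditures()
print(listResources(xp))
print(getResource(xp, "ByValue"))
```

Bundling the transformations as functions inside the object means the raw data frame is never overwritten, which matches the first two recommendations in the list at the top of the post.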