Reproducible research and a repository of artifacts, an RFC

August 20, 2018

(This article was first published on R – Random Remarks, and kindly contributed to R-bloggers)

This work is still in progress. I think, however, that it can already resonate with some people in the community. I hope the resulting discussion will lead to a better design and perhaps to valuable tools arriving sooner.

The main goal is to extend base R’s history mechanism (see ?history), which currently gives access to past commands run in R. What if, however, we could browse not only the commands but also the objects (artifacts) they produced? Hence, the repository of artifacts.

The repository of artifacts is implemented as a set of packages. The two most important are repository, which provides the basic logic for storing, processing and retrieving artifacts, and ui, which implements a basic, text-only user interface and hooks callbacks into R. The remaining packages are storage, defer and utilities.

The basic rules of the repository of artifacts are as follows. After each command, the state of the R session is examined, and all R objects and plots are recorded together with information about their origin (parent objects). The complete graph of origin of each artifact can therefore be retrieved from the repository: the full sequence of R commands and the artifacts they produced. Further explanation can be found in the current motivation and plan for future work, and examples of working with the repository are presented in this tutorial.
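To make the recording rule more concrete, here is a minimal sketch in base R. It is a toy, in-memory model and not the API of the packages mentioned above: the names new_repo, record_command and origin are hypothetical, plots are ignored, and parents are guessed simply from the variable names that appear in a command.

```r
# Toy, in-memory model of an artifact repository.
# Hypothetical names throughout; this is NOT the API of the author's packages.
new_repo <- function() {
  repo <- new.env()
  repo$artifacts <- list()  # id -> list(name, value, expr, parents)
  repo$current <- list()    # variable name -> id of its latest artifact
  repo
}

# Evaluate one command in `env`, then record every object that is new or changed.
record_command <- function(repo, env, code) {
  expr <- parse(text = code)[[1]]
  eval(expr, env)
  # Assumed heuristic: parents are the tracked variables named in the command.
  # (A reassignment such as `y <- 5` would wrongly list the old `y` as a parent.)
  used <- intersect(all.vars(expr), names(repo$current))
  parents <- as.character(unlist(repo$current[used], use.names = FALSE))
  for (name in ls(env)) {
    value <- get(name, envir = env)
    old_id <- repo$current[[name]]
    # Skip objects whose value did not change since the last recording.
    if (!is.null(old_id) && identical(repo$artifacts[[old_id]]$value, value)) next
    id <- paste0("a", length(repo$artifacts) + 1)
    repo$artifacts[[id]] <- list(name = name, value = value,
                                 expr = code, parents = parents)
    repo$current[[name]] <- id
  }
  invisible(repo)
}

# Walk the parent links to recover the commands that led to an artifact.
origin <- function(repo, id) {
  art <- repo$artifacts[[id]]
  history <- unlist(lapply(art$parents, function(p) origin(repo, p)))
  unique(c(history, art$expr))
}

repo <- new_repo()
session <- new.env()
record_command(repo, session, "x <- c(1, 2, 3)")
record_command(repo, session, "y <- x * 2")
record_command(repo, session, "z <- sum(y)")

# The commands behind `z`, oldest first.
origin(repo, repo$current[["z"]])
```

In a live session, a function like record_command would not be called by hand. One option, again only an assumption about how such a hook could look, is to register a callback with base R's addTaskCallback(), which runs after each top-level command completes.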

Questions I hope to explore with those interested are:

  • since we all have different working styles, is this design a good fit for anyone besides me?
  • how well does it work in actual data analysis? are the tools available already (on CRAN, GitHub, etc.) sufficient? if not, what are the gaps that need to be addressed?
  • would anyone be willing to share the recordings of their historical R sessions? even better, track some in the repository? (assuming it does not disrupt your work)
  • what are the gaps of the current design? how can it be improved or extended to make more sense?
