Reproducible research and a repository of artifacts, an RFC

[This article was first published on R – Random Remarks, and kindly contributed to R-bloggers.]

This work is still in progress. I think, however, that it may already resonate with some people in the community. I hope the resulting discussion leads to a better design and, perhaps, to useful tools sooner.

The main goal is to extend base R’s history mechanism (see ?history), which currently gives access only to the commands run in the past. What if we could browse not only the commands but also the objects (artifacts) they produced? Hence, the repository of artifacts.

It is implemented as a set of packages. The two most important are repository, which provides the basic logic for storing, processing and retrieving artifacts, and ui, which implements a basic, text-only user interface and hooks callbacks into R. The remaining packages are storage, defer and utilities.

Here is how the repository of artifacts works: after each command, the state of the R session is examined and all R objects and plots are recorded, together with information about their origin (their parent objects). Thus the complete graph of origin of each artifact can be retrieved from the repository: the full sequence of R commands and the artifacts they produced along the way. Further explanation can be found in the current motivation and plan for future work, and examples of working with the repository are presented in this tutorial.
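To make the recording rule more concrete, here is a minimal sketch in base R of the idea, not the API of the repository or ui packages. Every name in it (`new_repository()`, `record_state()`, `origin()`, `commands()`, `track_session()`) is hypothetical. It assumes an in-memory store, uses base R's `addTaskCallback()` to run after each successful top-level command, and approximates an object's parents as the already-tracked variables named in the command (via `all.vars()`). Unlike the design described above, it does not record plots.

```r
# Hypothetical sketch: not the author's packages, only an illustration of
# "record every object after each command, together with its parents".

new_repository <- function() {
  repo <- new.env()
  repo$artifacts <- list()  # artifact id -> list(name, value, expr, parents)
  repo$current <- list()    # object name -> id of its latest artifact
  repo$next_id <- 1L
  repo
}

# Examine `env` after `expr` was evaluated. Every object that is new or has
# changed becomes an artifact; its parents are the tracked objects named in
# `expr` (a coarse approximation of the real data flow).
record_state <- function(repo, expr, env = globalenv()) {
  used <- intersect(all.vars(expr), names(repo$current))
  parent_ids <- as.character(unlist(repo$current[used], use.names = FALSE))
  for (name in ls(env)) {
    value <- get(name, envir = env)
    old_id <- repo$current[[name]]
    if (!is.null(old_id) && identical(repo$artifacts[[old_id]]$value, value))
      next  # unchanged since the previous command
    id <- paste0("a", repo$next_id)
    repo$next_id <- repo$next_id + 1L
    repo$artifacts[[id]] <- list(name = name, value = value,
                                 expr = expr, parents = parent_ids)
    repo$current[[name]] <- id
  }
  invisible(repo)
}

# Follow parent links to recover the whole origin graph of one artifact;
# ids are returned in the order in which the artifacts were created.
origin <- function(repo, id) {
  seen <- character(0)
  queue <- id
  while (length(queue) > 0) {
    cur_id <- queue[1]
    queue <- queue[-1]
    if (cur_id %in% seen) next
    seen <- c(seen, cur_id)
    queue <- c(queue, repo$artifacts[[cur_id]]$parents)
  }
  seen[order(as.integer(sub("^a", "", seen)))]
}

# The sequence of commands that produced the given artifacts.
commands <- function(repo, ids) {
  vapply(ids, function(id) paste(deparse(repo$artifacts[[id]]$expr), collapse = " "),
         character(1))
}

# Hook the recorder into the session: base R calls task callbacks after each
# successful top-level command; returning TRUE keeps the callback registered.
track_session <- function(repo = new_repository()) {
  addTaskCallback(function(expr, value, ok, visible) {
    record_state(repo, expr)
    TRUE
  }, name = "artifact-recorder")
  invisible(repo)
}
```

In an interactive session one would call `repo <- track_session()` once; after typing `x <- 1:10`, `y <- x * 2` and `z <- sum(y)` as separate commands, `commands(repo, origin(repo, repo$current[["z"]]))` should list those three commands. Comparing every object with `identical()` after each command keeps the sketch short but can be slow for large objects, so a real implementation would likely need something smarter, such as hashing.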

Questions I hope to explore with those interested are:

  • since we all have different working styles, is this design a good fit for anyone besides me?
  • how well does it work in actual data analysis? are the tools already available (on CRAN, GitHub, etc.) sufficient? if not, what gaps need to be addressed?
  • would anyone be willing to share recordings of their past R sessions? or, even better, track some sessions in the repository (assuming it does not disrupt your work)?
  • what are the gaps in the current design? how could it be improved or extended to make more sense?
