News from archivist 2.0 at the eRum2016 conference

[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers.]

Ten days ago the eRum2016 conference (European R Users Meeting 2016) came to an end. It was a huge event that attracted over 250 attendees from both academia and business. The meeting was a great opportunity to listen to amazing keynotes by Heather Turner, Katarzyna Stapor, Rasmus Bååth, Jakub Glinka, Ulrike Grömping, Przemyslaw Biecek, Romain Francois, Marek Gagolewski, Matthias Templ and Katarzyna Kopczewska. A big thank you goes to the whole organizing committee, and especially to its head, Dr. Maciej Beręsewicz! There were 10 workshops, 2 package sessions, 2 data workflow sessions, 3 methodology sessions, 1 BioR session, 2 business sessions, lightning talks, a poster session and, of course, a great welcome paRty. I could not miss the chance to present news from the latest release (version 2.0) of our archivist package.

From the eRum2016 Book of Abstracts you can learn that: "Open science needs not only reproducible research but also accessible final and partial results. During the speech I will present the most valuable applications of the archivist package. The archivist is an R package for data analysis results management, which helps in managing, sharing, storing, linking and searching for R objects. The archivist package automatically retrieves the object’s meta-data and creates a rich structure that allows for easy management of calculated R objects. The archivist package extends the reproducible research paradigm by creating new ways to retrieve and validate previously calculated objects. These functionalities also result in a variety of opportunities such as: sharing R objects within reports/articles by adding hooks to R objects in table/figure captions; interactive exploration of object repositories; caching function calls; retrieving object’s pedigree along with information on how the object was created; automated tracking of performance of models."
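The storing, retrieving and searching described in the abstract can be tried out with a local repository. This is a minimal sketch, assuming archivist (version 2.0 or later) is installed; the throwaway directory under tempdir(), the example model and the `class:lm` search tag (one of the tags archivist extracts automatically for lm objects) are illustrative choices, not material from the talk.

```r
library(archivist)

# Throwaway repository directory (illustrative location)
repo <- file.path(tempdir(), "archivist_demo")
createLocalRepo(repoDir = repo, default = FALSE, force = TRUE)

# Any R object can be archived; here a simple linear model
model <- lm(mpg ~ wt, data = mtcars)

# saveToLocalRepo() returns the md5 hash that identifies the stored object
md5 <- saveToLocalRepo(model, repoDir = repo)
print(md5)

# Later (or in another session), restore the object by its hash
restored <- loadFromLocalRepo(md5, repoDir = repo, value = TRUE)
print(coef(restored))

# Search the repository by an automatically extracted tag
hits <- searchInLocalRepo("class:lm", repoDir = repo)
print(hits)
```

The md5 hash printed here plays the same role as the last part of the GitHub hooks discussed later in this post.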

archivist 2.0: (News from) Managing Data Analysis Results Toolkit

My presentation about the new features and the current architecture of the archivist package is available in the list of all eRum2016 presentations. If it is hard to find, use this direct link: http://r-addict.com/eRum2016/#/.

In the talk I argued that data analysis results should be easy to access (for further processing), verifiable and reproducible. However, reproducing results from scratch is not always possible, so one can at least improve the accessibility of the results themselves. Reproducibility is sometimes impossible due to different

  • base version of R
  • versions of R packages
  • versions of dependent software
  • global variables

or due to

  • limitations of the original data
  • insufficient computational resources

Examples of such problems:

  • “Can’t gather tibble in R”
  • “Can’t install git2r nor devtools R packages on centOS 7.0 64 bit”
  • “pandoc version 1.12.3 or higher is required and was not found (R shiny)”
  • “rmarkdown::render freezes because pandoc freezes when LC_ALL and LANG are unset”
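Version drift of the kind listed above is easier to diagnose when the environment is recorded next to a result. The sketch below is not an archivist feature but a plain base-R habit built on sessionInfo() and packageVersion(); the helpers `record_environment()` and `check_environment()` are hypothetical names introduced only for illustration.

```r
# Hypothetical helper: capture R and package versions when a result is produced
record_environment <- function(pkgs) {
  list(
    r_version = R.version.string,
    packages  = vapply(pkgs, function(p) as.character(packageVersion(p)), character(1)),
    session   = sessionInfo()
  )
}

# Hypothetical helper: return the packages whose installed version
# differs from the recorded one, and report an R version change
check_environment <- function(record) {
  current <- vapply(names(record$packages),
                    function(p) as.character(packageVersion(p)), character(1))
  if (!identical(record$r_version, R.version.string)) {
    message("R version differs: recorded ", record$r_version)
  }
  names(current)[current != record$packages]
}

env <- record_environment(c("stats", "utils"))
saveRDS(env, file.path(tempdir(), "env_record.rds"))

# In a later session: reload the record and compare with the current setup
env_loaded <- readRDS(file.path(tempdir(), "env_record.rds"))
check_environment(env_loaded)  # character(0) when nothing changed
```

Such a record does not make a result reproducible by itself, but it tells the reader which of the causes above to suspect first.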

Result format proposed in archivist

If each result is presented together with a unique hook, its accessibility can be improved. A hook has the format shown below: it is a line of R code that, when executed, downloads the result from the web (in this case from the GitHub repository eRum2016 owned by the user archivistR).

library(archivist)
# library(survminer)  # may be needed to print the restored plot
archivist::aread('archivistR/eRum2016/817107d0e62a9500c4ddb1770bd03378')

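A hook of this kind is a string of the form user/repository/md5hash, which aread() resolves against GitHub. The sketch below needs only base R: it splits a hook into those parts and checks that the last part looks like an md5 hash (32 hexadecimal characters). The function `parse_hook()` is hypothetical and not part of archivist; it assumes the three-part layout used in this post and does not handle hooks that point into a subdirectory.

```r
# Hypothetical helper: split an archivist-style hook "user/repo/md5hash" into its parts
parse_hook <- function(hook) {
  parts <- strsplit(hook, "/", fixed = TRUE)[[1]]
  if (length(parts) != 3) {
    stop("expected a hook of the form 'user/repo/md5hash'")
  }
  md5 <- parts[3]
  # An md5 hash is 32 lowercase hexadecimal characters
  if (!grepl("^[0-9a-f]{32}$", md5)) {
    stop("the last part of the hook is not a 32-character md5 hash")
  }
  list(user = parts[1], repo = parts[2], md5hash = md5)
}

hook <- "archivistR/eRum2016/817107d0e62a9500c4ddb1770bd03378"
h <- parse_hook(hook)
str(h)
```

A check like this can serve as a cheap sanity test before a hook is passed to aread(), which needs network access.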

The restored plot can then be processed further, or its data can be extracted, because the object contains a ggplot (in result$plot), and a ggplot object stores the data used to produce it by default. For example, a title can be added:

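library(archivist)
library(ggplot2)
# Restore the archived object from GitHub by its hook
result <- archivist::aread('archivistR/eRum2016/817107d0e62a9500c4ddb1770bd03378')
# The ggplot part is stored in result$plot; add a title to it
result$plot <- result$plot + ggtitle('Extra title')
result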

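Because a ggplot object carries its data, the same round trip can also be tried offline. This is a minimal sketch, assuming archivist and ggplot2 are installed; it uses an illustrative scatter plot of the built-in mtcars data instead of the plot from the talk, archives it in a temporary local repository, restores it and reads the data back from the restored object. For the object in this post the corresponding data would presumably be found in result$plot$data.

```r
library(archivist)
library(ggplot2)

# Temporary local repository (illustrative location)
repo <- file.path(tempdir(), "archivist_plots")
createLocalRepo(repoDir = repo, force = TRUE)

# Illustrative plot; ggplot2 keeps the data frame inside the object
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Skip the PNG miniature so no graphics device is needed
md5 <- saveToLocalRepo(p, repoDir = repo, archiveMiniature = FALSE)

# Restore the plot by its hash and pull the underlying data back out
restored <- loadFromLocalRepo(md5, repoDir = repo, value = TRUE)
plot_data <- restored$data
print(head(plot_data[, c("wt", "mpg")]))

# The restored object is still a regular ggplot and can be extended
titled <- restored + ggtitle("Extra title")
```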

Extensions - archivist.github

If you would like archivist functionality that is synchronized with GitHub repositories (e.g. an automatic push after each archived object), you might be interested in the archivist.github extension.

For more use cases of the archivist package, see our earlier posts and talks.
