R Hero saves Backup City with archivist and GitHub

[This article was first published on http://r-addict.com, and kindly contributed to R-bloggers].

Have you ever suffered because you could not reproduce graphs, tables or analysis results in R? Have you ever been frustrated at not being able to share R objects (e.g., plots or final analysis models) within your reports, posters or articles? Or do you simply have too many objects to store in a convenient and handy way? Now you can share partial results of an analysis, provide hooks to valuable R objects within articles, manage analysis results and restore objects’ pedigree with the archivist package and its extension archivist.github, all automatically through GitHub without closing RStudio. If you are tired of archiving results by yourself, read on to see how you can become an R Hero with the power of the archivist.github package.

R Hero archiving power

I recently visited Backup City, a data analysis mecca in the middle of Reproducible Research RLand. That’s where I overheard a feverish discussion between R Hero and commissar O’Rdon. You can read the story of their meeting in the opening comic.

archivist.github: archivist and GitHub integration

archivist.github is a package with tools for archiving, managing and sharing R objects via GitHub, and it is an extension of the archivist package. You can install the package from CRAN.
I have prepared a workflow graph to visualize the functionality of archivist.github, and this post explains its core powers. After you’ve created a GitHub developer application (the process is described at archivist.github: 2.1 OAuth open authorization; set Homepage URL to http://github.com and Authorization callback URL to http://localhost:1410), you will be able to create repositories on GitHub automatically from the R console. Below is an example of how to authorise with the GitHub API (using your application’s Client ID and Client Secret), create a GitHub repository containing an archivist-like Repository, and automatically archive an R object on GitHub
# I saved some variables earlier
# so as not to provide them publicly

# this can be done only in interactive mode
# so it has to be done before knitr compilation
# authoriseGitHub(ClientID, ClientSecret,
# scope = c("public_repo", "delete_repo")) ->
# github_token 
# repository creation
createGitHubRepo(repo = "RHero", 
                 user = "archivistR", 
                 password = password,
                 github_token = github_token,
                 default = TRUE)
[1] "archivistR"
# -> https://github.com/archivistR/RHero
# parameters can be set globally,
# so you will not have to specify 
# them for each call
aoptions("password", password)
aoptions("user", "archivistR") 
aoptions("repo", "RHero")
aoptions("github_token", github_token)
# archiving on GitHub
archive(iris, alink = TRUE)
archivist::aread('archivistR/RHero/ff575c261c949d073b2895b05d1097c3')
One can check that the artifact is really on GitHub and that the commit was performed (with the great help of the git2r package)
# sometimes GitHub needs more 
# time to react
# show archived objects with their hashes
                           md5hash                             name         createdDate
1 ff575c261c949d073b2895b05d1097c3                             iris 2016-06-13 16:41:17
2 2e7b44a1845602a5e3a4898b618b4aa6 2e7b44a1845602a5e3a4898b618b4aa6 2016-06-13 16:41:17
# one can check how many commits have been performed so far
[1] 2
Each object (referred to as an artifact) is archived with its metadata and md5hash, so that anyone can later restore it or search for archived objects within the Repository.
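To make the idea of storing an object under its hash more concrete, here is a minimal base-R sketch. It is not archivist's implementation: the helper names archive_object() and restore_object() are made up for illustration, and the hash is taken from the serialized file with tools::md5sum(), which is only an assumption chosen to avoid extra dependencies (archivist's own hashing differs).

```r
# Hypothetical helpers for illustration only; NOT part of archivist.
# The object is serialized to a file and the file is named after its md5 hash,
# so the hash alone is enough to find and restore the object later.
archive_object <- function(x, repo_dir) {
  dir.create(repo_dir, showWarnings = FALSE, recursive = TRUE)
  tmp <- tempfile(tmpdir = repo_dir, fileext = ".rds")
  saveRDS(x, tmp)
  hash <- unname(tools::md5sum(tmp))          # md5 of the serialized file
  file.rename(tmp, file.path(repo_dir, paste0(hash, ".rds")))
  hash
}

restore_object <- function(hash, repo_dir) {
  readRDS(file.path(repo_dir, paste0(hash, ".rds")))
}

repo <- file.path(tempdir(), "demo_repo")     # throwaway local directory
h <- archive_object(iris, repo)
restored <- restore_object(h, repo)
identical(restored, iris)
```

The hash plays the same role as the md5hash column shown above: it is the only thing a reader needs in order to ask for the object back.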

Partial results archiving and objects’ pedigree restoration

We have prepared %a%, an extended version of the pipe operator %>%, so that every partial result of an analysis workflow can be archived. Below is an example of archiving a workflow for RTCGA (about which I wrote here) RNASeq data (gene expression; a broader example can be found here) and restoring its pedigree
# This sets `silent=TRUE` in saveToRepo
# which is used by %a% . There will be 
# no warning printed about archiving 
# the same artifact or its data twice.
aoptions('silent', TRUE)
aoptions('repoDir', 'RHero')
# %a% archives to Local Repos, that's
# why we need 'repoDir'
# information about genes' expressions
library(dplyr) # select(), rename(), filter()
library(RTCGA.rnaseq); data(BRCA.rnaseq)
BRCA.rnaseq %a%
    select(`TP53|7157`,
           bcr_patient_barcode) %a%
    # bcr_patient_barcode contains a key to 
    # merge patients between various datasets
    rename(TP53 = `TP53|7157`) %a%
    filter(substr(bcr_patient_barcode,
                  14, 15) == "01" ) -> 
    # 01 at the 14-15th position tells 
    # these are cancer samples
# knitr: results='asis'.
# give hooks to objects
         format = "kable",
         alink = TRUE ) 
  call                                                md5hash
4 env[[nm]]                                           63678e012c5b7f40966c32eec91f828b
3 select(TP53|7157, bcr_patient_barcode)              4a85ce61229dd743b911d7edab0310b3
2 rename(TP53 = TP53|7157)                            103f2b82c41956e9f6437b3a0cd68679
1 filter(substr(bcr_patient_barcode, 14, 15) == "01") 1da5a026aae19e0d0467ba3773679e28
# it uses global user and repo
# if they are not specified
The row with env[[nm]] is the object before transformations. We are working on using original names for objects in this issue. This operation does not archive objects on GitHub automatically, because %a% is functionality from the base archivist package. One has to upload objects with
# one can check how many commits have been performed so far
[1] 3
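To get a feel for how a pipe that records every intermediate result can work, here is a minimal base-R sketch. It is not how %a% is implemented: the operator name %log%, the step_log environment and the in-memory storage are all made up for illustration, whereas %a% saves each step to an archivist Repository.

```r
# Hypothetical sketch; NOT the implementation of archivist's %a%.
# step_log keeps every partial result in memory together with the call
# that produced it.
step_log <- new.env()
step_log$steps <- list()

`%log%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)                 # e.g. head(3), left unevaluated
  stopifnot(is.call(rhs_call))
  # Insert the left-hand side as the first argument: head(3) -> head(.lhs, 3)
  piped <- as.call(c(list(rhs_call[[1]], quote(.lhs)), as.list(rhs_call)[-1]))
  result <- eval(piped, list(.lhs = lhs), parent.frame())
  # Record the step: the call as text and the value it returned
  step_log$steps[[length(step_log$steps) + 1]] <- list(
    call  = paste(deparse(rhs_call), collapse = " "),
    value = result
  )
  result
}

out <- mtcars %log% subset(cyl == 4) %log% head(3)

# The recorded calls, in the order they were applied
vapply(step_log$steps, function(s) s$call, character(1))
```

Walking back through step_log$steps gives the same kind of pedigree as the call/md5hash table above, only without hashes and without persistence.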

Overload print() to use archive()

After specifying global parameters (the aoptions() function sets the ‘user’, ‘repo’ and ‘password’ parameters globally for every archivist.github and archivist function), we don’t have to call archive() after each call to provide hooks in rmarkdown reports. We can overload the print() function for specific classes so that printing an object also archives it with archive().
addHooksToPrintGitHub(class = "lm")
# knitr: results='asis'
pld <- aread("MarcinKosinski/Museum/04eb0bdc12")
(lm.D9 <- lm(weight ~ group,
             data = pld))
Load: archivist::aread('archivistR/RHero/2b639023bc41e289aa21d790d5876736')

Call:
lm(formula = weight ~ group, data = pld)

Coefficients:
(Intercept)     groupTrt
      5.032       -0.371
(lm.D90 <- lm(weight ~ group - 1,
              data = pld))
Load: archivist::aread('archivistR/RHero/a33c804ff1d0b652210a39e2071d1e14')

Call:
lm(formula = weight ~ group - 1, data = pld)

Coefficients:
groupCtl  groupTrt
   5.032     4.661

This is the GitHub equivalent of local archiving with addHooksToPrint.
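The idea of overloading print() for a class can be sketched in base R. This is not the code of addHooksToPrintGitHub(): the helper add_hook_to_print() and the seen vector are hypothetical names, the built-in PlantGrowth data stands in for the archived dataset, and the hook here only records the model formula instead of archiving anything.

```r
# Hypothetical sketch; NOT the implementation of addHooksToPrintGitHub().
# A print method defined in the global environment takes precedence over the
# registered one, so the wrapper runs the hook and then the original method.
add_hook_to_print <- function(class, hook, envir = globalenv()) {
  original <- utils::getS3method("print", class)
  wrapper <- function(x, ...) {
    hook(x)                # side effect, e.g. archiving the object
    original(x, ...)       # then print as usual
  }
  assign(paste0("print.", class), wrapper, envir = envir)
  invisible(wrapper)
}

seen <- character(0)
add_hook_to_print("lm", function(x) {
  seen <<- c(seen, deparse(formula(x)))   # remember which models were printed
})

fit <- lm(weight ~ group, data = PlantGrowth)
print(fit)
seen
```

Replacing the recording hook with a call that archives the object and prints a Load: line would give the behaviour shown in the output above.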

Feedback and Notes

If you have any comments or user requests, please see the Feedback and Notes section to learn about our future plans. More examples can be found in the archivist.github Tutorial, or you can learn more at @pbiecek’s talk How to use the archivist package to boost reproducibility of your research at the useR2016 Conference. If you’d like to meet more R Heroes, restore the message that was archived for commissar O’Rdon with
# library(archivist.github)
European R users meeting (eRum 2016) will take place between October 12th and 14th.

We already have confirmed great invited speakers such as: Rasmus Bååth, Romain François, Ulrike Grömping, Matthias Templ, and Heather Turner, as well as strong representation from Poland: Przemysław Biecek, Marek Gągolewski, Jakub Glinka, Katarzyna Kopczewska, and Katarzyna Stąpor. We are planning a meeting of more than 200 useRs from all across Europe working in different areas of the industry, academy, and government.

On behalf of the organising committee, chaired by Maciej Beręsewicz, we want to invite you to be a part of this historical meeting by proposing a workshop, submitting a regular or lightning talk, presenting a poster, or just attending the activities we are preparing for the meeting.

You will find more details about the registration process on the website www.erum.ue.poznan.pl.

If you have any questions do not hesitate to ask through [email protected].

See you in Poznań.

Source: http://www.r-bloggers.com/european-r-users-meeting-meeting-of-r-heroes-poznan-12-14-10-2016/


Paintings were made by pedzlenie.
