Why should you back up your R objects?

[This article was first published on SmarterPoland.pl » English, and kindly contributed to R-bloggers.]

There is a saying that there are two groups of people: those who already do backups and those who will. So how is this linked to reproducible research and R?

If your work is to analyze data, you often need to restore, recreate or update results that you generated some time ago.
You may think, “I have knitr reports for everything!” That’s great! It will save you a lot of trouble. But to be 100% sure of getting exactly the same results, you need exactly the same environment and the same versions of packages.
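As a minimal sketch of what "the same environment" means in practice, the base R snippet below records the R version and the versions of the packages used in the current session. The variable names (`info`, `pkgs`, `versions`) are only illustrative, and this is one possible approach, not the method used in this post.

```r
# Collect the names of base, attached and loaded-only packages in this session.
info <- sessionInfo()
pkgs <- c(info$basePkgs, names(info$otherPkgs), names(info$loadedOnly))

# Look up the installed version of each package.
versions <- data.frame(
  package = pkgs,
  version = vapply(pkgs, function(p) as.character(packageVersion(p)), character(1)),
  stringsAsFactors = FALSE,
  row.names = NULL
)

# The R version string and the package table can be stored next to the results.
r_version <- R.version.string
print(r_version)
print(head(versions))
```

Storing such a table next to a report makes it possible to see later which package versions produced it.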

Do you know how many R packages have been updated during the last 12 months?

I took the list of the top 20 R packages from here, scraped the dates of their current and older CRAN releases from here, and generated a plot of the CRAN submission dates, sorted by the date of the last submission.

Load this plot directly into R: archivist::aread('pbiecek/archivist/scripts/packDev/039745c40ab717f4459c5144343baca1')


How many of the current versions of the selected packages were already on CRAN 12 months ago?
The empirical cumulative distribution function (ecdf) of the dates of the current releases shows this.

Load this plot directly into R: archivist::aread('pbiecek/archivist/scripts/packDev/923ec99f79cce099408d4973471dd30d')


Around 50% of these packages were updated in the last 12 months. Sometimes these changes have a huge impact, as with version 2.0 of ggplot2.
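To illustrate how an ecdf answers this kind of question, here is a small sketch in base R. The release dates below are invented for illustration only and are not the data behind the plots in this post. The reference date and the variable names are assumptions as well.

```r
# Invented release dates of "current versions" (illustration only, not real data).
release_dates <- as.Date(c(
  "2013-11-20", "2014-08-05", "2015-01-10",
  "2015-03-02", "2015-06-18", "2015-09-07",
  "2015-10-30", "2016-01-12"
))

# Assumed reference date and the date 12 months before it.
reference_date <- as.Date("2016-02-15")
cutoff <- seq(reference_date, by = "-12 months", length.out = 2)[2]

# The ecdf evaluated at the cutoff gives the share of current versions
# that were already released 12 months before the reference date.
release_ecdf <- ecdf(as.numeric(release_dates))
share_older <- release_ecdf(as.numeric(cutoff))

# The remaining share was updated within the last 12 months.
share_updated <- 1 - share_older
print(share_updated)
```

With these invented dates, three of the eight releases are older than the cutoff, so `share_updated` is 0.625.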

To recreate exactly the same results, you need to keep either a copy of the important (all?) packages or a copy of the obtained results.
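The second option, keeping a copy of the obtained results, can be sketched with base R alone. The model, the file name and the structure of the `snapshot` list are assumptions made for this example. It is a bare-bones illustration of the idea, not what archivist does internally.

```r
# An example result: a linear model fitted on the built-in cars dataset.
fit <- lm(dist ~ speed, data = cars)

# Bundle the result with the session info and a timestamp (hypothetical layout).
snapshot <- list(
  result   = fit,
  session  = sessionInfo(),
  saved_at = Sys.time()
)

# Save the bundle to a file; tempdir() is used here only to keep the sketch tidy.
path <- file.path(tempdir(), "fit_snapshot.rds")
saveRDS(snapshot, path)

# Later, possibly after packages have been updated, restore the stored result.
restored <- readRDS(path)
print(coef(restored$result))
```

The restored object carries the original coefficients no matter which package versions are installed when it is read back.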

With the current version of archivist (2.0) you can easily, with just one line, archive all created objects and embed hooks to these objects in your report. It is enough to call the addHooksToPrint() function at the beginning of your knitr script.
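A first chunk of such a knitr script might look roughly like the sketch below. It assumes that the archivist package is installed and that addHooksToPrint() accepts the class, repoDir, repo, user and subdir arguments. The repository folder, the GitHub repository name and the user name are hypothetical placeholders, so check the package documentation for your version before relying on it.

```r
# Hypothetical folder for the local repository of archived objects.
repo_dir <- file.path(tempdir(), "arepo")

# Run the archivist part only if the package is available.
if (requireNamespace("archivist", quietly = TRUE)) {
  library(archivist)

  # Create an empty local repository (force = TRUE re-creates an existing one).
  createLocalRepo(repoDir = repo_dir, force = TRUE)

  # From now on, printing an object of one of these classes should also
  # archive it and add a hook to it in the report.
  addHooksToPrint(
    class   = c("ggplot", "data.frame"),
    repoDir = repo_dir,
    repo    = "myRepo",  # hypothetical GitHub repository name
    user    = "myUser",  # hypothetical GitHub user name
    subdir  = "arepo"    # hypothetical subdirectory of that repository
  )
}
```

The hooks refer to the GitHub repository given by repo, user and subdir, so the local repository has to be pushed there before readers can use them.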

How is this better than a simple save() call? There are lots of additional features. For example, you can ask for the session info of a given artifact:

R> archivist::asession("pbiecek/archivist/scripts/packDev/923ec99f79cce099408d4973471dd30d")
$packages
      package * version       date
1   archivist *     2.0 2016-02-12
2  assertthat       0.1 2013-12-06
3      bitops     1.0-6 2013-08-17
4  colorspace     1.2-6 2015-03-11
5         DBI     0.3.1 2014-09-24
6    devtools     1.9.1 2015-09-11
7      digest     0.6.9 2016-01-08
8       dplyr *   0.4.3 2015-09-01
9          DT *     0.1 2015-06-09

For more examples, and to reproduce or retrieve all the results presented here, see this knitr+archivist report:
https://rawgit.com/pbiecek/archivist/master/scripts/listOfPackages.html
