Embedding RData files in Rmarkdown files for more reproducible analyses

[This article was first published on BayesFactor: Software for Bayesian inference, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


For those of us interested in reproducible analysis, Rmarkdown is a great way of communicating our code to other researchers. Rstudio, in particular, makes it very easy to create attractive HTML document containing text, code, and figures, which can then be sent to colleagues or put on the internet for anyone to see. If you aren’t using Rmarkdown for your statistical analyses, I recommend you start; you’ll never go back to simple script files again (and your colleagues won’t want you to).

In this post, I describe how to improve your Rmarkdown by embedding data that can be downloaded by anyone viewing the document in a modern browser with javascript enabled. For a quick look, see the example Rmd file and resulting HTML file.

One of the drawbacks of Rmakdown, from a reproducible analysis perspective, is that the data is not a part of the document itself. Typically, an Rmarkdown file will use R code to load a file from your disk, and when you send the resulting HTML file to a colleague, or put it on the internet, that file is separate. It must be sent in an email or placed on a server to be downloaded.

This raises the possibility that the data could get separated from the code, and I think this is a terrible thing for reproducible analysis. In my mind, the data and the document and data should travel together as a single document. What we would like is a method of encoding R data into the HTML file such that any user who has access to the HTML file can download it, without even having access to the internet.

As it turns out, files can be encoded in an HTML document via the URI data scheme. All we need is an R function that encodes the data, and produces a link to enable downloading the data.
setDownloadURI = function(list, filename = stop("'filename' must be specified"), textHTML = "Click here to download the data.", fileext = "RData", envir = parent.frame()){
  require(base64enc,quietly = TRUE)
  divname = paste(sample(LETTERS),collapse="")
  tf = tempfile(pattern=filename, fileext = fileext)
  save(list = list, file = tf, envir = envir)
  filenameWithExt = paste(filename,fileext,sep=".")

  uri = dataURI(file = tf, mime = "application/octet-stream", encoding = "base64")
  cat("<a style='text-decoration: none' id='",divname,"'></a>
    <script>
    var a = document.createElement('a');
    var div = document.getElementById('",divname,"');
    div.appendChild(a);
    a.setAttribute('href', '",uri,"');
    a.innerHTML = '",textHTML,"' + ' (",filenameWithExt,")';
    if (typeof a.download != 'undefined') {
      a.setAttribute('download', '",filenameWithExt,"');
    }else{
      a.setAttribute('onclick', 'confirm("Your browser does not support the download HTML5 attribute. You must rename the file to ",filenameWithExt," after downloading it (or use Chrome/Firefox/Opera). ")');
    }
    </script>",
    sep="")
}
The first argument of the function, list, is a character vector containing names of variables to save in the RData file.

Once this function is declared, all we need to do is call it in our Rmd file. If we use the argument results = 'asis' in our R code block, it will inject the appropriate HTML code into our compiled HTML document to allow a download of the embedded data as an RData file, and anyone with the HTML file can download it.

Unfortunately, blogger will not allow me to embed the data into a post; therefore, a complete, self-contained example Rmd file can be found here, and the resulting HTML file can be found here.

Keep in mind, however, that the data file is actually embedded in the HTML file. This means that the resulting HTML file can be very large, if your data file is large. Also consider that data are encoded in base64, which increases the size of the file by about a third over the equivalent RData binary file. For very large data sets, one might consider hosting them outside of the HTML file; but for many purposes, the technique I describe will improve the ease with which you can share reproducible analyses.

To leave a comment for the author, please follow the link and comment on their blog: BayesFactor: Software for Bayesian inference.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)