One weird trick to compile multipartite dynamic documents with Rmarkdown

[This article was first published on biochemistries, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This afternoon I stumbled across this one weird trick: an undocumented part of the YAML header that gets processed when you click the ‘knit’ button in RStudio. Knitting turns an Rmarkdown document into a specified format, using the rmarkdown package’s render function to call pandoc (a universal document converter written in Haskell).

If you specify a knit: field in an Rmarkdown YAML header you can replace the default function (rmarkdown::render) that the input file and encoding are passed to with any arbitrarily complex function.

For example, the developer of slidify passed in a totally different function rather than render: slidify::knit2slides.

I thought it’d be worthwhile to modify what was triggered upon clicking that button – as simply as using a specified output file name (see StackOverflow here), or essentially running a sort of make to compose a multi-partite document.
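As a hedged illustration of the simpler case, a knit: field along the following lines should set the output file name. The title and the name custom-name.html are placeholders of my own, and output_file is the ordinary argument of rmarkdown::render; treat this as a sketch rather than the exact StackOverflow answer.

```yaml
---
title: "Untitled"
# Sketch only: "custom-name.html" is a hypothetical output name.
# RStudio calls the function with the input file path and its encoding.
knit: (function(inputFile, encoding) { rmarkdown::render(inputFile, encoding = encoding, output_file = "custom-name.html") })
output: html_document
---
```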

Here’s an exemplar Rmarkdown YAML header which threads together a document from 3 component Rmarkdown subsection files:
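A minimal sketch of such a header follows. The title, the three component file names (Intro, Methods, Results) and the markdown_github variant are placeholders assumed for illustration, so the real header may differ in its details. The knit: value is an anonymous R function that RStudio calls with the input file and its encoding.

```yaml
---
title: "Project README"
# Hypothetical component names; substitute your own section files.
# Step 1: render each component quietly to .md. Step 2: render this file.
knit: (function(inputFile, encoding) { lapply(c("Intro.Rmd", "Methods.Rmd", "Results.Rmd"), rmarkdown::render, quiet = TRUE); rmarkdown::render(inputFile, encoding = encoding) })
output:
  md_document:
    variant: markdown_github   # assumed variant, suited to a GitHub README
    includes:
      after_body: [Intro.md, Methods.md, Results.md]
---
```

Only the final render call is left without quiet = TRUE, so that its “Output created” message is the one RStudio picks up for the preview.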

  • The title: field becomes a top-level header (#) in the output markdown
  • The knit: field (a currently undocumented hook) replaces rmarkdown::render with a custom function specifying parameters for the rendering.

    Yes, unfortunately it does have to be an unintelligible one-liner unless you have your own R package [with the function exported to the namespace] to source it from (as package::function). Here’s the above more clearly laid out:

    • Firstly, every section’s Rmarkdown file is rendered into markdown [with the same name by default]
    • Each of these files is ‘included’ after the ‘body’ (cf. the header) of this README, if it’s in the includes: after_body:[...] list.
    • The quiet=TRUE parameter silences the standard “Output created: …” message following render() which would otherwise trigger the RStudio file preview on the last of the intermediate markdown files created.
    • After these component files are processed, the final README markdown is rendered (includes appends their processed markdown contents), and this full document is previewed.
  • All Rmd files here contain a YAML header, the constituent files having only the output:md_document:variant field:

    …before their sub-section contents:

## Comparison of cancer types surveyed

Comparing cancer types in this paper to CRUK's most up to date prevalence statistics...
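The header that sits above such sub-section contents can be very small. A sketch of a component file's header, assuming the markdown_github variant (my assumption, chosen because the final document is a GitHub README):

```yaml
---
# Component section file: no title, only the output format.
output:
  md_document:
    variant: markdown_github   # assumed; any pandoc markdown variant could go here
---
```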

Alternative modular setup

One of the problems custom knit functions can also solve is the time it takes for large manuscripts to compile – a huge obstacle to my own use of Rmarkdown which I’m delighted to overcome, and what’s stopped me from recommending it to others as a practical writing tool despite its clear potential.

E.g., if using knitcitations, each reference is downloaded even if the bibliographic metadata has already been obtained. Along with generating individual figures etc., the time to ‘compile’ an Rmarkdown document can therefore scale exorbitantly when writing a moderately sized manuscript (rising from seconds to tens of minutes in the extreme as I saw on a recent essay), breaking the proper flow of writing and review, and imposing a penalty on extended Rmarkdown compositions.

A modular structure is the only rational way of doing this, but isn’t described anywhere for Rmarkdown’s dynamic documents (to my knowledge?).

In such a framework, the ‘main’ document’s knit function would be as above, but lacking the first step of compiling each .Rmd to .md (these having been compiled separately upon each edit), so that pre-made .md files would just be included (instantly) in the final document:
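A sketch of what the main document's header could then reduce to, again with hypothetical file names and an assumed markdown_github variant. The knit function here only renders the main file and relies on the component .md files already existing on disk.

```yaml
---
title: "Project README"
# No per-section rendering here: Intro.md etc. (hypothetical names)
# are assumed to have been built already from their own .Rmd files.
knit: (function(inputFile, encoding) { rmarkdown::render(inputFile, encoding = encoding) })
output:
  md_document:
    variant: markdown_github
    includes:
      after_body: [Intro.md, Methods.md, Results.md]
---
```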

Much more sensibly, the component Rmarkdown files (subsections) wouldn’t all need to be re-processed (e.g. have all their references and figures regenerated) on every build. Instead this would be done per file, whenever that file is edited, and each file could in turn have its own custom knit: hook (though note that the example below only prevents the file preview; there’s scope to do much more with it).
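A component file's header with such a hook might look like the following sketch. It assumes the md_document output described earlier for the constituent files, and does nothing beyond passing quiet = TRUE so that knitting the section does not trigger the preview.

```yaml
---
# Per-section hook: render this file only, and suppress the
# "Output created" message that would otherwise open the preview.
knit: (function(inputFile, encoding) { rmarkdown::render(inputFile, encoding = encoding, quiet = TRUE) })
output:
  md_document:
    variant: markdown_github   # assumed variant
---
```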

via Software Carpentry

The idea would be to follow what this Software Carpentry video describes regarding makefiles for reproducible research papers. In theory, the initially described knit: function could generate a full paper including analyses from component section files, each of which could in turn have their own knit: hooks.

The example above creates a README.md file suitable for display in a standard GitHub repository, though it’s not advisable to write sprawling READMEs. It could easily be tweaked to give a paper.pdf as in the Software Carpentry example, using a PDF YAML output header instead for the final .md to .pdf step after including the component parts.

For what it’s worth, my current YAML header for a manuscript in PDF is:

… and in the top matter (after the YAML, before the markdown, for the LaTeX engine & R):

A minor limitation I see here is that it’s not possible to provide subsection titles through metadata: at present the title is written to markdown with a hardcoded ‘# ’ prefix. In a reproducible manuscript utopia, the title: field could still be specified for each subsection, with a markdown header prefix of the appropriate level generated accordingly (which might also allow for procedural sub-section numbering: 1.2, 1.2.1 etc.).

The above can also be found on my GitHub development notes Wiki, but it’s not possible to leave comments there. Feedback and more tips and tricks for Rmarkdown workflows are welcome.

✎ Check out the rmarkdown package here, and general Rmd documentation here.

To leave a comment for the author, please follow the link and comment on their blog: biochemistries.
