This afternoon I stumbled across ~~this one weird trick~~ an undocumented part of the YAML header that gets processed when you click the ‘knit’ button in RStudio. Knitting turns an Rmarkdown document into a specified format, using the rmarkdown package’s render function to call pandoc (a universal document converter written in Haskell).
If you specify a `knit:` field in an Rmarkdown YAML header you can replace the default function (`rmarkdown::render`) that the input file and encoding are passed to with any arbitrarily complex function.
For example, the developer of slidify passed in a totally different function rather than the default.
I thought it’d be worthwhile to modify what was triggered upon clicking that button, whether something as simple as using a specified output file name (see StackOverflow here), or essentially running a sort of `make` to compose a multi-partite document.
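A minimal sketch of the simpler case, assuming RStudio calls the replacement function with the input file path and the encoding as described above. The function name `knit_to_named_file` and the `-draft` suffix are hypothetical, chosen only for illustration; `output_file` is an argument of `rmarkdown::render`, and the sketch assumes an rmarkdown version that adds the file extension automatically when none is given.

```r
# Hypothetical replacement for the default knit function.
# RStudio is assumed to pass the path of the open document and its encoding.
knit_to_named_file <- function(inputFile, encoding) {
  # Build an output name from the input name plus a hypothetical suffix,
  # e.g. "report.Rmd" gives "report-draft".
  out_name <- paste0(tools::file_path_sans_ext(basename(inputFile)), "-draft")
  rmarkdown::render(
    inputFile,
    encoding = encoding,
    output_file = out_name,          # extension assumed to be added by render()
    output_dir = dirname(inputFile)  # write next to the source document
  )
}
```

To use something like this from the `knit:` field, the function body would be written inline in the YAML header, or the function would be exported from a package and referenced as `package::function`.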
Here’s an exemplar Rmarkdown YAML header which threads together a document from 3 component Rmarkdown subsection files:
- `title:` field becomes a top-level header (`#`) in the output markdown
- `knit:` field (a currently undocumented hook) replaces `rmarkdown::render` with a custom function specifying parameters for the rendering.
Yes, unfortunately it does have to be an unintelligible one-liner unless you have your own R package [with the function exported to the namespace] to source it from (as `package::function`). Here’s the above more clearly laid out:
- Firstly, every section’s Rmarkdown file is rendered into markdown [with the same name by default]
- Each of these files is ‘included’ after the ‘body’ (cf. the header) of this README, if they’re in the
- `quiet=TRUE` parameter silences the standard “Output created: …” message following `render()`, which would otherwise trigger the RStudio file preview on the last of the intermediate markdown files created.
- After these component files are processed, the final README markdown is rendered (`includes` appends their processed markdown contents), and this full document is previewed.
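The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact function from the header: the section file names and the function name `knit_modular` are hypothetical, each section is assumed to declare a markdown output format in its own YAML header, and `output_options` is assumed to forward `includes` to the main document's output format.

```r
# Hypothetical section files; substitute the real component documents.
section_rmds <- c("section1.Rmd", "section2.Rmd", "section3.Rmd")

knit_modular <- function(inputFile, encoding, sections = section_rmds) {
  # Step 1: render every section to markdown. quiet = TRUE suppresses the
  # "Output created" message so RStudio does not preview the intermediates.
  # render() returns the path of the file it wrote.
  section_mds <- vapply(
    sections,
    function(rmd) rmarkdown::render(rmd, encoding = encoding, quiet = TRUE),
    character(1)
  )

  # Step 2: render the main document, appending the rendered sections
  # after its body. This call is not quiet, so the result is previewed.
  rmarkdown::render(
    inputFile,
    encoding = encoding,
    output_options = list(
      includes = rmarkdown::includes(after_body = unname(section_mds))
    )
  )
}
```

Keeping the section list as an argument with a default means the same function could be reused for documents with a different set of components.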
All Rmd files here contain a YAML header, the constituent files having only the … before their sub-section contents:
## Comparison of cancer types surveyed
Comparing cancer types in this paper to CRUK's most up to date prevalence statistics...
Alternative modular setup
One of the problems custom knit functions can also solve is the time it takes for large manuscripts to compile. This has been a huge obstacle to my own use of Rmarkdown, one I’m delighted to overcome, and it’s what has stopped me from recommending it to others as a practical writing tool despite its clear potential.
E.g., if using `knitcitations`, each reference is downloaded even if the bibliographic metadata has already been obtained. Along with generating individual figures etc., the time to ‘compile’ an Rmarkdown document can therefore grow exorbitantly when writing a moderately sized manuscript (rising from seconds to tens of minutes in the extreme, as I saw on a recent essay), breaking the proper flow of writing and review, and imposing a penalty on extended Rmarkdown compositions.
A modular structure is the only rational way of writing at this scale, but it isn’t described anywhere for Rmarkdown’s dynamic documents (to my knowledge).
In such a framework, the ‘main’ document’s knit function would be as above, but lacking the first step of compiling each `.md` (these having been done separately upon each edit), so that pre-made `.md` files would just be `included` (instantly) in the final document:
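A rough sketch of such an include-only knit function, under the same assumptions as before: the `.md` file names and the name `knit_from_prebuilt` are hypothetical, and `output_options` is assumed to pass `includes` through to the output format. The existence check is an addition for illustration, so that a section which has not yet been rendered fails loudly rather than being silently left out.

```r
# Hypothetical pre-rendered section files, relative to the main document.
section_mds <- c("section1.md", "section2.md", "section3.md")

knit_from_prebuilt <- function(inputFile, encoding, sections = section_mds) {
  # Resolve the section paths against the main document's directory.
  section_paths <- file.path(dirname(inputFile), sections)

  # Fail early if any section has not been rendered yet.
  missing_mds <- section_paths[!file.exists(section_paths)]
  if (length(missing_mds) > 0) {
    stop("Render these sections first: ", paste(missing_mds, collapse = ", "))
  }

  # Only the main document is rendered; the sections are appended as-is.
  rmarkdown::render(
    inputFile,
    encoding = encoding,
    output_options = list(
      includes = rmarkdown::includes(after_body = section_paths)
    )
  )
}
```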
Much more sensibly, the Rmarkdown component files (subsections) wouldn’t need to be re-processed, with all references and figures regenerated, every time the main document is compiled. Rather, this would be done per file as each is edited, and each could in turn potentially have custom `knit:` hooks (though note that the example below only prevents the file preview; there’s scope to do much more with it)
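A sketch of what a per-section hook of this kind might look like, doing nothing beyond suppressing the preview. The name `knit_section_quietly` is hypothetical, and the function is assumed to receive the same two arguments as any other replacement knit function.

```r
# Hypothetical knit function for a single section file: it renders the
# section as usual, but quiet = TRUE suppresses the "Output created"
# message that would otherwise trigger the RStudio preview.
knit_section_quietly <- function(inputFile, encoding) {
  rmarkdown::render(inputFile, encoding = encoding, quiet = TRUE)
}
```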
via Software Carpentry
The idea would be to follow what this Software Carpentry video describes regarding makefiles for reproducible research papers. In theory, the initially described `knit:` function could generate a full paper including analyses from component section files, each of which could in turn have its own custom hook.
The example above creates a `README.md` file suitable for display in a standard GitHub repository, though it’s not advisable to write sprawling READMEs. It could easily be tweaked to give a `paper.pdf` as in the Software Carpentry example, by using a PDF output format in the YAML header of the final document instead.
For what it’s worth, my current YAML header for a manuscript in PDF is:
… and in the top matter (after the YAML, before the markdown, for the LaTeX engine & R):
A minor limitation I see here is that it’s not possible to provide subsection titles through metadata: at present the title is written to markdown with a hardcoded ‘# ’ prefix. In a reproducible manuscript utopia the `title:` field could still be specified, and a markdown header prefix of the appropriate level generated accordingly (which might also allow for procedural sub-section numbering: 1.2, 1.2.1 etc.).
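To make the wished-for behaviour concrete, here is a small sketch of how a header prefix and an optional section number could be generated from a title and a nesting level. Nothing like this exists in rmarkdown as far as I know; the helper `section_heading` and its arguments are entirely hypothetical.

```r
# Hypothetical helper: build a markdown heading of a given level,
# optionally prefixed with a dotted section number.
section_heading <- function(title, level = 1, number = NULL) {
  stopifnot(level >= 1, level <= 6)   # markdown supports six heading levels
  prefix <- strrep("#", level)        # e.g. level 2 gives "##"
  label <- if (is.null(number)) {
    title
  } else {
    paste(paste(number, collapse = "."), title)
  }
  paste(prefix, label)
}

section_heading("Comparison of cancer types surveyed", level = 2)
# "## Comparison of cancer types surveyed"
section_heading("Methods", level = 3, number = c(1, 2, 1))
# "### 1.2.1 Methods"
```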
The above can also be found on my GitHub development notes Wiki, but it’s not possible to leave comments there. Feedback and more tips and tricks for Rmarkdown workflows are welcome.