
Improving automatic document production with R

[This article was first published on R – Locke Data, and kindly contributed to R-bloggers.]

In this post, I describe the latest iteration of my automatic document production with R. It improves upon the methods used in Rtraining; previous work on this topic can be read under the auto deploying R documentation tag.

I keep banging on about this area because I have a keen interest in reproducible research and analytical document pipelines. I see them as a core part of DataOps: they are vital for helping us ensure our models and analysis are correct in data science, and for boosting our productivity.

Even after (or because of) a few years of on-and-off development of the process, Rtraining had a number of issues:

This post covers how I’m attempting to fix all bar the last problem (more on that in a later post).

With the problems outlined, let’s look at my new base solution and how it addresses these issues.

Structure

I have built a template that can be used to generate multiple presentations and publish them to a docs/ directory for online hosting by GitHub. I can now use this template to produce category repositories, based on the folders in inst/slides/ in Rtraining. I can always split them out further at a later date.

The new repo is structured like so:

Presentations

Document generation

[Figure: Automatic document generation with R]

Travis

I use Travis CI to perform the presentation builds. The instructions I give Travis are:

language: r
cache: packages
latex: false
warnings_are_errors: false

install:
  - R -e 'install.packages("devtools")'
  - R -e 'devtools::install_deps(dependencies = TRUE)'
  - R CMD build --no-build-vignettes --no-manual .
  - R CMD check --no-build-vignettes --no-manual *tar.gz
  - Rscript -e 'devtools::install(pkg = ".")'

before_script:
  - chmod +x ./buildpres.sh

script:
  - ./buildpres.sh


One important thing to note here is that I pass --no-build-vignettes and --no-manual to the package build and check steps, and set latex: false, to drastically reduce the build time, as I don't normally intend to produce PDFs.

The install section is the prep work, and then the script section does the important bit. Now if there are errors, I’ll get notified!

Bash

The script that gets executed in my Travis build:

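#!/bin/bash

# Identity used for the automated commit
AUTHORNAME="Steph"
AUTHOREMAIL="Steph@itsalocke.com"
# Authenticate the push with the token held in the GITHUB_PAT environment variable
GITURL="https://$GITHUB_PAT@github.com/$TRAVIS_REPO_SLUG.git"

git remote set-url origin "$GITURL"
git pull
git checkout master
git config --global user.name "$AUTHORNAME"
git config --global user.email "$AUTHOREMAIL"

# Build the presentations; R CMD BATCH writes its log to buildpres.Rout
R CMD BATCH './buildpres.R'

# Publish the build log alongside the rendered documents
cp buildpres.Rout docs/

git add docs/
git commit -am "[ci skip] Documents produced in clean environment via Travis $TRAVIS_BUILD_NUMBER"
git push -u --quiet origin master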
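The script above commits on every build. When a build produces no changes, git commit exits with a non-zero status because there is nothing to commit. One possible refinement, shown here only as a sketch, is to check for staged changes first. The function name commit_docs_if_changed is hypothetical and not part of my script, and unlike the commit -am above it only commits what has been staged from docs/.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): stage docs/ and
# commit only when the staged content differs from the last commit.
commit_docs_if_changed() {
  local message="$1"
  git add docs/
  # `git diff --cached --quiet` exits 0 when nothing is staged
  if git diff --cached --quiet; then
    echo "no changes"
    return 0
  fi
  # Commits only the staged docs/ changes (no -a flag here)
  git commit --quiet -m "$message"
  echo "committed"
}
```

In the build script this could stand in for the git add and git commit lines, for example commit_docs_if_changed "[ci skip] Documents produced in clean environment via Travis $TRAVIS_BUILD_NUMBER", followed by the push only when it reports "committed".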


R

The R step is now very minimal: it works out which presentations to generate, then loops through them and builds each one according to the options specified in _output.yml.

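library(rmarkdown)

# Find every R Markdown deck in pres/ (pattern is a regular expression, not a glob)
slides <- list.files("pres", pattern = "\\.Rmd$", full.names = TRUE)

# Render each deck into docs/
for (f in slides) render(f, output_dir = "docs")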
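With the loop in a single R session, the first deck that fails to render stops the rest. If you would rather isolate each deck in its own R process and see every failure in one build, the loop could live in the shell script instead. This is a rough sketch rather than what I run: render_one and render_all are hypothetical names, and it assumes Rscript and the rmarkdown package are available on the build machine.

```shell
#!/bin/bash

# Hypothetical wrapper: render a single deck in its own R process.
# Assumes Rscript and the rmarkdown package are installed.
render_one() {
  Rscript -e 'args <- commandArgs(trailingOnly = TRUE); rmarkdown::render(args[1], output_dir = args[2])' "$1" "$2"
}

# Render every .Rmd in a directory, report the ones that failed, and
# return non-zero if any did so the CI build is marked as failed.
render_all() {
  local src_dir="$1" out_dir="$2" failed=0 f
  for f in "$src_dir"/*.Rmd; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if ! render_one "$f" "$out_dir"; then
      echo "FAILED: $f" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Calling render_all pres docs from buildpres.sh would then play the role of the R CMD BATCH step. The trade-off is that each deck pays the start-up cost of a fresh R session.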


Next steps for me

This work has substantially mitigated most of the issues I had with my previous Rtraining workflow. I now have to get all my slide decks building under this new process.

I will be writing about making an improved presentation portal and how to build and maintain your own substantially modified revealjs theme at a later date.

The modified workflow and scripts also have implications for my pRojects package, which I’m currently developing along with Jon Calder. I’d be very interested to hear from you if you have thoughts on how to make things more streamlined.

