Quick Hit: Automating Production Graphics Uploads in R Markdown Documents with googledrive

January 24, 2019

(This article was first published on R – rud.is, and kindly contributed to R-bloggers)

As someone who measures all kinds of things on the internet as part of his $DAYJOB, I can say with some authority that huge swaths of organizations are using cloud services such as Google Apps, Dropbox and Office 365 as part of their business process workflows. For me, one regular component that touches the “cloud” is when I have to share R-generated charts with our spiffy production team for use in reports, presentations and other general communications.

These are typically project-based tasks, and data science team members use git- and AWS-based workflows for gathering data, performing analyses and generating output. While git is great at sharing code and ensuring the historical integrity of our analyses, we don’t expect the production team members to be or become experts in git to use our output. They live in Google Drive, and thanks to the googledrive package we can bridge the gap between code and output with just a few lines of R code.

We use “R projects” to organize things and either use spinnable R scripts or R markdown documents inside those projects to gather, clean and analyze data.

For 2019, we’re using new, work-specific R markdown templates that have one new YAML header parameter:

params:
  gdrive_folder_url: "https://drive.google.com/drive/u/2/SOMEUSELESSHEXSTRING"

which just defines the Google Drive folder URL for the final output directory in the ☁.

Next is a new pre-configured knitr chunk call at the start of these production chart-generating documents:

knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  dev = c("png", "cairo_pdf"),
  echo = FALSE,
  fig.retina = 2,
  fig.width = 10,
  fig.height = 6,
  fig.path = "prod/charts/"
)

We render both formats because the production team wants PDFs they can work with in their own tools and, in our testing, cairo_pdf produces the best and most consistent output. PNGs show up better in the composite HTML documents, though, so we list png first deliberately.
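
Since every chunk is expected to leave both a PNG and a PDF in the fig.path directory, a quick pre-upload check can catch a chart that only made it out in one format. The following is a minimal sketch in base R, not part of the original template; the helper name missing_pdfs() and the demo file names are made up for illustration.

```r
# Hypothetical helper (not from the template): list PNGs in a chart
# directory that have no PDF with the same base name.
missing_pdfs <- function(chart_dir) {
  pngs <- list.files(chart_dir, pattern = "\\.png$", recursive = TRUE)
  pdfs <- list.files(chart_dir, pattern = "\\.pdf$", recursive = TRUE)
  stems <- function(x) tools::file_path_sans_ext(x)
  pngs[!(stems(pngs) %in% stems(pdfs))]
}

# Demo in a throwaway directory with made-up chart names:
# "trend-1" exists in both formats, "share-1" only as a PNG.
demo_dir <- file.path(tempdir(), "prod-charts-demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  demo_dir, c("trend-1.png", "trend-1.pdf", "share-1.png")
)))

missing_pdfs(demo_dir)
```

In a real project the argument would be something like here::here("prod/charts"), and an empty result means every PNG has its PDF twin.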

The real change is the consistent naming of the fig.path directory. By doing this, all we have to do is add a few lines (again, automatically generated) to the bottom of the document to have all the output automagically go to the proper Google Drive folder:

# Upload to production ----------------------------------------------------

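library(magrittr) # provides the %>% pipe used below (skip if already attached)

# authenticate with Google Drive
googledrive::drive_auth()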

# locate the folder
gdrive_prod_folder <- googledrive::as_id(params$gdrive_folder_url)

# clean it out
gdrls <- googledrive::drive_ls(gdrive_prod_folder)
if (nrow(gdrls) > 0) {
  dplyr::pull(gdrls, id) %>%
    purrr::walk(~googledrive::drive_rm(googledrive::as_id(.x)))
}

# upload new
list.files(here::here("prod/charts"), recursive = TRUE, full.names = TRUE) %>%
  purrr::walk(googledrive::drive_upload, path = gdrive_prod_folder)
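
One thing to be aware of is that the steps above clear the Drive folder before looking at what is on disk, so a run that produced no charts would leave the folder empty. A possible variation, sketched here under the assumption that you want to bail out in that case, wraps the same googledrive calls in a function that checks the local directory first. The function name upload_charts() and its arguments are hypothetical.

```r
# Hypothetical wrapper around the same steps; the name and arguments are
# illustrative and not part of the original template.
upload_charts <- function(folder_url, chart_dir) {
  charts <- list.files(chart_dir, recursive = TRUE, full.names = TRUE)

  # Refuse to touch the Drive folder if there is nothing to replace it with.
  if (length(charts) == 0) {
    stop("No charts found in ", chart_dir, "; leaving the Drive folder untouched.")
  }

  folder <- googledrive::as_id(folder_url)

  # Clean out whatever is currently in the folder.
  existing <- googledrive::drive_ls(folder)
  if (nrow(existing) > 0) googledrive::drive_rm(existing)

  # Upload each local chart into the folder.
  for (chart in charts) {
    googledrive::drive_upload(chart, path = folder)
  }

  invisible(charts)
}
```

At the bottom of a document this would be called as upload_charts(params$gdrive_folder_url, here::here("prod/charts")) after googledrive::drive_auth().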

Now, we never have to remember to drag documents into a browser and don’t have to load invasive Google applications onto our systems to ensure the right folks have the right files at the right time. We just have to use the new R markdown document type to generate a starter analysis document with all the necessary boilerplate baked in. Plus, the .httr-oauth file is automatically added to .gitignore, so there’s no information leakage to shared git repositories.

FIN

If you want to experiment with this, you can find a pre-configured template in the markdowntemplates package over at sr.ht, GitLab, or GitHub.

If you install the package you’ll be able to select this output type right from the new document dialog, and the new template will be ready to go with no copying, cutting or pasting.

Plus, since the Google Drive folder URL is an R markdown parameter, you can also use this in script automation (provided that you’ve wired up OAuth correctly for those scripts).
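
As a rough idea of what that automation could look like, rmarkdown::render() accepts a params list that overrides the defaults declared in the YAML header. The sketch below is an assumption about how one might wire it up, not the author's script; the function name render_to_folder() and the file name in the usage comment are placeholders.

```r
# Hypothetical helper: render a document from the template while pointing
# it at a specific Drive folder. The parameter name matches the YAML header
# shown earlier; everything else is illustrative.
render_to_folder <- function(rmd, folder_url) {
  rmarkdown::render(
    input  = rmd,
    params = list(gdrive_folder_url = folder_url),
    envir  = new.env() # keep each render isolated from the calling session
  )
}

# Not run here: this needs the .Rmd file, pandoc and working Drive credentials.
# render_to_folder(
#   "analysis.Rmd",
#   "https://drive.google.com/drive/u/2/SOMEUSELESSHEXSTRING"
# )
```

Looping such a call over several documents or folder URLs would let a scheduled job refresh every production folder in one pass, assuming the authentication step can run without a browser prompt.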
