
How to compile R Markdown documents using Docker

[This article was first published on Jarno’s blog, and kindly contributed to R-bloggers].

I do some freelance web programming and got a request from a client to make a new monthly sales report for their web shop. After specifying what should be in it, I thought to myself, “this would be so quick to make with R Markdown and ggplot2… but wait, why not make it with R Markdown and ggplot2?”

This was actually a perfect case for it, since the report needed to be a printable pdf and come out once a month. So no need for shiny interactivity this time (no pun intended), just a formal-looking sales report with figures and tables in it. Let’s do it.

The first thing to think about was how to generate the pdf reports on the web server. Generating pdfs with R Markdown requires R and also LaTeX, and as you might guess, neither was available on the web server. An easy way to avoid installing additional packages on the web server itself is to use a Docker container that contains the required packages.

Using R Markdown in a Docker container

As mentioned above, using a Docker container saves me from having to install R, LaTeX and other dependencies on the server itself. They don’t need to be maintained there, and I can easily deploy the container to any other web server if the need arises.

I’m going to use the rocker/verse image from the rocker R images for creating the report. This image has R Markdown and LaTeX pre-installed for compiling pdf reports. If you are new to Docker, here is the official documentation of how to get it installed on your system.

Assuming you have Docker installed, let’s pull the rocker/verse image with R version 3.5.1:

docker pull rocker/verse:3.5.1

First I’m going to test the image to see which R packages I might need to add to it. The image also has RStudio pre-installed and is configured to run the RStudio server by default. So for testing, running the following command will start RStudio at localhost:8787:

docker run --rm -p 8787:8787 rocker/verse:3.5.1

Once you type localhost:8787 into your browser, you will be asked for a username and password, which are both rstudio by default. To my happy surprise, all the packages that I needed were already installed, so I could go straight to the pdf compilation.
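The same check can also be done from the command line without opening RStudio. The following is a minimal sketch, not something from the repository: `check_packages` is a hypothetical helper name, and the package names you pass to it are your own. It overrides the image's default command with Rscript and prints the names of the packages that are not installed in the image.

```shell
#!/bin/sh
# Hypothetical helper (not part of the example repository).
# DOCKER and IMAGE can be overridden from the environment, e.g. for a dry run.
DOCKER="${DOCKER:-docker}"
IMAGE="${IMAGE:-rocker/verse:3.5.1}"

# Print the packages from the argument list that are NOT installed in the image.
# Arguments given after the R expression reach R as trailing command args.
check_packages() {
    "$DOCKER" run --rm "$IMAGE" Rscript --vanilla -e \
        'pkgs <- commandArgs(trailingOnly = TRUE); cat(setdiff(pkgs, rownames(installed.packages())), sep = "\n")' \
        "$@"
}
```

For example, `check_packages ggplot2 rmarkdown` should print nothing if both packages are already in the image.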

Generating the pdf with Docker

Let’s begin by compiling pdf reports with the rocker/verse:3.5.1 image. I have provided a git repository that contains example files for creating pdf reports with R Markdown and Docker. You can clone it to your computer by running

git clone https://github.com/jlintusaari/R-docker-report.git

The repository contains an example_report.R that takes as input a csv file and generates a pdf report using the example_report.Rmd template below:

---
title: "Texas housing sales report"
author: "Matti Meikäläinen"
date: "`r paste(Sys.Date())`"
output: pdf_document
---

## Number of sales

The following chart shows the number of housing sales in three cities in Texas.
Data is from the `txhousing` data set provided by the TAMU real estate center.

```{r echo=FALSE, message=FALSE, warning=FALSE}
ggplot(dt, aes(date, sales, color=city)) + geom_point() + geom_smooth()
```

The generated pdf looks like this:

[Figure: screenshot of the generated pdf report]

The example csv data is stored in the data.csv file and was created with the R/make_csv.R script. Assuming we are in the folder where the example_report.R file is located, we run the following command to compile the report with Docker:

docker run --rm -v "$PWD":/report -w /report rocker/verse:3.5.1 \
 Rscript --vanilla example_report.R data.csv

The compilation works, but there are multiple things that could be improved:

  1. The default user in the container is root, causing the generated pdf to be owned by root
  2. The above command is rather long and requires setting e.g. the working directory with -w
  3. With my actual report, the LaTeX system was missing some packages. They were installed automatically during compilation, which made the pdf compilation slow

So let’s create a new Docker image based on rocker/verse:3.5.1 that is better configured for our purposes. We will do this using a Dockerfile. The following Dockerfile starts the image creation from the rocker/verse:3.5.1 image and adds configurations to address the above issues:

FROM rocker/verse:3.5.1

# My sales report required an additional latex package called `eurosym`.
# RUN tlmgr install eurosym

# Set a user and the working directory
USER rstudio
WORKDIR /report

# Set the container to run `Rscript --vanilla ` by default
ENTRYPOINT ["/usr/local/bin/Rscript", "--vanilla"]

# Set the `example_report.R data.csv` as the default script to run with ENTRYPOINT's Rscript
CMD ["example_report.R", "data.csv"]

You can find this file in the docker folder of the git repository. Now assuming you are in the docker folder where the Dockerfile is located, you can create the new image with:

docker build -t report-maker .

After this we can use the report-maker image to make the pdf compilation both faster and more convenient. Run from the folder that contains example_report.R and data.csv, the following command creates the report (remember to first remove the root-owned example_report.pdf if you haven’t):

docker run --rm -v "$PWD":/report report-maker

Of course our report-maker image can be used to run any kind of R script, but the name is descriptive for our purpose.
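Because the Dockerfile sets Rscript as the ENTRYPOINT and the script name only as CMD, any arguments given after the image name replace the default `example_report.R data.csv`. A minimal sketch of a wrapper that uses this, where `run_report` is a hypothetical function name and the script and csv names are placeholders of your own:

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the example repository).
# DOCKER can be overridden from the environment, e.g. for a dry run.
DOCKER="${DOCKER:-docker}"

# Run an R script from the current directory inside the report-maker image.
# With no arguments, the image's default CMD (example_report.R data.csv) is used;
# otherwise the arguments replace CMD and are passed to `Rscript --vanilla`.
run_report() {
    "$DOCKER" run --rm -v "$PWD":/report report-maker "$@"
}
```

Calling `run_report` alone behaves like the command above, while `run_report other_report.R january.csv` would run a different script with a different input file.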

Production use

The web application I’m deploying this for is implemented with Ruby on Rails. Rails provides a built-in task system that can easily access the database of the web application to retrieve relevant data for report creation.

In this particular case the pdf reports need to be generated once a month. I will thus create a Rails task that:

  1. Queries the database and saves the relevant information to a csv file
  2. Provides the generated csv file to the report-maker

After that I set up a cron job that runs the task on the first day of every month. Once generated, the report is downloadable for the client through the web application.
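The actual Rails task is not shown here, but the same two steps can be sketched in plain shell. Everything named below is an assumption for illustration: the application path `/srv/webshop`, the report folder, and the Rake task name `reports:export_sales_csv` are hypothetical placeholders. The sketch also assumes the export writes data.csv into the report folder, next to example_report.R, so that the image's default command finds both.

```shell
#!/bin/sh
# Hypothetical monthly wrapper; all paths and the Rake task name are placeholders.
APP_DIR="${APP_DIR:-/srv/webshop}"
REPORT_DIR="${REPORT_DIR:-$APP_DIR/reports}"
DOCKER="${DOCKER:-docker}"
RAKE="${RAKE:-bundle exec rake}"

make_monthly_report() {
    (
        cd "$APP_DIR" &&
        # 1. Query the database and save the data as csv
        #    (assumed to write $REPORT_DIR/data.csv).
        $RAKE reports:export_sales_csv &&
        # 2. Hand the csv to report-maker by mounting the report folder.
        "$DOCKER" run --rm -v "$REPORT_DIR":/report report-maker
    )
}
```

Saved as a script that ends with a call to `make_monthly_report`, this could be scheduled with a crontab entry such as `0 6 1 * * /srv/webshop/bin/monthly_report.sh`, which runs at 06:00 on the first day of every month.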
