Containerizing Interactive R Markdown Documents
The rmarkdown package is behind the versatility of R Markdown with dozens of standard and community-provided output formats, ranging from HTML, Word, and PDF, to slides, books, and interactive documents. This abundance of awesomeness is a direct continuation of a long line of predecessors: Sweave/LaTeX, knitr, and pandoc. Its success is the foundation upon which Quarto is built.
The htmlwidgets R package provides the basis for interactive JavaScript widgets that you can embed in HTML outputs. These are pre-rendered objects that respond to various gestures, like hover and click events. You just render the document once, and you are done until the next time when the document needs updating.
True reactivity, however, requires a lot more JavaScript heavy-lifting (e.g. using Observable), or you can use Shiny as the runtime for the R Markdown document. Such documents require a web server to watch for reactive updates in the background. This makes them effectively Shiny apps.
As with any type of Shiny app, a lot of the hosting options out there require the Shiny app to run inside of a Docker container (e.g. Heroku, ShinyProxy, Fly). Because interactive R Markdown documents differ from Shiny apps in subtle ways, serving them is also slightly different. In this post, we review how to “dockerize” R Markdown documents with different runtime environments.
Prerequisites
We will use the script from the analythium/rmarkdown-docker-examples GitHub repository.
You can also pull the following two Docker images:
docker pull eddelbuettel/r2u:20.04
docker pull nginx:alpine
Runtime: Shiny
The way to make an R Markdown document interactive/reactive is to add runtime: shiny to the document’s YAML header. Now you can add Shiny widgets and Shiny render functions to the file’s R code chunks. This way the rendered HTML document will include reactive components.
Here is the runtime-shiny/index.Rmd file as our first document (following this example):
---
title: "Runtime: shiny"
output: html_document
runtime: shiny
---

Here are two Shiny widgets

```{r echo = FALSE}
selectInput("n_breaks",
  label = "Number of bins:",
  choices = c(10, 20, 35, 50),
  selected = 20)

sliderInput("bw_adjust",
  label = "Bandwidth adjustment:",
  min = 0.2,
  max = 2,
  value = 1,
  step = 0.2)
```

And here is a histogram

```{r echo = FALSE}
renderPlot({
  hist(faithful$eruptions,
    probability = TRUE,
    breaks = as.numeric(input$n_breaks),
    xlab = "Duration (minutes)",
    main = "Geyser eruption duration")

  dens <- density(faithful$eruptions,
    adjust = input$bw_adjust)
  lines(dens, col = "blue")
})
```
You should use rmarkdown::run() instead of rmarkdown::render("index.Rmd") to get the Shiny app running.
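To try the document locally before containerizing it, the same call can be issued from a shell. This is a minimal sketch, assuming R with the shiny and rmarkdown packages is installed on the host; the helper name rmd_run_expr and the loopback host are illustrative choices, not part of the original repository.

```shell
#!/bin/sh
# Compose the R expression that serves an R Markdown file as a Shiny app.
# rmd_run_expr is a hypothetical helper: $1 is the Rmd path, $2 is the port.
rmd_run_expr() {
  printf 'rmarkdown::run("%s", shiny_args = list(port = %s, host = "127.0.0.1"))\n' "$1" "$2"
}

# Print the expression so it can be inspected before running it.
rmd_run_expr runtime-shiny/index.Rmd 3838

# To actually start the app (it blocks until interrupted with Ctrl+C):
#   Rscript -e "$(rmd_run_expr runtime-shiny/index.Rmd 3838)"
```

Binding to 127.0.0.1 keeps the local test private to the machine; inside a container the host has to be 0.0.0.0 so that the published port is reachable from outside.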
We will use the following Dockerfile:
FROM eddelbuettel/r2u:20.04

RUN apt-get update && apt-get install -y --no-install-recommends \
    pandoc \
    && rm -rf /var/lib/apt/lists/*

RUN install.r shiny rmarkdown

RUN addgroup --system app && adduser --system --ingroup app app
WORKDIR /home/app
COPY runtime-shiny .
RUN chown app:app -R /home/app
USER app

EXPOSE 3838

CMD ["R", "-e", "rmarkdown::run(shiny_args = list(port = 3838, host = '0.0.0.0'))"]
Here is what each instruction does:
- the eddelbuettel/r2u parent image represents one of the most significant improvements in developer experience in the past few years: it cuts Docker build times to seconds thanks to full dependency resolution through Ubuntu's apt package manager
- install pandoc, which rmarkdown needs to render the document
- install the shiny and rmarkdown R packages
- add a user named app and create a /home/app folder for this user
- copy the contents of the runtime-shiny folder into the /home/app folder
- set file permissions and make app the user of the container
- expose port 3838
- define the command using rmarkdown::run(), making sure Shiny listens on the host and port that we expect
You can build and run the image:
docker build -f Dockerfile.shiny -t psolymos/rmd:shiny .
docker run -p 8080:3838 psolymos/rmd:shiny
Visit localhost:8080 to see the R Markdown document running as a Shiny app.
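The app needs a moment to start, so a scripted check is sometimes handier than refreshing the browser. The following is a rough sketch, assuming curl is available on the host; wait_for_http is a hypothetical helper written for this illustration.

```shell
#!/bin/sh
# Poll a URL until it answers with a successful HTTP status.
# wait_for_http is a hypothetical helper:
#   $1 = URL, $2 = number of attempts, $3 = pause between attempts in seconds.
wait_for_http() {
  url=$1
  tries=${2:-30}
  pause=${3:-1}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # --fail makes curl return non-zero on HTTP errors such as 404 or 500.
    if curl --silent --fail --output /dev/null --max-time 2 "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  echo "no response from $url after $tries attempts" >&2
  return 1
}

# Usage, once the container is running with port 8080 published:
#   wait_for_http http://localhost:8080/ 30 1
```

The function returns a non-zero status when the app never answers, which makes it usable as a gate in a deployment script.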
However, the Shiny runtime requires a full document render for each end-user browser session, so it can perform poorly for documents that don’t render quickly.
Runtime: Shinyrmd
Prerendered Shiny documents represent an improvement over the Shiny runtime, which can perform poorly for documents that don’t render quickly. This is where runtime: shinyrmd (or its alias, runtime: shiny_prerendered) comes in. Such documents are pre-rendered before deployment so that the HTML loads faster. No need to wait for Shiny to render it for us.
The Shinyrmd runtime also comes with various contexts: server-start/setup/data (analogous to global.R), render (like the UI), and server. These contexts provide a hybrid model of execution, where some code is run once when the document is pre-rendered and some code is run every time the user interacts with the document.
The runtime-shinyrmd
folder contains another Rmd file (based on this flexdashboard example):
---
title: "Runtime: shinyrmd"
output: flexdashboard::flex_dashboard
runtime: shinyrmd
---

```{r setup, include=FALSE}
library(dplyr)
knitr::opts_chunk$set(echo = FALSE)
```

```{r data, include=FALSE}
faithful_data <- sample_n(faithful, 100)
```

Column {.sidebar}
--------------------------------------------

```{r}
selectInput("n_breaks",
  label = "Number of bins:",
  choices = c(10, 20, 35, 50),
  selected = 20)

sliderInput("bw_adjust",
  label = "Bandwidth adjustment:",
  min = 0.2,
  max = 2,
  value = 1,
  step = 0.2)
```

Based on [this](...) example.

Column
--------------------------------------------

### Geyser Eruption Duration

```{r}
plotOutput("eruptions")
```

```{r, context="server"}
output$eruptions <- renderPlot({
  hist(faithful_data$eruptions,
    probability = TRUE,
    breaks = as.numeric(input$n_breaks),
    xlab = "Duration (minutes)",
    main = "Geyser Eruption Duration")

  dens <- density(faithful_data$eruptions,
    adjust = input$bw_adjust)
  lines(dens, col = "blue")
})
```
You can render and run the document with rmarkdown::run().
The Dockerfile is slightly modified from the Shiny runtime:
- we need two more R packages, flexdashboard and dplyr
- we need to pre-render the document with rmarkdown::render() so that it is there when we spin up the container
FROM eddelbuettel/r2u:20.04

RUN apt-get update && apt-get install -y --no-install-recommends \
    pandoc \
    && rm -rf /var/lib/apt/lists/*

RUN install.r shiny rmarkdown flexdashboard dplyr

RUN addgroup --system app && adduser --system --ingroup app app
WORKDIR /home/app
COPY runtime-shinyrmd .
RUN R -e "rmarkdown::render('index.Rmd')"
RUN chown app:app -R /home/app
USER app

EXPOSE 3838

CMD ["R", "-e", "rmarkdown::run(shiny_args = list(port = 3838, host = '0.0.0.0'))"]
Build and run:
docker build -f Dockerfile.shinyrmd -t psolymos/rmd:shinyrmd .
docker run -p 8080:3838 psolymos/rmd:shinyrmd
Visit localhost:8080 to see the R Markdown document running as a pre-rendered Shiny app.
The docker build is super fast, thanks to the r2u image we used. The image size is around 1 GB, a bit larger than the ~800 MB parent image.
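The sizes can be checked on your own machine once the images are built or pulled. A small sketch follows, assuming the image names used in this post; post_image_sizes is a hypothetical helper, and the exact numbers will vary with image versions.

```shell
#!/bin/sh
# List repository:tag and size for the images used in this post.
# post_image_sizes is a hypothetical helper; the grep pattern assumes the
# image names from the post (eddelbuettel/r2u, psolymos/rmd, nginx).
post_image_sizes() {
  docker image ls --format '{{.Repository}}:{{.Tag}} {{.Size}}' |
    grep -E '^(eddelbuettel/r2u|psolymos/rmd|nginx):'
}

# Usage, after building or pulling the images:
#   post_image_sizes
```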
Runtime: Static
Static runtime, as its name implies, creates a static document. It stays the same until some of the document's inputs (images, data) change and the document is re-rendered. This gives us an easy way to render the HTML document locally, copy it into a Docker image, and serve it with Nginx using this Dockerfile:
FROM nginx:alpine
COPY runtime-static/index.html /usr/share/nginx/html/index.html
CMD ["nginx", "-g", "daemon off;"]
This creates a tiny image (30 MB). Run the container and forward the port 80 where Nginx serves the static files to see the result.
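Building and running this image could look like the sketch below. The post does not name this Dockerfile or its image tag, so Dockerfile.nginx and rmd:static-local are made-up names for illustration; it also assumes runtime-static/index.html has already been rendered locally.

```shell
#!/bin/sh
# Hypothetical names: neither the Dockerfile name nor the tag comes from the post.
DOCKERFILE=Dockerfile.nginx
IMAGE=rmd:static-local

# Build the Nginx image and serve it; $1 is the host port (default 8080).
# Nginx listens on port 80 inside the container.
serve_static() {
  docker build -f "$DOCKERFILE" -t "$IMAGE" . &&
  docker run --rm -p "${1:-8080}:80" "$IMAGE"
}

# Usage, from the repository root:
#   serve_static        # then visit localhost:8080
#   serve_static 9090   # or publish on another host port
```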
What if you want to take advantage of a Docker-based build environment? You might experience issues with some of the dependencies on certain operating systems, or your IT department might not allow you to install packages yourself but you can use Docker ... Or what if you just want to complicate something that should be simple?
This brings us to a neat Docker build feature called multi-stage builds. We know that our Ubuntu-based image is quite big, so we only want to use that to render the HTML. Once it is done, we just insert that artifact into the small Nginx image.
Multi-stage build
With multi-stage builds, you use multiple FROM statements in your Dockerfile. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.
Let's see how this works for our R Markdown example. Here is the stripped-down static index.Rmd file from the runtime-static folder:
---
title: "Runtime: static"
output: flexdashboard::flex_dashboard
runtime: static
---

```{r setup, include=FALSE}
library(dplyr)
knitr::opts_chunk$set(echo = FALSE)
```

```{r data, include=FALSE}
faithful_data <- sample_n(faithful, 100)
```

Column {.sidebar}
--------------------------------------

Based on [this](...) example.

Column
-------------------------------------

### Geyser Eruption Duration

```{r}
hist(faithful_data$eruptions,
  probability = TRUE,
  breaks = 20,
  xlab = "Duration (minutes)",
  main = "Geyser Eruption Duration")

dens <- density(faithful_data$eruptions,
  adjust = 1)
lines(dens, col = "blue")
```
Here is the 2-stage Dockerfile:
FROM eddelbuettel/r2u:20.04 AS builder

RUN apt-get update && apt-get install -y --no-install-recommends \
    pandoc \
    && rm -rf /var/lib/apt/lists/*

RUN install.r shiny rmarkdown flexdashboard dplyr

WORKDIR /root
COPY runtime-static .
RUN R -e "rmarkdown::render('index.Rmd', output_dir = 'output')"

FROM nginx:alpine
COPY --from=builder /root/output /usr/share/nginx/html
CMD ["nginx", "-g", "daemon off;"]
The 1st stage looks familiar, except we don't worry about being the root user for the build step. We name this stage builder using AS {name} after the FROM instruction.
The 2nd stage uses another FROM instruction, and we specify that we COPY from the builder stage: --from=builder. We grab all the rendered HTML and move it to the Nginx HTML folder to be served by the file server.
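When the render step misbehaves, it can help to build the first stage on its own and look at what it produced. The sketch below uses the --target option of docker build for that; the rmd:builder tag is a made-up name, and it assumes the parent image defines no ENTRYPOINT that would intercept the ls command.

```shell
#!/bin/sh
# Build only the builder stage of the multi-stage Dockerfile and list the
# files it rendered into /root/output.
# Assumptions: rmd:builder is a hypothetical tag, and the parent image has
# no ENTRYPOINT, so "ls" runs as the container command.
inspect_builder() {
  docker build -f Dockerfile.static --target builder -t rmd:builder . &&
  docker run --rm rmd:builder ls -l /root/output
}

# Usage, from the repository root:
#   inspect_builder
```

Only the files listed here end up in the final image, because the second stage copies nothing else from the builder.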
We just took advantage of the R build environment to render the document, and we ended up with a minimal-sized image with the static content inside.
Build and run:
docker build -f Dockerfile.static -t psolymos/rmd:static .
docker run -p 8080:80 psolymos/rmd:static
Conclusions
The Shiny and the pre-rendered Shinyrmd runtimes for R Markdown make it possible to write documents that users can interact with. This is a great way to get started with reactive programming for folks who are already familiar with R Markdown.
We can treat such interactive documents similarly to Shiny apps and deploy them using Docker containers. When it comes to static R Markdown documents, there is nothing that can prevent us from serving these from containers. We learned how to shrink the Docker image using a multi-stage build.