Awesome R Markdown Word report with programmatically inserted headings, outputs and…

[This article was first published on Tdemarchinr in Towards Data Science on Medium, and kindly contributed to R-bloggers].

Awesome R Markdown Word report with programmatically inserted headings, outputs and cross-references

How to automate reporting in Word to focus on challenging problem solving

By Thomas de Marchin (Senior Manager Statistics and Data Sciences at Pharmalex) and Milana Filatenkova (Manager Statistics and Data Sciences at Pharmalex)

Photo by Maarten van den Heuvel on Unsplash

What is R: R is a language widely used among statisticians and data miners for developing data analysis software.

What is R Markdown: R Markdown is a tool designed to help create reproducible, dynamic reports with R. An R Markdown document is written in Markdown (an easy-to-write plain-text format) and contains chunks of embedded R code.

Why use R and R Markdown: As data scientists, we often have to perform repeated analyses (for example, validation of analytical methods). Automating such routines is a good way to reduce the human errors that often arise when performing numerous repetitive tasks, and to divert resources from boring copy/paste exercises to challenging problem solving.

Who this post is for: This post is for R users who want to use R Markdown (Rmd) scripts to create Word (docx) documents and would like to programmatically insert headings, outputs and Word cross-references.

What will not be covered: This post will not introduce R and R Markdown. There are numerous excellent tutorials available if you have just started with these tools. You will also need R and MS Word installed to follow this tutorial.

When it comes to generating reports from R, PDF and HTML give more elegant results. However, in a professional context, Word cannot be avoided.

Several packages are available to generate Word documents from R. The most popular one is rmarkdown. The beauty of Markdown is that the plain-text source (the code) stays readable, without messy tags. On the other hand, one of its drawbacks is that Word functionalities are very limited: it is, for example, not possible to adapt styles, add table and figure captions or use Word cross-references. Among other packages designed to automate report generation, officer is worth mentioning. It supplies many Word functionalities that are missing in rmarkdown. However, officer code is very difficult to read, and Word cross-references don't work in officer either. Some R packages, such as captioner or bookdown, have also been developed to manage table and figure numbering and captions. These packages, however, output regular text and cannot generate dynamic captions and references (i.e. you cannot click on these references or generate a table of figures). The latter is a problem for cross-referencing: the references remain static when the content is updated.

Let us also mention the recent, awesome officedown package, which brings some officer features into R Markdown documents. One of the key features of officedown is the option of adding Word's calculated fields for references and captions. In other words, this is the first R package to support true Word cross-references!

Finally, regarding R-generated outputs (plots, tables…) in the report: with rmarkdown, it is possible to insert them programmatically into the Word report, and this is actually the central purpose of the rmarkdown package. But what if you don't know in advance how many tables, figures and sections the report will contain? A typical example would be a script or an application that performs a repeated analysis on multiple attributes. Imagine a situation where a user uploads data with any number of attributes: 5, 10, 20, 100… In this situation, the solution for inserting headings and outputs programmatically from R is to set the chunk option results='asis' in rmarkdown. With this option, R code (typically a for loop) writes R Markdown content that knitr passes unchanged into the document.
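To make the results='asis' mechanism concrete, here is a minimal base-R sketch that is not part of the authors' report. The helper emit_sections() is hypothetical: it simply writes one Markdown heading and a line of text with cat() for however many attributes it receives. In a chunk with results='asis', knitr would hand this text to Pandoc as Markdown, so the "## ..." lines become real Word headings.

```r
# Hypothetical helper (not from any package): writes one Markdown section per
# attribute. In a chunk with results = 'asis', knitr passes this text through
# unchanged, so Pandoc turns the "## ..." lines into Word headings.
emit_sections <- function(attributes) {
  for (attr in attributes) {
    # blank line before the heading so Pandoc recognises it as a heading
    cat("\n## ", attr, "\n\n", sep = "")
    cat("Results for ", attr, " would go here.\n", sep = "")
  }
}

# The number of sections is only known at run time
emit_sections(names(iris)[1:4])
```

In a normal chunk, the same text would instead appear in the report as a verbatim code-output block, which is exactly the wrapping that results='asis' switches off.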

So, what is the purpose of this post then? Well, even though it is possible to insert outputs and headings programmatically, and to get proper Word cross-references with the officedown package, I never managed to make the two work together… When I tried to use cross-references in the for loop, it never produced the desired output (references not working, plots duplicated…). I guess this issue will be resolved in a future version of officedown, but I just couldn't wait: I needed the job done for my customer. I then spent a lot of time searching the Internet for a workaround and came up with a few solutions that finally got it done! I do not claim that my approach is the only way to compile a functional Word report automatically; there are surely other ways to achieve the same goal. The difficulty here is the scarcity of documentation and the immaturity of many R packages related to Markdown. This article is my attempt to share my findings so you won't have to lose yourself in an endless trial-and-error process trying to get a nice Word document out of your R Markdown.

Here is the list of the key points.

The cross-references are created by:

1. creating a bookmarked, automatically numbered label using the run_autonum function

2. adding a caption using the block_caption function

3. inserting references in the text using the run_reference function
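As a minimal sketch of these three steps in a normal chunk, assuming the officer package is installed: the bookmark id "summaryTable" and the caption text mirror the full example further down, while the variable names tab_num, cap and ref are only illustrative. The objects are merely built here; they turn into Word fields when knitted to docx with officedown loaded.

```r
library(officer)

# 1. An automatic number ("Table 1", "Table 2", ...) in the "Table" sequence,
#    bookmarked as "summaryTable" so it can be referenced elsewhere.
#    Keep bookmark ids to letters, digits and underscores; officer rejects
#    ids containing characters such as spaces.
tab_num <- run_autonum(seq_id = "Table", pre_label = "Table ",
                       bkm = "summaryTable")

# 2. A caption paragraph carrying that auto-number. In a normal chunk, leaving
#    this object as a visible value is what writes the caption.
cap <- block_caption(label = "Summary table for the iris dataset",
                     style = "caption", autonum = tab_num)

# 3. A reference to the bookmark. In the Rmd text it is written inline, e.g.
#    Table `r run_reference("summaryTable")` shows summary statistics.
ref <- run_reference("summaryTable")
```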

The subsections in the analysis section are created programmatically by using the results='asis' chunk option. This option tells knitr not to wrap your text output in verbatim code blocks (as it does with normal chunks) but to treat it "as is" and output raw Markdown content. In these types of chunks:

1. headings are created using cat('\n## HeadingName \n'). Note that the '\n' characters are necessary to make it work: a Markdown heading must start on its own line, after a blank line.

2. plots are output with print(plot)

3. flextables have to be output using the knit_print and cat functions

4. captions (block_caption) need to be wrapped in the knit_print_block function, and references (run_reference) in the knit_print_run function
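The four rules above can be gathered into one function that the asis chunk calls once per attribute. This is a sketch under stated assumptions, not the authors' code: write_section() is a hypothetical helper, it assumes knitr, officer, officedown, ggplot2 and flextable are installed, and it should only be called while knitting to docx, because knit_print_block() and knit_print_run() emit Word-specific markup for that output.

```r
library(knitr)
library(officer)
library(officedown)
library(ggplot2)
library(flextable)

# Hypothetical helper: writes one subsection from inside a results = 'asis'
# chunk. It relies on knitr being active, so call it only while knitting to docx.
write_section <- function(name, df) {
  # bookmark id derived from the name, stripped of non-alphanumeric characters
  bkm <- paste0("fig", gsub("[^[:alnum:]]", "", name))

  # rule 1: heading as raw Markdown, surrounded by line breaks
  cat("\n## ", name, "\n\n", sep = "")

  # rule 4: inline reference, run_reference() wrapped in knit_print_run()
  cat("Figure ")
  knit_print_run(run_reference(bkm))
  cat(" shows Sepal.Width against Sepal.Length.\n\n")

  # rule 2: the plot must be printed explicitly inside a loop or function
  print(ggplot(df, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point())
  cat("\n")

  # rule 4: numbered caption, block_caption() wrapped in knit_print_block()
  fig_num <- run_autonum(seq_id = "Figure", pre_label = "Figure ", bkm = bkm)
  knit_print_block(block_caption(paste("Sepal width vs length for", name),
                                 style = "caption", autonum = fig_num))

  # rule 3: flextable, knit_print() returns the table markup and cat() writes it
  cat(knit_print(flextable(head(df))))
  cat("\n")
}

# In the results = 'asis' chunk:
# for (sp in levels(iris$Species)) write_section(sp, iris[iris$Species == sp, ])
```

Keeping each subsection in a function makes the loop body a single call, so adding or reordering outputs only touches one place.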

An example is given below. The code, the Word report it generates and all the necessary files are available on my GitHub: https://github.com/tdemarchin/AwesomeRmarkdownWordReport

Conclusion

Generating high-quality Word reports from R was challenging until recently, but new packages have since made it possible in a business context. The approach described in this article demonstrates that manually writing reports from R outputs is no longer necessary.

Example

Output

Example of an automatically generated Word report

Code

---
title: "Awesome Rmarkdown Word report with programmatically inserted headings, outputs and cross-references"
author: "Thomas de Marchin"
date: "17MAR2021"
output:
  word_document:
    reference_docx: "template.docx"
---
```{r setup, include=FALSE}
# Let's setup a few things.
rm(list=ls())
library(knitr)
library(officedown)
library(officer)
library(ggplot2)
library(tidyverse)
library(broom)
library(flextable)
# set chunk defaults in a single call (a second opts_chunk$set call with
# echo = TRUE would override echo = FALSE for every chunk)
knitr::opts_chunk$set(
  echo       = FALSE,
  message    = FALSE,
  warning    = FALSE,
  fig.cap    = TRUE
)
# set flextable defaults
set_flextable_defaults(
  font.family = "Arial", font.size = 9,
  theme_fun = "theme_vanilla",
  big.mark = "", table.layout = "autofit")

# formatting properties for specific paragraphs
centeredP <- fp_par(text.align = "center")
```
# Introduction
The aim of this document is to introduce a way to generate Word reports from R using R Markdown with programmatically inserted headings, outputs and Word cross-references. See https://towardsdatascience.com/awesome-r-markdown-word-report-with-programmatically-inserted-headings-outputs-and-19ad0de29a22 to understand the context of this example.
# Data
We will use the built-in iris dataset as an example.
This dataset consists of petal and sepal width and length measurements for three iris species.
Table `r run_reference("summaryTable")` shows summary statistics.
```{r data, echo = FALSE}
# this chunk is a normal chunk compared to the next one
# create the bookmark for the summary table
tab_num <- run_autonum(seq_id = "Table", pre_label = "Table ", bkm = "summaryTable")
# add the caption
block_caption(label= "Summary table for the iris dataset",
              style = "caption", autonum = tab_num)
# create the summary table and output it with flextable()
summaryData <- iris %>%
  pivot_longer(-Species, names_to = "Attribute", values_to = "value") %>%
  group_by(Attribute, Species) %>%
  summarise(n = n(), mean = mean(value), sd = sd(value),
            min = min(value), max = max(value), .groups = "drop")
summaryData %>%
  flextable() %>%
  merge_v(j = "Attribute") %>%
  colformat_double(j = c("mean", "sd"), digits = 2)
```
# Analysis
```{r analysis, echo = FALSE, results = 'asis'}
# this chunk programmatically generates Markdown code (results='asis')
# split the data by species to simulate different attributes to analyze; there are 3 here, but there could be any number.
data <- split(iris, iris$Species)
uniqueSpecies <- levels(iris$Species)
# for loop
for(i in seq_along(uniqueSpecies)){
  
  dataSubset <- data[[uniqueSpecies[i]]]
    
  # print the heading
  cat("\n##", uniqueSpecies[i], "\n")
  
  # print an empty line
  cat("\n  <br>  \n")
  
  # print some text
  cat("Figure ")
  # reference the figure below (note the use of knit_print_run function in 'asis' chunks)
  knit_print_run(run_reference(paste0("pData", uniqueSpecies[i])))
  cat(" shows the relation between Sepal width and length.")
  
  # plot the data 
  pData <- ggplot(aes(x=Sepal.Length, y=Sepal.Width), data=dataSubset) + 
    geom_point() + geom_smooth(method='lm', formula = y ~ x, se=FALSE) + labs(title=uniqueSpecies[i])
  
  # output the plot (note the use of the print function in 'asis' chunks)
  print(pData)
  
  cat("\n") # sometimes you need to add this to make things work
  
  # Add the figure numbering and caption (note the use of the knit_print_block function in 'asis' chunks)
  fig_num <- run_autonum(seq_id = "Figure", pre_label = "Figure ", bkm = paste0("pData", uniqueSpecies[i]))
  knit_print_block(block_caption(paste0("Scatter plot of Sepal Width vs Sepal Length for ", uniqueSpecies[i], ". Blue line is a linear regression."),
              style = "caption", autonum = fig_num))
  
  
  # print some text
  cat("A linear regression was performed using Sepal.Length as the response and Sepal.Width as the explanatory variable. Table ")
  knit_print_run(run_reference(paste0("tabRegression", uniqueSpecies[i])))
  cat(" shows the parameter estimates and 95% confidence intervals.")
  
  cat("\n") # sometimes you need to add this to make things work
  
  # fit a linear regression and generate the fit summary table
  regression <- lm(Sepal.Length ~ Sepal.Width, data=dataSubset)
  # tidy() adds the 95% confidence intervals directly (conf.low, conf.high columns)
  tabRegression <- tidy(regression, conf.int = TRUE, conf.level = 0.95)
  
  # Add the table numbering and caption (note the use of the knit_print_block function in 'asis' chunks)
  tab_num <- run_autonum(seq_id = "Table", pre_label = "Table ", bkm = paste0("tabRegression", uniqueSpecies[i]))
  knit_print_block(block_caption(paste0("Parameter estimates of the fit for ", uniqueSpecies[i]),
              style = "caption", autonum = tab_num))
  
  
  
  # output the summary table (note the use of knit_print and cat functions in 'asis' chunks)
  tabRegression %>% flextable() %>% knit_print() %>% cat()
  
  cat('\n')
}
# Add a page break
run_pagebreak()
```
# Conclusion
`r fpar(ftext("Congratulations!", fp_text(font.size = 12, bold = TRUE, color = "#C32900")), fp_p = fp_par(text.align = "center"))`
![](cheers.jpg)

References

https://rmarkdown.rstudio.com/

https://bookdown.org/yihui/bookdown/

https://ardata-fr.github.io/officeverse/

https://scienceloft.com/technical/programmatically-create-new-headings-and-outputs-in-rmarkdown/

https://www.r-bloggers.com/


Awesome R Markdown Word report with programmatically inserted headings, outputs and… was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
