Writing an MS-Word document using R (with as little overhead as possible)

[This article was first published on R-statistics blog, and kindly contributed to R-bloggers.]

The problem: producing a Word (.docx) file of a statistical report created in R, with as little overhead as possible.
The solution: combining R+knitr+rmarkdown+pander+pandoc (it is easier than it is spelled).

If you already know why this matters, jump straight to the “Solution: the workflow” section.


Preface: why is this (still) a problem

Before turning to the solution, let’s address two preliminary questions:

Q: Why is it important to be able to create a report in Word from R?

A: Because many of the researchers we work with are used to Word for editing their text, tracking changes, merging edits from different authors, and copy-pasting text/tables/images from various sources.
This means that a report produced as a PDF file is less useful for collaborating with less tech-savvy researchers (copying text or tables from a PDF is not fun). Even exchanging HTML files may seem somewhat awkward to fellow researchers.

Q: But wasn’t this problem solved already?

A: Yes and no. There have been many attempts at solving the problem in the past several years, but many of them came with an overhead that made them unfriendly (the developers and heavy users of these technologies are asked not to take offense: this is only my opinion, and you're welcome to respond and broaden my point of view).
Previous solutions include SWord and R2wd, both of which rely on the rcom package (and the statconnDCOM or RDCOMClient servers), as well as using online converters to turn PDF files into Word files.

Q: Any more issues?
A: Yes. Another big issue is formatting the output. If I want my tables to look nice in the output file, I often need to wrap ALL of my output calls in some function (taken from packages such as xtable, rms, quantreg, stargazer, pander, and more).
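For illustration, this is roughly what the wrapping looks like with pander (assuming the package is installed), using R's built-in cars data set: every result has to be passed through pander() explicitly to come out as a markdown table.

```r
library(pander)

# Without any print-method trickery, each result is wrapped by hand so that
# it is emitted as a markdown table instead of plain console text.
pander(summary(cars))
fit <- lm(dist ~ speed, data = cars)
pander(fit)
```

In a long report this wrapping has to be repeated for every table, which is the overhead the rest of this post tries to avoid.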

Sources/links

The solution I propose here combines the following R packages: knitr, rmarkdown, and pander, together with the external tool pandoc (easily installed using the installr package).

Combining these ideas has been discussed in various places over the past half year or so; here are just a few:

Solution: the workflow

An overview of the steps:

  1. Write text with R code chunks woven together (I do it using RStudio, markdown, and knitr, in an .rmd file)
  2. At the beginning of the file, make sure to replace the “print” method with that of the markdown wrapping package (see example below)
  3. Compile the doc into .md using knitr
  4. Turn the .md into .docx using pandoc

Here is example R Markdown code for steps 1 and 2:

 
Doc header 1
============
```{r set_knitr_chunk_options}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, results = "asis") # "asis" passes pander's markdown through untouched, so the output is well formatted.
```

```{r load_pander_methods}
library(pander)
replace.print.methods <- function(PKG_name = "pander") {
   # class part of each method name, e.g. "lm" from "pander.lm"
   classes <- sub(paste0("^", PKG_name, "\\."), "", as.character(methods(PKG_name)))
   for(cl in classes) {
      f <- getS3method(PKG_name, cl) # the existing method, reused as print.<class>
      assign(paste0("print.", cl), f, envir = .GlobalEnv)
   }
}
replace.print.methods()
## Note: in recent R versions, print() calls made from inside knitr may not
## find methods assigned into .GlobalEnv; registering them with
## registerS3method("print", cl, f) is a possible (more invasive) alternative.
## The following might work with some tweaks:
## print <- function (x, ...) UseMethod("pander")
```
Some text explaining the analysis we are doing
```{r}
summary(cars) # a summary table
fit <- lm(dist ~ speed, data = cars)
fit
plot(cars) # a plot
```

The above code can be saved into an .rmd file, for example example.rmd.
This file can then be compiled using knitr:

library(knitr)
knit2html("example.rmd")

This will produce an example.md file (knit2html() also writes an HTML version, while knit("example.rmd") would stop at the .md), which can be converted into a Word file using pandoc.
If you don’t have pandoc yet and are running Windows, you can quickly install it by running the following code in R:

# installing/loading the package:
if(!require(installr)) { install.packages("installr"); require(installr)} #load / install+load installr 
# Installing pandoc
install.pandoc(use_regex = FALSE)
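Before going further, it may be worth confirming that R can actually find pandoc (R might need to be restarted before a freshly updated PATH is visible). A small check using only base R's Sys.which() and system2():

```r
# Sys.which() returns the full path of an executable found on the PATH,
# or "" if it cannot be found.
pandoc_path <- Sys.which("pandoc")
if (nzchar(pandoc_path)) {
  # Show the first line of `pandoc --version`, which names the version.
  message(system2("pandoc", "--version", stdout = TRUE)[1])
} else {
  message("pandoc was not found on the PATH; try restarting R after installing it.")
}
```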

Once pandoc is installed, simply run:

FILE <- "example"
system(paste0("pandoc -o ", FILE, ".docx ", FILE, ".md"))

And your .docx file is ready!

Possible expansions and caveats

The first caveat of this method is that it relies on markdown and pander, which are (by definition) more limited than something like LaTeX. If that matters, one can work with LaTeX-based solutions instead. Here is an example of how to do it with existing packages, here xtable (the code below is not well debugged, so use it with care; I welcome comments and suggestions):

 
```{r load_xtable_methods}
replace.print.methods <- function(PKG_name = "pander") {
   # class part of each method name, e.g. "lm" from "xtable.lm"
   classes <- sub(paste0("^", PKG_name, "\\."), "", as.character(methods(PKG_name)))
   for(cl in classes) {
      f <- getS3method(PKG_name, cl) # the existing method, reused as print.<class>
      assign(paste0("print.", cl), f, envir = .GlobalEnv)
   }
}
library(xtable)
replace.print.methods("xtable")
```
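One possible snag with the xtable variant: unlike pander's methods, xtable() methods only build an "xtable" object, and the actual output is done by print.xtable(). Aliased directly as print.<class>, they may therefore return an object that is silently discarded, so nothing appears in the document. A minimal sketch of an alternative, assuming the goal is to print each converted object; wrap.xtable.print.methods() is a hypothetical helper, not part of xtable:

```r
library(xtable)

# Hypothetical helper (not part of xtable): for every class that has an
# xtable() method, define print.<class> in the global environment that first
# converts with xtable() and then prints the result via print.xtable().
wrap.xtable.print.methods <- function(type = "latex") {
  force(type)  # fix the output type ("latex" or "html") for all wrappers
  classes <- sub("^xtable\\.", "", as.character(methods("xtable")))
  # Defensive: wrapping "xtable" itself would make print.xtable recurse.
  classes <- setdiff(classes, c("default", "xtable"))
  for (cl in classes) {
    assign(paste0("print.", cl),
           function(x, ...) print(xtable::xtable(x), type = type, ...),
           envir = .GlobalEnv)
  }
  invisible(paste0("print.", classes))
}

wrap.xtable.print.methods()
```

Since type is passed on to print.xtable(), calling wrap.xtable.print.methods(type = "html") would emit HTML tables instead of LaTeX.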

Similar solutions can probably be found for HTML documents as well. (Credit: the code above is based on help from Ramnath on my question on Stack Overflow.)

The second caveat is that the above solution (at least the part that lets us use the R code as is, without wrapping it in calls like “pander(summary(cars))”) is basically a dirty hack: it overrides basic R commands, which is quite ugly. This issue has been discussed for over a month now on the knitr GitHub page, and I hope a better solution will come out of it.
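For reference, more recent versions of knitr offer a cleaner hook than overriding print(): the S3 generic knit_print(), which only changes how objects are rendered in a knitted document. A minimal sketch for data frames, swapping in knitr's own kable() for pander (an assumption made to keep the example within knitr):

```r
library(knitr)

# knit_print() is knitr's generic for rendering chunk output; a method for a
# class changes only how knitr renders that class, print() stays untouched.
knit_print.data.frame <- function(x, ...) {
  # asis_output() marks the markdown table as raw output for the document.
  asis_output(paste(c("", "", kable(x)), collapse = "\n"))
}

# In recent R versions, a method defined only in the global environment may
# not be found when knitr dispatches from inside its namespace, so register it.
registerS3method("knit_print", "data.frame", knit_print.data.frame)
```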

The third issue is that if an object's method is missing or buggy, compiling the document can fail (for example, at the time of writing pander still lacked a pander.summary.lm method…).
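One way to spot such gaps before knitting is to ask R whether a generic has a class-specific method for an object. A minimal sketch using utils::getS3method() with optional = TRUE (which returns NULL instead of an error when no method exists); has_method_for() is a hypothetical helper name:

```r
# Hypothetical helper: TRUE if `generic` has an S3 method for any class of
# `x`. class(x) never contains "default", so a fallback default method is
# deliberately not counted.
has_method_for <- function(generic, x) {
  found <- vapply(class(x), function(cl) {
    !is.null(utils::getS3method(generic, cl, optional = TRUE))
  }, logical(1))
  any(found)
}

fit <- lm(dist ~ speed, data = cars)
has_method_for("summary", fit)
```

With pander loaded, has_method_for("pander", summary(fit)) would report whether pander can handle a summary.lm object directly.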

To conclude: Thanks to the amazing work by Yihui on knitr, by the people at RStudio, by Jeffrey Horner on markdown, Gergely Daróczi for pander, and many others – it is now easier than ever to quickly create a docx report based on analysis performed using R. It seems that 2012 was a great year for reproducible research, I’m looking forward to 2013…
