Integrate data and reporting on the Web with knitr

September 11, 2012
(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Today's guest post comes from Yihui Xie, author of the knitr package — ed.

Hi, this is Yihui Xie, and I'm guest posting on the Revolutions blog to talk about one aspect of the knitr package: how we can integrate data analysis and reporting in R with the Web. This post covers both work that has been done and work that is ongoing. For those who are not familiar with knitr, it is an R package that generates reports dynamically from a mixture of computer code and narrative. It is available on CRAN and GitHub.

Why do I put such special emphasis on the web? Let me quote what I heard from Carlos Scheidegger this summer while I was interning at AT&T Labs: a thing does not exist unless it is on the web (one corollary to his “theorem”, to my understanding, was “if you do not have a homepage yet, you simply do not exist”). Well, I web, therefore I am.

I wrote a blog post on enjoyable reproducible research back in June; for those who come from the LaTeX world, let me repeat the idea: reproducible research does not have to begin with \documentclass{}, and there can be more fun with the web. LaTeX is mainly for printing purposes in my eyes, so if you are not going to print what you are reading (as with this post), you may start playing with R on the web now.

Learn knitr in 5 minutes

Here is a video that demonstrates how to use knitr to create HTML pages from Markdown documents and PDFs from Rnw documents.

 

The examples that I used were from https://github.com/yihui/knitr-examples. The reference card mentioned in the video can be downloaded here. Next I'm going to introduce a few interesting projects based on knitr.
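Before moving on, here is a minimal sketch of the Markdown half of that workflow, for readers who cannot watch the video. The document contents, the chunk label, and the file names are made up for illustration; only knit() itself comes from knitr. The sketch writes a tiny R Markdown file to a temporary directory and knits it into a plain Markdown file with the code output embedded.

```r
library(knitr)

# Build the chunk fence programmatically so this example does not
# contain a literal triple backtick inside a string
fence <- strrep("`", 3)

# A tiny R Markdown document: narrative, one code chunk, one inline expression
# (the contents are hypothetical, chosen only for illustration)
rmd <- c(
  "# A minimal report",
  "",
  "The chunk below is run by knitr and its output is embedded.",
  "",
  paste0(fence, "{r summary-cars}"),
  "summary(cars$speed)",
  fence,
  "",
  "The data set has `r nrow(cars)` rows."
)

# Work in a temporary directory so nothing is left behind
input  <- file.path(tempdir(), "minimal.Rmd")
output <- file.path(tempdir(), "minimal.md")
writeLines(rmd, input)

# knit() runs the code and writes a Markdown file with the results
knit(input, output = output, quiet = TRUE)

md <- readLines(output)
```

The resulting Markdown file still needs to be converted to HTML before it becomes a web page; RStudio does both steps for you behind its knit button.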

RPubs

RPubs publishes HTML reports compiled by knitr from R Markdown documents. This website is hosted by RStudio, and it really shows the creativity of users when you give them a simple tool. It has been mentioned once in this blog, and let me show you a few more significant uses (list not exhaustive, of course):

Within three months, RPubs has received more than 600 contributed reports, demonstrating many exciting applications of R on the web.

OpenCPU and CRUNCH

The web features of knitr make it easy to generate reports in the cloud, so that the client only needs a web browser. Here I introduce two platforms, both of which support knitr:

  1. OpenCPU by Jeroen Ooms: see a knitr app; OpenCPU features a REST API, so you can deploy your apps anywhere;
  2. CRUNCH by the Knowledge Media Institute of the Open University: see examples by me and Fridolin; CRUNCH features an RStudio workspace and services to run computationally intensive learning analytics.

The R package markdown is used to convert the markdown output from knitr to HTML, thanks to the contribution of Jeffrey Horner.
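As a rough sketch of that conversion step, the snippet below turns a small Markdown file into an HTML page. The file names and the Markdown text are invented for illustration, and I am assuming the markdownToHTML() function as provided by the markdown package at the time of writing; other versions of the package may offer different or additional entry points.

```r
library(markdown)

# Hypothetical input: a small Markdown file such as knitr might produce
md_file   <- file.path(tempdir(), "example.md")
html_file <- file.path(tempdir(), "example.html")
writeLines(
  c("# Hello", "", "Some *Markdown* text, as knitr would emit it."),
  md_file
)

# Convert the Markdown file to a standalone HTML page
markdownToHTML(md_file, output = html_file)

html <- readLines(html_file)
```

A server only has to run knitr and then this conversion to send a finished report to the browser.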

R notebook

As I mentioned at the beginning, I was working at AT&T Labs this summer, and one thing that we were trying to do was to build an R notebook, which is similar to the Python notebook. The source code is available on GitHub now, but it might take you a while to figure out how to run it. Anyway, the idea is still to develop an environment on the web for people to write reports and collaborate with each other.

R package documentation and vignettes

R documentation could be much more attractive. Look at the websites of other languages like Python, Ruby, … All you can do is sigh. R is so strong at graphics, whereas standard R documentation is almost always plain text, even when you look at ?plot.

The UCLA R tutorials include examples of Singular Value Decomposition (built with knitr), and I think ?svd would be more enjoyable to read if the documentation looked like that. Many R users know Hadley's ggplot2 documentation website, and the best thing in my eyes about this website is that I can learn by reading both the source code and the output without having to copy the code, open R, paste the code, and check the results.

This summer, knitr was accepted into the Google Summer of Code (GSoC) and I had a student (Taiyun Wei) who worked on a few interesting directions. Two of them were related to R documentation:

  1. Convert standard R package documentation to HTML pages, run the code in Examples sections and embed results there; for example, you can call library(knitr); knit_rd('maps') to generate HTML help pages for all objects in the maps package, and you will see maps in the help pages;
  2. Write package vignettes with R Markdown instead of LaTeX/Sweave; check out Taiyun's corrplot package on GitHub, run R CMD build/R CMD INSTALL on it, and you will see a nice HTML vignette in help.start(); the key is in the Makefile and index.Rmd (more technical details later in my own blog).
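The first item can be tried with a few lines of R. This is only a sketch: the scratch directory is my own choice for illustration, and it assumes the maps package is installed (the call is skipped otherwise). As far as I know from its documentation, knit_rd() writes the HTML pages to the current working directory, which is why the working directory is switched temporarily.

```r
library(knitr)

# Scratch directory for the generated pages (a hypothetical location)
out_dir <- file.path(tempdir(), "maps-html")
dir.create(out_dir, showWarnings = FALSE)

# knit_rd() writes into the working directory, so switch to the
# scratch directory and always switch back afterwards
old_wd <- setwd(out_dir)
tryCatch({
  if (requireNamespace("maps", quietly = TRUE)) {
    # Run the Examples sections of every help topic in maps and
    # embed the results (including the plots) in HTML help pages
    knit_rd("maps")
  }
}, finally = setwd(old_wd))

# The generated help pages, if any
pages <- list.files(out_dir, pattern = "\\.html$")
```

Opening one of the files listed in pages in a browser shows the help text together with the output of its examples.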

Hopefully this can encourage R package authors to write more attractive documentation, since there is no longer any excuse for skipping examples or avoiding package vignettes.

Vistat

Vistat is also a project that we started in GSoC. The motivation was simple: we should be able to reproduce graphs in which we are interested. We reproduce them either to verify reproducibility or merely to learn. All the examples on this website are generated with knitr (including the animations), and you can always read the R Markdown source if the R code in the pages is not enough for you to figure out how to make the plots.

Another purpose is to see if we can build a really fast journal via Git, since it is hosted on GitHub with Jekyll. Vistat is still in its early stages, and I will see how it moves forward with the help of the community.

Conclusion

With the simple idea of mixing code into narratives, we can do a lot of interesting things. Go claim your existence by publishing a few web pages!

Yihui Xie is a fourth-year PhD student in the Department of Statistics, Iowa State University. He has been using R for 8 years and has developed a number of R packages, including knitr, animation, formatR, Rd2roxygen and fun.
