Brief introduction on Sweave and Knitr for reproducible research

February 24, 2014

(This article was first published on R is my friend » R, and kindly contributed to R-bloggers)

A few weeks ago I gave a presentation on using Sweave and Knitr under the guise of promoting reproducible research. I humbly offer this presentation to the blog with full knowledge that there are already loads of tutorials available online. This presentation is \LaTeX-specific and slightly biased towards Windows, so it probably has limited relevance if you are interested in other methods. Anyhow, I hope this is useful to some of you.

Cheers,

Marcus

\documentclass[xcolor=svgnames]{beamer}
%\documentclass[xcolor=svgnames,handout]{beamer}
\usetheme{Boadilla}
\usecolortheme[named=Sienna]{structure}
\usepackage{graphicx}
\usepackage[final]{animate}
%\usepackage[colorlinks=true,urlcolor=blue,citecolor=blue,linkcolor=blue]{hyperref}
\usepackage{breqn}
\usepackage{xcolor}
\usepackage{booktabs}
\usepackage{verbatim}
\usepackage{tikz}
\usetikzlibrary{shadows,arrows,positioning}
\usepackage[noae]{Sweave}
\definecolor{links}{HTML}{2A1B81}
\hypersetup{colorlinks,linkcolor=links,urlcolor=links}
\usepackage{pgfpages}
%\pgfpagesuselayout{4 on 1}[letterpaper, border shrink = 5mm, landscape]

\tikzstyle{block} = [rectangle, draw, text width=7em, text centered, rounded corners, minimum height=3em, minimum width=7em, top color = white, bottom color=brown!30,  drop shadow]

\newcommand{\ShowSexpr}[1]{\texttt{{\char`\\}Sexpr\{#1\}}}

\begin{document}
\SweaveOpts{concordance=TRUE}

\title[Nuts and bolts of Sweave/Knitr]{The nuts and bolts of Sweave/Knitr for reproducible research with \LaTeX}
\author[M. Beck]{Marcus W. Beck}

\institute[USEPA NHEERL]{ORISE Post-doc Fellow\\
USEPA NHEERL Gulf Ecology Division, Gulf Breeze, FL\\
Email: \href{mailto:[email protected]}{[email protected]}, Phone: 850 934 2480}

\date{January 15, 2014}

%%%%%%
\begin{frame}
\vspace{-0.3in}
\titlepage
\end{frame}

%%%%%%
\begin{frame}{Reproducible research}
\onslide<+->
In its most general sense... the ability to reproduce results from an experiment or analysis conducted by another.\\~\\
\onslide<+->
From Wikipedia... `The ultimate product is the \alert{paper along with the full computational environment} used to produce the results in the paper such as the code, data, etc. that can be \alert{used to reproduce the results and create new work} based on the research.'\\~\\
\onslide<+->
Concept is strongly based on the idea of \alert{literate programming} such that the logic of the analysis is clearly represented in the final product by combining computer code/programs with ordinary human language [Knuth, 1992].
\end{frame}

%%%%%%
\begin{frame}{Non-reproducible research}
\begin{center}
\begin{tikzpicture}[node distance=2.5cm, auto, >=stealth]
	\onslide<2->{
	\node[block] (a) {1. Gather data};}
	\onslide<3->{
	\node[block] (b)  [right of=a, node distance=4.2cm] {2. Analyze data};
 	\draw[->] (a) -- (b);}
 	\onslide<4->{
 	\node[block] (c)  [right of=b, node distance=4.2cm]  {3. Report results};
 	\draw[->] (b) -- (c);}
%  	\onslide<5->{
%  	\node [right of=a, node distance=2.1cm] {\textcolor[rgb]{1,0,0}{X}};
%  	\node [right of=b, node distance=2.1cm] {\textcolor[rgb]{1,0,0}{X}};}
\end{tikzpicture}
\end{center}
\vspace{-0.5cm}
\begin{columns}[t]
\onslide<2->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item Begins with general question or research objectives
\item Data collected in raw format (hard copy) converted to digital (Excel spreadsheet)
\end{itemize}
\end{column}}
\onslide<3->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item Import data into stats program or analyze directly in Excel
\item Create figures/tables directly in stats program
\item Save relevant output
\end{itemize}
\end{column}}
\onslide<4->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item Create research report using Word or other software
\item Manually insert results into report
\item Change final report by hand if methods/analysis altered
\end{itemize}
\end{column}}
\end{columns}

\end{frame}

%%%%%%
\begin{frame}{Reproducible research}
\begin{center}
\begin{tikzpicture}[node distance=2.5cm, auto, >=stealth]
	\onslide<1->{
	\node[block] (a) {1. Gather data};}
	\onslide<1->{
	\node[block] (b)  [right of=a, node distance=4.2cm] {2. Analyze data};
 	\draw[<->] (a) -- (b);}
 	\onslide<1->{
 	\node[block] (c)  [right of=b, node distance=4.2cm]  {3. Report results};
 	\draw[<->] (b) -- (c);}
\end{tikzpicture}
\end{center}
\vspace{-0.5cm}
\begin{columns}[t]
\onslide<1->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item Begins with general question or research objectives
\item Data collected in raw format (hard copy) converted to digital (\alert{text file})
\end{itemize}
\end{column}}
\onslide<1->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item Create \alert{integrated script} for importing data (data path is known) 
\item Create figures/tables directly in stats program
\item \alert{No need to export} (reproduced on the fly)
\end{itemize}
\end{column}}
\onslide<1->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item Create research report using RR software
\item \alert{Automatically include results} into report
\item \alert{Change final report automatically} if methods/analysis altered
\end{itemize}
\end{column}}
\end{columns}

\end{frame}

%%%%%%
\begin{frame}{Reproducible research in R}
Easily adopted using RStudio [\href{http://www.rstudio.com/}{http://www.rstudio.com/}]\\~\\
Also possible w/ Tinn-R or via command prompt but not as intuitive\\~\\
Requires a \LaTeX\ distribution - use MiKTeX for Windows [\href{http://miktex.org/}{http://miktex.org/}]\\~\\
\onslide<2->{
Essentially a \LaTeX\ document that incorporates R code... \\~\\
Uses Sweave (or Knitr) to convert .Rnw file to .tex file, then \LaTeX\ to create pdf\\~\\
Sweave comes with the \texttt{utils} package, may have to tell \LaTeX\ where \texttt{Sweave.sty} is \\~\\
}
\end{frame}

%%%%%%
\begin{frame}{Reproducible research in R}
Use same procedure for compiling a \LaTeX\ document with one additional step

\begin{center}
\begin{tikzpicture}[node distance=2.5cm, auto, >=stealth]
	\onslide<2->{
	\node[block] (a) {1. myfile.Rnw};}
	\onslide<3->{
	\node[block] (b)  [right of=a, node distance=4.2cm] {2. myfile.tex};
 	\draw[->] (a) -- (b);\node [right of=a, above=0.5cm, node distance=2.1cm] {Sweave};}
 	\onslide<4->{
 	\node[block] (c)  [right of=b, node distance=4.2cm]  {3. myfile.pdf};
 	\draw[->] (b) -- (c);
 	\node [right of=b, above=0.5cm, node distance=2.1cm] {pdfLaTeX};}
\end{tikzpicture}
\end{center}
\vspace{-0.5cm}
\begin{columns}[t]
\onslide<2->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item A .tex file but with .Rnw extension
\item Includes R code as `chunks' or inline expressions
\end{itemize}
\end{column}}
\onslide<3->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item .Rnw file is converted to a .tex file using Sweave
\item .tex file contains output from R, no raw R code
\end{itemize}
\end{column}}
\onslide<4->{
\begin{column}{0.33\textwidth}
\begin{itemize}
\item .tex file converted to pdf (or other output) for final format
\item Include biblio with bibtex
\end{itemize}
\end{column}}
\end{columns}

\end{frame}
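
%%%%%%
\begin{frame}[fragile]{Reproducible research in R}
The same two steps can also be run by hand from the R console. A minimal sketch, assuming \texttt{myfile.Rnw} sits in the working directory and a \LaTeX\ distribution is installed:
\begin{block}{}
\begin{verbatim}
# step 1: run the R code and write myfile.tex
Sweave('myfile.Rnw')

# step 2: compile myfile.tex to myfile.pdf
tools::texi2pdf('myfile.tex')
\end{verbatim}
\end{block}
\vspace{\baselineskip}
Compiling from RStudio runs both steps in one go
\end{frame}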

%%%%%%
\begin{frame}[containsverbatim]{Reproducible research in R} \label{sweaveref}
\begin{block}{.Rnw file}
\begin{verbatim}
\documentclass{article}
\usepackage{Sweave}

\begin{document}

Here's some R code:

\Sexpr{'<<eval=true,echo=true>>='}
options(width=60)
set.seed(2)
rnorm(10)
\Sexpr{'@'}

\end{document}
\end{verbatim}
\end{block}

\end{frame}

%%%%%%
\begin{frame}[containsverbatim,shrink]{Reproducible research in R}
\begin{block}{.tex file}
\begin{verbatim}
\documentclass{article}
\usepackage{Sweave}

\begin{document}

Here's some R code:

\begin{Schunk}
\begin{Sinput}
> options(width=60)
> set.seed(2)
> rnorm(10)
\end{Sinput}
\begin{Soutput}
 [1] -0.89691455  0.18484918  1.58784533 -1.13037567  
 [5] -0.08025176  0.13242028  0.70795473 -0.23969802  
 [9]  1.98447394 -0.13878701
\end{Soutput}
\end{Schunk}

\end{document}
\end{verbatim}
\end{block}

\end{frame}

%%%%%%
\begin{frame}{Reproducible research in R}
The final product:\\~\\
\centerline{\includegraphics{ex1_input.pdf}}
\end{frame}

%%%%%%
\begin{frame}[fragile]{Sweave - code chunks}
\onslide<+->
R code is entered in the \LaTeX\ document using `code chunks'
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<>>='}
\Sexpr{'@'}
\end{verbatim}
\end{block}
Any text within the code chunk is interpreted as R code\\~\\
Arguments for the code chunk are entered within \verb|\Sexpr{'<<here>>'}|\\~\\
\onslide<+->
\begin{itemize}
\item{\texttt{eval}: evaluate code, default \texttt{T}}
\item{\texttt{echo}: return source code, default \texttt{T}}
\item{\texttt{results}: format of output (chr string), default is `verbatim' (also `tex' for tables or `hide' to suppress)}
\item{\texttt{fig}: for creating figures, default \texttt{F}}
\end{itemize}
\end{frame}

%%%%%%
\begin{frame}[fragile]{Sweave - code chunks}
Changing the default arguments for the code chunk:
\begin{columns}[t]
\begin{column}{0.45\textwidth}
\onslide<+->
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<>>='}
2+2
\Sexpr{'@'}
\end{verbatim}
\end{block}
<<>>=
2+2
@
\onslide<+->
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<eval=F>>='}
2+2
\Sexpr{'@'}
\end{verbatim}
\end{block}
Code is not evaluated, so no result is returned...
\end{column}
\begin{column}{0.45\textwidth}
\onslide<+->
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<results=hide>>='}
2+2
\Sexpr{'@'}
\end{verbatim}
\end{block}
<<results=hide>>=
2+2
@
\onslide<+->
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<echo=F>>='}
2+2
\Sexpr{'@'}
\end{verbatim}
\end{block}
<<echo=F>>=
2+2
@
\end{column}
\end{columns}
\end{frame}

%%%%%%
\begin{frame}[t,fragile]{Sweave - figures}
\onslide<1->
Sweave makes it easy to include figures in your document
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<myfig,fig=T,echo=F,include=T,height=3>>='}
set.seed(2)
hist(rnorm(100))
\Sexpr{'@'}
\end{verbatim}
\end{block}
\onslide<2->
<<myfig,fig=T,echo=F,include=T,height=3>>=
set.seed(2)
hist(rnorm(100))
@
\end{frame}

%%%%%%
\begin{frame}[t,fragile]{Sweave - figures}
Sweave makes it easy to include figures in your document
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<myfig,fig=T,echo=F,include=T,height=3>>='}
set.seed(2)
hist(rnorm(100))
\Sexpr{'@'}
\end{verbatim}
\end{block}
\vspace{\baselineskip}
Relevant code options for figures:
\begin{itemize}
\item{The chunk name is used to name the figure, myfile-myfig.pdf}
\item{\texttt{fig}: Lets R know the output is a figure}
\item{\texttt{echo}: Use \texttt{F} to suppress figure code}
\item{\texttt{include}: Should the figure be automatically included in output}
\item{\texttt{height}: (and \texttt{width}) Set dimensions of figure in inches}
\end{itemize}
\end{frame}

%%%%%%
\begin{frame}[t,fragile]{Sweave - figures}
An alternative approach for creating a figure
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<myfig,fig=T,echo=F,include=F,height=3>>='}
set.seed(2)
hist(rnorm(100))
\Sexpr{'@'}
\includegraphics{rnw_name-myfig.pdf}
\end{verbatim}
\end{block}
\includegraphics{Sweave_intro-myfig.pdf}
\end{frame}

%%%%%%
\begin{frame}[t,fragile]{Sweave - tables}
\onslide<1->
Really easy to create tables
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<results=tex,echo=F>>='}
library(stargazer)
data(iris)
stargazer(iris,title='Summary statistics for Iris data')
\Sexpr{'@'}
\end{verbatim}
\end{block}
\onslide<2->
<<results=tex,echo=F>>=
data(iris)
library(stargazer)
stargazer(iris,title='Summary statistics for Iris data')
@

\end{frame}

%%%%%%
\begin{frame}[t,fragile]{Sweave - tables}
Really easy to create tables
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<results=tex,echo=F>>='}
library(stargazer)
data(iris)
stargazer(iris,title='Summary statistics for Iris data')
\Sexpr{'@'}
\end{verbatim}
\end{block}
\vspace{\baselineskip}
\texttt{results} option should be set to `tex' (and \texttt{echo=F})\\~\\
Several packages are available to convert R output to \LaTeX\ table format
\begin{itemize}
\item{xtable: most general package}
\item{Hmisc: similar to xtable but can handle specific R model objects}
\item{stargazer: fairly effortless conversion of R model objects to tables}
\end{itemize}
\end{frame}
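
%%%%%%
\begin{frame}[t,fragile]{Sweave - tables}
The same pattern works with xtable. A minimal sketch, assuming the xtable package is installed (the caption text is only an illustration):
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<results=tex,echo=F>>='}
library(xtable)
data(iris)
# print() writes the LaTeX code for the table
print(xtable(head(iris),caption='First rows of Iris data'))
\Sexpr{'@'}
\end{verbatim}
\end{block}
\vspace{\baselineskip}
\texttt{xtable} creates the table object, \texttt{print} converts it to \LaTeX
\end{frame}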

%%%%%%
\begin{frame}[fragile]{Sweave - expressions}
\onslide<1->
All objects within a code chunk are saved in the workspace each time a document is compiled (unless \texttt{eval=F})\\~\\
This allows the information saved in the workspace to be reproduced in the final document as inline text, via \alert{expressions}\\~\\
\onslide<2->
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<echo=F>>='}
data(iris)
dat<-iris
\Sexpr{'@'}
\end{verbatim}
Mean sepal length was \ShowSexpr{mean(dat\$Sepal.Length)}.
\end{block}
\onslide<3->
<<echo=F>>=
data(iris)
dat<-iris
@
\vspace{\baselineskip}
Mean sepal length was \Sexpr{mean(dat$Sepal.Length)}.
\end{frame}

%%%%%%
\begin{frame}[fragile]{Sweave - expressions}
Change the global R options to change the default output\\~\\
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<echo=F>>='}
data(iris)
dat<-iris
options(digits=2)
\Sexpr{'@'}
\end{verbatim}
Mean sepal length was \ShowSexpr{format(mean(dat\$Sepal.Length))}.
\end{block}
<<echo=F>>=
data(iris)
dat<-iris
options(digits=2)
@
\vspace{\baselineskip}
Mean sepal length was \Sexpr{format(mean(dat$Sepal.Length))}.\\~\\
\end{frame}
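
%%%%%%
\begin{frame}[fragile]{Sweave - expressions}
Rounding can also be done inside the expression itself, which leaves the global options untouched. A minimal sketch using \texttt{round} on the same \texttt{dat} object:\\~\\
\begin{block}{}
Mean sepal length was \ShowSexpr{round(mean(dat\$Sepal.Length), 2)}.
\end{block}
\vspace{\baselineskip}
Mean sepal length was \Sexpr{round(mean(dat$Sepal.Length), 2)}.
\end{frame}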

%%%%%%
\begin{frame}{Sweave vs Knitr}
\onslide<1->
Sweave does not automatically cache R data on compilation\\~\\
\alert{Knitr} is a useful alternative - similar to Sweave but with minor differences in args for code chunks, more flexible output\\~\\
\onslide<2->
\begin{columns}
\begin{column}{0.3\textwidth}
Must change default options in RStudio\\~\\
Knitr included with RStudio, otherwise download as package
\end{column}
\begin{column}{0.6\textwidth}
\centerline{\includegraphics[width=0.8\textwidth]{options_ex.png}}
\end{column}
\end{columns}
\end{frame}

%%%%%%
\begin{frame}[fragile]{Knitr}
\onslide<1->
Knitr can be used to cache code chunks\\~\\
Data are saved when chunk is first evaluated, skipped on future compilations unless changed\\~\\
This allows quicker compilation of documents that import lots of data\\
~\\
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<mychunk, cache=TRUE>>='}
load(file='mydata.RData')
\Sexpr{'@'}
\end{verbatim}
\end{block}
\end{frame}

%%%%%%
\begin{frame}[containsverbatim,shrink]{Knitr} \label{knitref}
\begin{block}{.Rnw file}
\begin{verbatim}
\documentclass{article}

\Sexpr{'<<setup, include=FALSE, cache=FALSE>>='}
library(knitr)

#set global chunk options
opts_chunk$set(fig.path='H:/docs/figs/', fig.align='center', 
dev='pdf', dev.args=list(family='serif'), fig.pos='!ht')

options(width=60)
\Sexpr{'@'}

\begin{document}

Here's some R code:

\Sexpr{'<<eval=T, echo=T>>='}
set.seed(2)
rnorm(10)
\Sexpr{'@'}

\end{document}
\end{verbatim}
\end{block}

\end{frame}

%%%%%%
\begin{frame}{Knitr}
The final product:\\~\\
\centerline{\includegraphics[width=\textwidth]{knit_ex.pdf}}
\end{frame}
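
%%%%%%
\begin{frame}[fragile]{Knitr}
Knitr documents can also be compiled from the R console. A minimal sketch, assuming the file is saved as \texttt{myfile.Rnw} in the working directory and the knitr package is installed:
\begin{block}{}
\begin{verbatim}
library(knitr)

# knit the .Rnw file to .tex, then compile to pdf
knit2pdf('myfile.Rnw')
\end{verbatim}
\end{block}
\end{frame}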

%%%%%%
\begin{frame}[containsverbatim,shrink]{Knitr}
Figures, tables, and expressions are largely the same as in Sweave\\~\\

\begin{block}{Figures}
\begin{verbatim}
\Sexpr{'<<myfig,echo=F>>='}
set.seed(2)
hist(rnorm(100))
\Sexpr{'@'}
\end{verbatim}
\end{block}
\vspace{\baselineskip}
\begin{block}{Tables}
\begin{verbatim}
\Sexpr{"<<mytable,results='asis',echo=F,message=F>>="}
library(stargazer)
data(iris)
stargazer(iris,title='Summary statistics for Iris data')
\Sexpr{'@'}
\end{verbatim}
\end{block}

\end{frame}
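
%%%%%%
\begin{frame}[fragile]{Knitr}
Figure dimensions are one of the chunk arguments that differ from Sweave. A minimal sketch using the Knitr options \texttt{fig.height} and \texttt{fig.width} (in inches) in place of Sweave's \texttt{height} and \texttt{width}; the values are only an illustration:
\begin{block}{}
\begin{verbatim}
\Sexpr{'<<myfig,echo=F,fig.height=3,fig.width=5>>='}
set.seed(2)
hist(rnorm(100))
\Sexpr{'@'}
\end{verbatim}
\end{block}
\end{frame}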

%%%%%%
\begin{frame}{A minimal working example}
\onslide<1->
Step by step guide to creating your first RR document\\~\\
\begin{enumerate}
\onslide<2->
\item Download and install \href{http://www.rstudio.com/}{RStudio}
\onslide<3->
\item Download and install \href{http://miktex.org/}{MiKTeX} if using Windows
\onslide<4->
\item Create a unique folder for the document - This will be the working directory
\onslide<5->
\item Open a new Sweave file in RStudio
\onslide<6->
\item Copy and paste the file found on slide \ref{sweaveref} for Sweave or slide \ref{knitref} for Knitr into the new file (and select correct compile option)
\onslide<7->
\item Compile the pdf (runs Sweave/Knitr, then pdfLaTeX)\\~\\
\end{enumerate}
\onslide<7->
\centerline{\includegraphics[width=0.6\textwidth]{compile_ex.png}}
\end{frame}

%%%%%%
\begin{frame}{If things go wrong...}
\LaTeX\ errors can be difficult to narrow down - check the log file\\~\\
Sweave/Knitr errors will be displayed on the console\\~\\
Other resources
\begin{itemize}
\item{`Reproducible Research with R and RStudio' by C. Gandrud, CRC Press}
\item{\LaTeX\ forum (like StackOverflow) \href{http://www.latex-community.org/forum/}{http://www.latex-community.org/forum/}}
\item Comprehensive Knitr guide \href{http://yihui.name/knitr/options}{http://yihui.name/knitr/options}
\item Sweave user manual \href{http://stat.ethz.ch/R-manual/R-devel/library/utils/doc/Sweave.pdf}{http://stat.ethz.ch/R-manual/R-devel/library/utils/doc/Sweave.pdf}
\item Intro to Sweave \href{http://www.math.ualberta.ca/~mlewis/links/the_joy_of_sweave_v1.pdf}{http://www.math.ualberta.ca/~mlewis/links/the_joy_of_sweave_v1.pdf}
\end{itemize}
\vspace{\baselineskip}
\end{frame}
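
%%%%%%
\begin{frame}[fragile]{If things go wrong...}
It can help to pull the R code out of the document and run it on its own. A minimal sketch, assuming the file is named \texttt{myfile.Rnw} and sits in the working directory:
\begin{block}{}
\begin{verbatim}
# Sweave: write the code chunks to myfile.R
Stangle('myfile.Rnw')

# Knitr equivalent
knitr::purl('myfile.Rnw')

# run the extracted code to find the failing line
source('myfile.R', echo=TRUE)
\end{verbatim}
\end{block}
\end{frame}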

\end{document}
