Linguistic Notation Inside of R Plots!

April 14, 2012
(This article was first published on Val Systems, and kindly contributed to R-bloggers)

So, I've been playing around with knitr, a Sweave-like R package for combining LaTeX and R code into one document. There's almost no learning curve if you already use Sweave, and I find much of knitr's design and usage nicer.

I wasn't going to write a blog post or tutorial about knitr, because the documentation is already pretty good and includes plenty of tutorials. However, I've just had a major victory incorporating linguistic notation into plots using knitr, and I had to share. I'll show you the payoff first, and then give the details.

First, I managed to successfully use IPA characters as plot symbols and legend keys.
The actual data in the plot is on car fuel economy, but that's not the point. Look at that IPA!

Then, I tried to expand on the principles that got me the IPA, and look what I produced.
Yes, that is a syntax tree overlaid on top of the plot. But why stop there when you could go completely crazy?

How to do it.

The important thing about making these plots is that they were easy, given my pre-existing knowledge of R and LaTeX and what I've learned about knitr. The crucial element is that knitr supports tikz graphics. I didn't know anything about tikz graphics before, and I still don't, which means that you can make plots like these without knowing anything about tikz either.

Like most linguists who use LaTeX, I already know how to include IPA characters and draw syntactic trees in a LaTeX document. It's as simple as
...
\usepackage{tipa}
\usepackage{qtree}
...
\textipa{D C P}
\Tree [.S NP VP ]
...

What is so cool about the tikz device is that it lets you define these notations in LaTeX syntax, and then incorporates them into R graphs. Here are the important code chunks to include in your knitr document to make it all work.

1 — Load the right R packages

Early on, load the ggplot2 and tikzDevice R packages.
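
The chunk for this step might look like the sketch below. The requireNamespace() guard is my addition rather than something from the original post; it assumes both packages are installed from CRAN and simply reports which ones are missing instead of stopping with an error.

```r
# Packages this walkthrough relies on.
needed <- c("ggplot2", "tikzDevice")

# requireNamespace() returns FALSE instead of erroring when a package is absent.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  library(ggplot2)     # ggplot(), geom_text(), stat_smooth(), the mpg data
  library(tikzDevice)  # the tikz() graphics device
}
```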

2 — Define your LaTeX libraries

Then, you need to tell the tikz device which LaTeX packages you want to use.
<<>>=
options(tikzLatexPackages = c(getOption("tikzLatexPackages"),
                              "\\usepackage{tipa}",
                              "\\usepackage{qtree}"))
@
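
One thing that tripped me up at first is the doubled backslashes: inside an R string, "\\" is a single literal backslash, so LaTeX sees \usepackage{tipa}. Here is a small sketch for checking what the option holds. The unique() call is my own addition, to keep the list free of duplicates if the chunk gets run twice, and it is meant to be run after tikzDevice has been loaded.

```r
# Extra LaTeX packages for the tikz device. Inside an R string, "\\" is a
# single literal backslash.
extra <- c("\\usepackage{tipa}", "\\usepackage{qtree}")

# unique() keeps the option free of duplicates if this code runs twice.
# getOption() returns NULL when the option is unset, and c(NULL, extra)
# is then just `extra`.
options(tikzLatexPackages = unique(c(getOption("tikzLatexPackages"), extra)))

# cat() prints the strings as LaTeX will see them, one backslash each.
cat(getOption("tikzLatexPackages"), sep = "\n")
```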

3 — Define the plotting elements in LaTeX

We're done with the hard part. Now, it's as simple as faking up some data...
<<>>=
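# factor() relabels drv whether it is stored as a character vector or as a
# factor; the labels are matched to the sorted values "4", "f", "r".
mpg$drv <- factor(mpg$drv,
                  labels = c("\\textipa{D}",
                             "\\textipa{C}",
                             "\\textipa{P}"))

mpg$tree <- "{\\footnotesize \\Tree [.S NP VP ]}"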
@
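
If you want to see which drive type ends up with which IPA symbol, the relabelling can be tried on a toy vector first. This is only a sketch: the short drv vector below is a made-up stand-in for mpg$drv, and it uses factor() because on a plain character vector levels<- only attaches an attribute without changing the values.

```r
# Toy stand-in for mpg$drv; the real column has the values "4", "f" and "r".
drv <- c("f", "4", "r", "f")

ipa <- c("\\textipa{D}", "\\textipa{C}", "\\textipa{P}")

# factor() sorts the distinct values ("4", "f", "r") and pairs them with
# `labels` in that order.
drv_ipa <- factor(drv, labels = ipa)

# Cross-tabulate to see which drive type ended up with which symbol.
print(table(original = drv, relabelled = drv_ipa))
```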

4 — Plot the data using the tikz device

...and plotting it, using the tikz device.
<<dev="tikz", fig.width=8, fig.height=5, out.width="0.9\\textwidth", fig.align="center">>=
ggplot(mpg, aes(displ, hwy, label = drv, color = drv)) +
  geom_text() +
  stat_smooth() +
  xlab("\\textipa{IPA!}")
@
Or, in the case of the syntactic trees,
<<dev="tikz", fig.width=8, fig.height=5, out.width="0.7\\textwidth", fig.align="center">>=
ggplot(mpg, aes(displ, hwy, label = tree)) +
  geom_text() +
  stat_smooth() +
  xlab("TREES")
@
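
The same plots can probably be produced outside of knitr by opening the tikz device yourself. The sketch below is not from my original workflow: save_tikz_plot is a hypothetical helper name, and it assumes tikzDevice, ggplot2 and a working LaTeX installation, since the device calls LaTeX to measure text.

```r
# Hypothetical helper: write a ggplot to a self-contained .tex file with the
# tikz device. Needs tikzDevice, ggplot2 and a LaTeX installation.
save_tikz_plot <- function(plot, file, width = 8, height = 5) {
  tikzDevice::tikz(file, width = width, height = height, standAlone = TRUE)
  on.exit(grDevices::dev.off(), add = TRUE)  # close the device even on error
  print(plot)                                # ggplot objects draw when printed
  invisible(file)
}

# Usage, with the plot from the chunks above stored in `p`:
# save_tikz_plot(p, "ipa-plot.tex")
```

With standAlone = TRUE the output is a complete LaTeX document that can be compiled on its own, which is handy for checking one figure without knitting the whole document.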

5 — Compile the .Rnw to a .tex document

Here's some source code to embed these plots in a beamer presentation. To compile a .tex document from the .Rnw source, you can run
library(knitr)
knit("./ling-plot.Rnw")
Then, just compile the .tex document however your little heart desires.
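
If your heart desires to stay inside R, the two steps can be chained. This is a minimal sketch, assuming knitr and a LaTeX installation are available; build_pdf is a hypothetical helper name, not something knitr provides.

```r
# Hypothetical helper: knit an .Rnw file, then run LaTeX on the result.
build_pdf <- function(rnw) {
  tex <- knitr::knit(rnw)   # writes the .tex file and returns its path
  tools::texi2pdf(tex)      # compiles the .tex file to PDF
  sub("\\.tex$", ".pdf", basename(tex))  # name of the PDF that was written
}

# Usage:
# build_pdf("./ling-plot.Rnw")
```

knitr also ships knit2pdf(), which combines both steps in a single call.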

How to do it with one click

As if this weren't awesome and easy enough yet, it's possible to compile the whole document in one click using RStudio, as outlined on this knitr page. You'll need to download the development (i.e. not guaranteed to be stable) RStudio release, then set the compilation option to use knitr, and you're done!

I have to say that, from a practical standpoint, I've found writing Sweave documents in RStudio to be a much better experience than what I was doing before, because I can run and debug the R code from within the .Rnw source document. No need to flip back and forth between a TeX editor and R.

P.S. I highlighted the code above at http://www.inside-r.org/pretty-r
