Saturday, April 14, 2012

Linguistic Notation Inside of R Plots!

So, I've been playing around with learning knitr, a Sweave-like R package for combining LaTeX and R code into one document. There's almost no learning curve if you already use Sweave, and I find much of knitr's design and usage nicer.

I wasn't going to make a blog post or tutorial about knitr, because the documentation is already pretty good, and contains a lot of tutorials.  However, I've just had a major victory in incorporating linguistic notations into plots using knitr, and I just had to share. I'll show you the payoff first, and then include the details.

First, I managed to successfully use IPA characters as plot symbols and legend keys.
The actual data in the plot is on car fuel economy, but that's not the point. Look at that IPA!

Then, I tried to expand on the principles that got me the IPA, and look what I produced.
Yes, that is a syntax tree overlaid on top of the plot. But why stop there when you could go completely crazy?

How to do it.

The important thing about making these plots is that they were easy given my pre-existing knowledge of R, LaTeX and what I've learned about knitr.  The crucial element here is that knitr supports tikz graphics. I don't know anything about tikz graphics, and I still don't, which means that if you don't know anything about tikz graphics, you can still make plots like these.

Like most linguists who use LaTeX, I already know how to include IPA characters and draw syntactic trees in a LaTeX document. It's as simple as
...
\usepackage{tipa}
\usepackage{qtree}
...
\textipa{D C P}
\Tree [.S NP VP ]
...

What is so cool about the tikz device is that it lets you define these notations in LaTeX syntax, and then incorporates them into R graphs. Here are the important code chunks to include in your knitr document to make it all work.

1 — Load the right R packages

Early on, load the ggplot2 and tikzDevice R packages.
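
The chunk for this step isn't shown here, so this is only a minimal sketch of what it might look like, assuming both packages are already installed. In the .Rnw file it would sit between the usual `<<>>=` and `@` chunk delimiters.

```r
# Assumption: ggplot2 and tikzDevice are installed in your R library.
library(ggplot2)     # ggplot(), geom_text(), stat_smooth(), and the mpg data
library(tikzDevice)  # the tikz() graphics device that dev="tikz" relies on
```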

2 — Define your LaTeX libraries

Then, you need to tell the tikz device which LaTeX packages you want to use.
<<>>=
    options(tikzLatexPackages = c(getOption("tikzLatexPackages"),
                                  "\\usepackage{tipa}",
                                  "\\usepackage{qtree}"))
@
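
One thing to watch for if you re-run that chunk interactively: it appends to the existing option every time, so the same `\usepackage` lines can pile up. Here's a rough sketch of a guard against that. The helper name `add_tikz_packages` is my own invention for illustration, not something tikzDevice provides.

```r
# Hypothetical helper (not part of tikzDevice): append LaTeX packages to the
# tikzLatexPackages option without creating duplicate entries.
# Call it after library(tikzDevice) so the package's defaults are already set.
add_tikz_packages <- function(...) {
  current <- getOption("tikzLatexPackages")
  options(tikzLatexPackages = unique(c(current, ...)))
  invisible(getOption("tikzLatexPackages"))
}

add_tikz_packages("\\usepackage{tipa}", "\\usepackage{qtree}")
add_tikz_packages("\\usepackage{tipa}")  # second call adds nothing new

# Inspect what will be written into the LaTeX preamble
print(getOption("tikzLatexPackages"))
```

Keeping whatever was already in the option (rather than overwriting it) matters because the tikz device still needs its own default packages.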

3 — Define the plotting elements in LaTeX

We're done with the hard part. Now, it's as simple as faking up some data...
<<>>=
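    # In recent ggplot2 releases mpg$drv is a character column, so make it a
    # factor first; otherwise levels<- would not relabel the values.
    mpg$drv <- factor(mpg$drv)
    levels(mpg$drv) <- c("\\textipa{D}",
                         "\\textipa{C}",
                         "\\textipa{P}")

    # Attach the same small qtree tree to every row, to use as a plot label.
    mpg$tree <- "{\\footnotesize \\Tree [.S NP VP ]}"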
@
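
In case the doubled backslashes look odd: inside an R string literal, `\\` is an escape for one literal backslash, so LaTeX only ever sees a single one. A quick base-R check, which doesn't need any of the packages above (the variable names are just for illustration):

```r
label <- "\\textipa{D}"

writeLines(label)  # prints \textipa{D}
nchar(label)       # 11, because the escaped backslash is one character

# Assumption: R 4.0.0 or later, where raw strings avoid the doubling entirely.
raw_label <- r"(\textipa{D})"
identical(label, raw_label)  # TRUE
```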

4 — Plot the data using the tikz device

...and plotting it, using the tikz device.
<<dev="tikz", fig.width=8, fig.height=5, out.width="0.9\\textwidth", fig.align="center">>=
    # Draw each car as its drv label; text and smoothers are colored by drv.
    ggplot(mpg, aes(displ, hwy, label = drv, color = drv)) +
            geom_text() +
            stat_smooth() +
            xlab("\\textipa{IPA!}")
@
Or, in the case of the syntactic trees,
<<dev="tikz", fig.width=8, fig.height=5, out.width="0.7\\textwidth", fig.align="center">>=
    ggplot(mpg, aes(displ, hwy, label = tree)) +
            geom_text() +
            stat_smooth() +
            xlab("TREES")
@
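
If you want to try the tikz device outside of knitr first, something like the following should work from a plain R session. Treat it as a sketch: it assumes a working LaTeX installation with tipa available (tikzDevice calls LaTeX to measure text), and the output file name `ipa-plot.tex` is just a placeholder I made up.

```r
library(ggplot2)
library(tikzDevice)

# Make tipa available to the device, keeping the default packages in place
options(tikzLatexPackages = c(getOption("tikzLatexPackages"),
                              "\\usepackage{tipa}"))

# standAlone = TRUE wraps the picture in a complete, compilable LaTeX document
tikz("ipa-plot.tex", width = 8, height = 5, standAlone = TRUE)
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  xlab("\\textipa{IPA!}")
print(p)   # ggplot objects must be printed explicitly inside a device
dev.off()  # closes the device and finishes writing ipa-plot.tex
```

The resulting .tex file can then be compiled with pdflatex like any other document.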

5 — Compile the .Rnw to a .tex document

Here's some source code to embed these plots in a beamer presentation. To compile a .tex document from the .Rnw source, you can run
library(knitr)
knit("./ling-plot.Rnw")
Then, just compile the .tex document however your little heart desires.
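
If you'd rather not leave R for the LaTeX step, knitr also has knit2pdf(), which knits the .Rnw and then runs LaTeX on the result. A minimal sketch, assuming pdflatex is on your PATH and that beamer, tipa and qtree are installed:

```r
library(knitr)

rnw <- "./ling-plot.Rnw"  # same file name as in the knit() call above

# Knit to .tex and compile to PDF in one call; skipped if the file is absent
if (file.exists(rnw)) {
  knit2pdf(rnw)
}
```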

How to do it with one click

As if this weren't awesome and easy enough already, you can compile the whole document in one click using RStudio, as outlined on this knitr page. You'll need to download the development (i.e. not guaranteed to be stable) RStudio release, then set the compilation option to use knitr, and you're done!

I have to say that from a practical standpoint, I've found writing Sweave documents in RStudio to be a much better experience than what I was doing before, because I can run and debug the R code from within the .Rnw source document. No need to go flipping back and forth between a TeX editor and R.

P.S. I highlighted the code above at http://www.inside-r.org/pretty-r

3 comments:

  1. IPA characters have never been too difficult to plot using R, as long as you use unicode hexadecimals, e.g. \u0292. I happen to use IPA Palette, which will show you the unicode for a character if you just hover your cursor over it. This works great for being able to quickly get characters to display correctly.

    The one issue though, and perhaps you resolved this by importing TIPA into R, is that certain IPA unicode characters are not specified for R's default font. I often switch into 'mono' in R, which can display all unicode characters, but unfortunately looks like Courier.

  2. Yeah, I think I played around with using unicode once or twice, but I think precisely the one symbol I wanted (maybe wedge?) wouldn't come out.

    I think I much prefer the tikz approach, because now I don't need to mess around with finding the unicode hex values. It's just ordinary tipa.

    Also, when using the tikz device, it'll render all text in the figure as if it were any other text in the same LaTeX document, meaning your fonts will match between the document and the figures.

  3. It sounds like this is the method to try then. The font non-uniformity issue is annoying when using R. And after having used the mono font in R for my figures with IPA, I had to argue with a copyeditor who didn't want a Courier-style font in my paper. It would be nice if everything looked Times-like and was consistent.
