by Rodney Sparapani, PhD
Rodney is an Assistant Professor in the Institute for Health and Society, Division of Biostatistics, at the Medical College of Wisconsin in Milwaukee and president of the Milwaukee Chapter of the ASA, which is hosting an R workshop on Data Mining in Milwaukee on April 4th.
Emacs Speaks Statistics (ESS) is a GPL software package for GNU Emacs that provides support for several statistical programming languages. This post, which focuses on ESS and R, gives some history of both Emacs and ESS, some guidance on installing both, and the basics of how to get started.
Brief History of Emacs
Emacs has a long and important history as a programmer's editor. It was created in 1976 by Richard Stallman, AKA RMS. In 1984, RMS released GNU Emacs, the first free software program released by the GNU Project. Originally, Emacs provided intelligent editing for popular programming languages of the time such as Pascal, PL/I and Fortran; each language was supported by a corresponding "major mode", which we will call a mode for short. Now Emacs has modes for the popular programming languages of today such as R (via ESS), C/C++, Java, Perl and Python. Modes are the killer app of Emacs. You can learn one editor, Emacs, which provides an IDE for practically all of the programming languages you are likely to ever need. Emacs also supports a wide variety of markup languages like LaTeX (via AUCTeX) and HTML.
You can find the source code for GNU Emacs online. I highly recommend the latest stable release, v24.3, for its feature richness and stability. MS Windows and Mac OS X users can find binaries online (which come with ESS and AUCTeX included) at Vincent Goulet's web page. If you are using Linux or UNIX, then you may be able to find binaries for Emacs in a repo associated with your distribution. If not, then you can install Emacs from source. Beware, however, that Emacs has a lot of dependencies; an abbreviated list includes giflib, libpng, libtiff, ispell/aspell, libXaw and ncurses. By default, configure assumes that the install location is /usr/local, but you can override that with the --prefix option:
./configure --prefix=/opt/local --with-x-toolkit=lucid --without-gconf
These options should work on a wide variety of Linux and UNIX distributions.
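For readers building from source, the whole sequence might look like the sketch below. It assumes you have already unpacked the Emacs sources and are running it from the top of that source tree. The names PREFIX, DRY_RUN and run are illustrative and not part of the Emacs build system. By default the sketch only prints the commands so that you can review them before running anything.

```shell
#!/bin/sh
# Sketch of a from-source Emacs build, run from the top of the source tree.
# PREFIX, DRY_RUN and run() are hypothetical names used for illustration.

PREFIX="${PREFIX:-/opt/local}"   # install location (configure defaults to /usr/local)
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print each command, 0 = execute it

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Each step runs only if the previous one succeeded.
run ./configure --prefix="$PREFIX" --with-x-toolkit=lucid --without-gconf &&
run make &&
run make install
```

Set DRY_RUN=0 to actually build. Depending on the prefix you choose, the final make install step may need elevated privileges.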
Dissecting an Emacs Frame
The Emacs window is called a Frame; we will dissect the Frame from top to bottom. In Figure 0, you can see that the Menu is at the top.
Just below that is the Toolbar with icons for common operations. Next, we come to the buffer area where the file you are editing appears. Below that is an information strip called the mode line. From left to right, the mode line has several items which you can hover over to receive tips on what they represent. In Figure 0, you will see that the mode line begins with five characters, each representing a piece of file information: the coding system, the end-of-line character, writable or read-only, whether the buffer has been modified, and the current directory, respectively. All but the last can be modified by clicking on the corresponding character.
Next is the file name. The mode name follows in parentheses. And, finally, at the bottom is the minibuffer, which we will see more of below.
Common Operations and Modifier Keys
Besides modes, Emacs is known for its commands bound to key sequences. You can perform a lot of operations from the Menu and the Toolbar that are self-explanatory. However, because they require constant mouse movements, you may find these inconvenient; key combinations exist for many common operations.
In the Emacs help notation, C-KEY means hold down the Control key while pressing another KEY. For example, C-h means hold down Control while pressing h. For new Emacs users, C-h is very helpful. C-h is the help key; note that C-h is also assigned to F1 for convenience. The key sequence C-h t or F1 t will launch the Emacs tutorial. C-h k runs the command describe-key. After pressing C-h k, you will see the following prompt in the minibuffer: "Describe key (or click or menu item):". Emacs then waits until you press a key sequence (or click or pick a menu item). For example, entering C-g after the prompt produces:
C-g runs the command keyboard-quit, which is an interactive compiled Lisp function in `simple.el'.
It is bound to C-g.
Signal a `quit' condition. During execution of Lisp code, this character causes a quit directly. At top-level, as an editor command, this simply beeps.
It is this help at your fingertips which is the self-documenting feature of Emacs. Remember C-g: it can be used to cancel a command in progress if you change your mind or you launched the command in error. A few other useful commands split the buffer area. C-x 2 splits it in half above and below, and C-x 1 returns it to a single pane. Similarly, C-x 3 splits it left and right, and C-x 1 again restores it.
M-KEY means hold down the Meta key while pressing another KEY. On PC (Mac) keyboards, the Meta key is usually the Alt (Option) key. On UNIX keyboards, Meta keys are usually to the left and right of the spacebar and have a solid diamond symbol. To be sure, use describe-key, i.e. C-h k M-x. Of course, you will not be sure which key is the Meta key at first, but you will quickly find out. If you don't have a Meta key for some reason, you can press and release the Escape key and then press KEY. You can execute an Emacs command by name as follows: M-x COMMAND Enter. For example, to run describe-key: M-x describe-key Enter.
Brief History of ESS
In the late 1990s, Anthony Rossini led the effort to merge S-mode (developed by David Smith, editor of this blog), SAS-mode and Stata-mode into one package: Emacs Speaks Statistics (ESS). Originally, ESS supported GNU Emacs and XEmacs. XEmacs was a popular fork of Emacs at that time, but the feature sets of Emacs and XEmacs have diverged. Today, ESS only supports GNU Emacs; the current stable release is v13.09-1. However, XEmacs users can still use the slightly older version of ESS (circa 2012), v12.04-4. You can find every release of ESS from 2002 onward in the ESS archive here.
You can find the source of ESS online at the R Project. As already mentioned, you can install Emacs and ESS simultaneously with Vincent Goulet's binaries. You can get the current stable release as well as other releases from the ESS archive. Like all free software, ESS is a work in progress. Between releases, new features and bug fixes appear in the ESS repo. If you need to install the latest development release, then you can grab the source from one of the ESS repos. ESS has two repos: one based on subversion, AKA SVN, and the other based on git. Although the SVN repo is the basis of releases, the two repos are synchronized regularly.
You can check out the latest development release from SVN via the command:
svn checkout https://svn.r-project.org/ESS/trunk /path/to/ESS
Replace /path/to/ESS with the directory on your local system where you want to store ESS. Or, similarly, via the git command:
git clone https://github.com/emacs-ess/ESS.git /path/to/ESS
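Because new features and fixes keep arriving between releases, you may want to refresh an existing checkout later. The following is a minimal sketch, assuming the checkout lives in /path/to/ESS as above. The names ESS_DIR, checkout_kind and update_command are hypothetical helpers for illustration, not part of ESS. The sketch only prints the update command rather than running it.

```shell
#!/bin/sh
# Sketch: work out how to refresh an existing ESS checkout.
# ESS_DIR, checkout_kind and update_command are hypothetical names.

ESS_DIR="${ESS_DIR:-/path/to/ESS}"

checkout_kind() {
    # Report which version control system manages directory $1.
    if [ -e "$1/.git" ]; then
        echo git
    elif [ -e "$1/.svn" ]; then
        echo svn
    else
        echo none
    fi
}

update_command() {
    # Print (for review, not execution) the command that updates $1.
    case "$(checkout_kind "$1")" in
        git) echo "git -C $1 pull" ;;
        svn) echo "svn update $1" ;;
        *)   echo "no git or SVN checkout found in $1" ;;
    esac
}

update_command "$ESS_DIR"
```

After updating, re-launch Emacs so that the refreshed ESS code is loaded.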
The steps to install ESS can be found online. Please follow the steps carefully. Note that steps 2 and 3 are optional, but steps 4 and 5 are necessary.
ESS in action
If you have installed ESS (and re-launched Emacs), then you should be ready to go. In Emacs, type M-x ess-version Enter to see if Emacs is running the version of ESS that you installed. As of this writing, the latest released version is v13.09-1 while the latest development version in the repo is v13.09-2.
Now, let's take a look at an example from the Modern Applied Statistics with S (MASS) book.
Type: C-x C-f galaxies.R Enter. Into this new file, copy and paste:
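library(MASS)  # provides the galaxies data set and width.SJ()
galaxies <- galaxies/1000  # rescale velocities from km/s to 1000 km/s
# compare two Sheather-Jones bandwidth estimates
c(width.SJ(galaxies, method = "dpi"), width.SJ(galaxies))
# set up empty axes, then overlay two kernel density estimates
plot(x = c(0, 40), y = c(0, 0.3), type = "n", bty = "l",
     xlab = "velocity of galaxy (1000 km/s)", ylab = "density")
lines(density(galaxies, width = 3.25, n = 200), lty = 1)
lines(density(galaxies, width = 2.56, n = 200), lty = 3)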
With the .R extension, this file will be recognized as an R program. On the mode line, you will see the mode name: "ESS[S] [R db -] ElDoc". Since ESS was derived from S-mode (and R from S), the mode name starts with ESS[S]. The "R" in [R db -] represents the R language. The "db -" stands for ess-tracebug, which provides visual debugging, breakpoints, tracing, etc. For more on ess-tracebug, see its documentation. And, finally, "ElDoc" signifies that ElDoc is turned on. With ElDoc, the minibuffer displays the arguments of the function at point. For example, place the cursor on the "x" in "plot(x" in the galaxies.R buffer and you will see the arguments for the plot() function displayed in the minibuffer; we saw that in Figure 0.
The syntax highlighting for the R language provided by ESS is configurable. In Emacs, syntax highlighting is known as font-locking. You can customize the amount of syntax highlighting that you want to see. At the top of the Emacs window, click on the ESS menu and select "Font Lock". This will display a menu of buttons corresponding to language elements that you can syntax highlight. For example, in Figure 1, you can see that when you have turned off all font-locking, the only things syntax highlighted are strings encased in double quotes.
At the other end of the spectrum, in Figure 2, you can see what it looks like when nearly all of the choices are picked.
You can experiment with the various settings and, once you are satisfied, press "Save to custom" at the bottom. This will save your settings in your Emacs initialization file, ~/.emacs. You will see them in a section that begins with "(custom-set-variables".
Now, let's return to our galaxies example. You can submit the whole buffer to an R process by pressing C-c C-b. If you don't have an R process running in your Emacs session, then one will be created for you in a buffer entitled "*R*", which you will see appear as the Frame is split either above/below or left/right. You can also submit a region by highlighting some code and pressing C-c C-r. You can submit the paragraph in which your cursor resides with C-c C-p (a paragraph is a set of one or more lines of code separated from the rest by blank lines). You can submit the line on which your cursor resides with C-c C-j (your cursor can be anywhere in the line; it doesn't have to be at the beginning or the end).
Now, in the *R* buffer, at an R prompt, type ?galaxies. If you press "n", then you will move to the next section of the help buffer; press "n" until you get to Examples. There you will find something similar to the example above. However, in Figure 3, notice that the R syntax is not highlighted.
ESS and polymode
When you are using R, you may find yourself editing R code that has embedded C/C++, HTML or LaTeX. Or you may simply be reading a help page. Emacs, generally, has one major mode per buffer. So, in a buffer that mixes languages, the syntax highlighting will not be what you intended. polymode was developed as a helper mode for ESS to fix this. With polymode, R code in the help pages, as well as embedded code from another language, is syntax highlighted correctly. Look here to get the polymode source code and installation tips online.
So, let's return to our galaxies example. In Figure 4, you can see that the R code in the Examples section is now syntax highlighted via polymode.
Emacs and ESS Zombies
Welcome to the Emacs and ESS world! Hopefully, this article has inspired you to give it a try. Like all software, Emacs and ESS are not perfect. However, their track record shows that they have served R users well with an intelligent editing environment. To find out more about ESS, look here.
The ESS documentation is a work in progress. However, to be a true zombie, don't be too squeamish to RTFM. For zombies that hunger for historical Emacs brain matter, I recommend EMACS: the extensible, customizable, self-documenting display editor by Richard M. Stallman. And for a wonderful introduction to S and R, read Modern Applied Statistics with S, 4th ed. by Bill Venables and Brian Ripley.
Finally, if you would like to talk more about Emacs, ESS or Zombies, stop by the R Workshop on Classical and Bayesian Data Mining sponsored by the Milwaukee Chapter of the ASA on April 4.