Compiling R from source and why you shouldn’t do it

(This article was first published on On The Lambda, and kindly contributed to R-bloggers)

I’ve always thought that it’s silly, in most cases, to compile software from source when it’s already available in binary form. To help make more binary packages available to Mac users, I just started contributing to a project that is building a repository of 64-bit binaries of the more than 12,000 packages in pkgsrc (NetBSD’s portable package manager). This means having to get my hands dirty compiling packages myself. After contributing Vim, the next logical thing for me was to provide an R build.

Compiling R from source (again and again) has been tremendously enlightening for me. Not only do I feel like I understand a lot more about R’s internals, but I’ve also come to the conclusion that if CRAN provides a binary build for your system, you should never really compile R yourself. This, most definitely, includes Mac users.

Before I go into how to build it, let’s explore some of the reasons someone might want to build R themselves and why, in most cases, this is unnecessary.

  • I want a faster R.
  • It’s sometimes assumed that if you build something from source yourself, it’s customized to your particular system and, therefore, runs faster. In practice this requires a lot of intervention (and heartache) at the configuration step of the compilation process. In the case of R on OS X, no amount of compiler optimization and configuration (using the stock linear algebra libraries) I’ve attempted was able to outperform R from CRAN. You don’t know R better than the R Core Team, and they know what’s good for you. Just use theirs.

  • I can compile against other linear algebra libraries and get a speedup that way.
  • You don’t need to compile R against these other libraries in order to use them. I’ll go into how you can use them from your current R installation in another post.

  • I’m on a system for which there is no binary available.
  • Yikes! You’re probably used to heartache. You have no choice but to build R yourself. Have a ball!

  • I just want to.
  • As I’ve discovered, it is a great way to learn more about R’s internals. If you fancy yourself an R ‘guru’ and want to build R yourself, I can’t really blame you—so long as you don’t use your likely botched build in a production environment.

  • I’m a Gentoo user.
  • I’m so sorry.

  • I’m a Windows user and a masochist.
  • Compiling R is an excellent choice. The safe word is “GNU”.

  • I’m helping to build a repo of 64 bit binaries for pkgsrc or I’m writing a blog post about compiling R.
  • You’re exempt from criticism or ridicule.

If, at this point, you’re still interested in compiling R, in spite of my insistence that it is, in most cases, completely unnecessary, please read on. I also strongly recommend that you read the following guide from CRAN.

Dependencies
Users of Debian-based GNU/Linux systems (Debian, Ubuntu and their derivatives) can install the necessary build dependencies by running:

sudo apt-get build-dep r-base-dev

or the equivalent command for your system.
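
As a rough illustration of what “the equivalent command” might look like elsewhere, here is a small sketch that only prints a suggested command for a given package manager. The function name is made up for this post, and the package name R used for dnf and yum is an assumption on my part; check what your distribution actually calls its R source package.

```shell
# Print (but do not run) a build-dependency command for a package manager.
# Only the apt-get line comes from this post; the dnf and yum package
# names are assumptions and may differ on your distribution.
build_dep_command() {
  case "$1" in
    apt-get) echo "sudo apt-get build-dep r-base-dev" ;;
    dnf)     echo "sudo dnf builddep R" ;;      # needs dnf-plugins-core
    yum)     echo "sudo yum-builddep R" ;;      # needs yum-utils
    *)       echo "unknown package manager: $1" >&2; return 1 ;;
  esac
}
```

Calling build_dep_command dnf, for instance, prints the line you would then review and run yourself.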

On OS X you need

  • Xcode and Xcode command-line tools: Xcode is available from the App Store. The command-line tools have to be downloaded separately from the ‘Preferences’ menu.
  • gfortran (or another compliant Fortran compiler): You need this chiefly to compile the linear algebra libraries.
  • Java: You can grab the Java for OS X developers package from the Apple Developers page or grab another JDK. You need this for the JNI headers.
  • XQuartz: This includes the X11 headers and cairo.
  • MacTeX: This isn’t strictly necessary, but you will need it to generate R’s PDF documentation. If you don’t want to download this package of over 2 GB, there are other options available. If you do want it, you have to add "/usr/texbin" to your PATH environment variable. Yay, now you have LaTeX!
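
For the PATH change mentioned in the MacTeX item, one way to do it without piling up duplicate entries every time your shell profile is sourced is a small helper like the one below. This is just a sketch; the function name prepend_path is my own invention.

```shell
# Prepend a directory to PATH unless it is already listed there.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                  # already present, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

prepend_path /usr/texbin          # MacTeX's binary directory, as given above
```

Putting this in your shell’s startup file makes the change stick across sessions.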

Other dependencies are unnecessary because the R source ships with fallback versions of them. These include pcre, zlib, xdr, and a few others. Still other dependencies will be present on any POSIX-compliant system.
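
Before moving on to configuration, it can save some heartache to confirm that the tools listed above are actually visible on your PATH. The following is a minimal sketch; missing_tools is a hypothetical helper, and the exact command names (gfortran versus gfortran-4.2, say) depend on how you installed things.

```shell
# Print the name of every given command that cannot be found on PATH.
# Prints nothing if all of them are available.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}
```

Something like missing_tools clang gfortran java pdflatex would then list whichever of those still need installing.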

Configuration and build
After downloading the source here, you have a few decisions to make. The first is where you want to install R. You don’t have to install R anywhere per se, because it can be run straight from the build directory: you can just place the R script found in the bin subdirectory (which contains the prefix hardcoded) anywhere on your PATH. If you do not specify a prefix, configure falls back to its usual default of /usr/local.
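
If you do go the run-it-from-the-build-directory route, placing the R script somewhere on your PATH could look like the sketch below. The function name and both of its arguments are placeholders of mine, not anything R provides.

```shell
# Copy the R launcher script from a build tree into a directory on PATH.
# $1 = build directory, $2 = target directory (both chosen by you).
place_r_script() {
  build_dir=$1
  target_dir=$2
  if [ ! -f "$build_dir/bin/R" ]; then
    echo "no bin/R found in $build_dir" >&2
    return 1
  fi
  mkdir -p "$target_dir" &&
    cp "$build_dir/bin/R" "$target_dir/R" &&
    chmod +x "$target_dir/R"
}
```

For example, place_r_script "$PWD" "$HOME/bin" run from the top of the build tree, assuming $HOME/bin is on your PATH.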

It’s customary to set your prefix for user compiled software to /usr/local, so that’s what we’ll do here.

The other decisions that have to be made are very platform/system specific. You can see all the configuration options by running

./configure --help
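
That help output is long, so it can be handy to filter it down to the option lines that mention whatever you care about. Here is a minimal sketch that reads the help text on standard input; find_configure_options is a name I made up.

```shell
# From `configure --help` output on stdin, keep only the option lines
# (--enable-*, --disable-*, --with-*, --without-*) that mention the
# given keyword, ignoring case.
find_configure_options() {
  grep -E -- '^ *--(enable|disable|with|without)-' | grep -i -- "$1"
}
```

You would use it as ./configure --help | find_configure_options blas, swapping in any keyword you like.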

The auto-configuration is very good at setting sane defaults for most of these options. For example, if you’re building on OS X, it will by default build R as a framework and shared library, which you would need if you want to use R.app. R.app itself is a separate install.

On OS X, I ran my pre-configuration and configuration thusly:

# Use clang for C, C++ and Objective-C, and gfortran for Fortran
export CC="clang"
export CXX="clang++"
export F77="gfortran-4.2 -arch x86_64"
export FC="$F77"
export OBJC="clang"
./configure --prefix=/usr/local

Assuming everything goes well, you can now start building with

make

If it successfully builds, you can install R to the prefix with

make install

Now you have R.
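
The configure, build and install steps above can also be chained so that a failure at any stage stops the whole thing. The following is only a sketch: build_and_install is a hypothetical helper, and both of its arguments are paths you supply.

```shell
# Configure, build and install from an unpacked R source tree.
# $1 = source directory, $2 = install prefix (both supplied by you).
# The body runs in a subshell, so your working directory is left alone.
build_and_install() (
  cd "$1" || exit 1
  ./configure --prefix="$2" && make && make install
)
```

Because the steps are joined with &&, make only runs if configure succeeds, and make install only runs if the build does. Afterwards, R --version is a quick way to check which build ends up first on your PATH.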

If you’re on a Mac, you may have noticed that you have a crippled R install. This is for a few reasons.

  • The binary from CRAN comes with R.app. If you want that, you have to build that yourself.
  • You can no longer download binary builds of your favorite R packages. R has to build them from source now.



As an R user on a Mac, you then realize how good you’ve had it. The binary build from CRAN comes with R.app, a fast R framework, and it installs binary R packages by default. Now you no longer have those options.

Additionally, dear Mac-user, you also have the benefit of using RStudio’s new Cocoa interface. Count your lucky stars, install CRAN’s binary build, and read my next post about how to switch out the linear algebra libraries that R uses for a few other faster alternatives.

