Interactive R Notebooks on powerful cloud hardware

January 13, 2015
(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by Nick Elprin
Co-Founder, Domino Data Lab

"R Notebooks" use the IPython Notebook UI to run R (rather than Python) in notebook cells, giving you an interactive R environment hosted on scalable servers, accessible through a web browser. This post describes how and why we built our "R Notebooks" feature.

Our product, Domino, is a platform that facilitates the end-to-end analytical lifecycle, from early-stage exploration, through experimentation and refinement, all the way to deploying or "operationalizing" a model. Among other things, Domino makes it easy to move long-running or computationally intensive R tasks onto powerful hardware. In our cloud-hosted environment, you can choose any type of Amazon EC2 machine you want to use; or if you deploy Domino on-premises in your enterprise, you can configure your own hardware tiers.

Domino was working great for users who wanted to run R scripts, but we had many users who also wanted to work interactively in R on a powerful server, without dealing with any infrastructure setup. I'll explain how we built our solution to this problem, but first, I'll describe the solution itself.

How R Notebooks work

We wanted a solution that: (1) let our users work with R interactively; (2) on powerful machines; and (3) without requiring any setup or infrastructure management. For reasons I describe below, we adapted IPython Notebook to fill this need. The result is what we call an R Notebook: an interactive IPython Notebook environment that works with R code. It even handles plotting and visual output!

So how does it work?

Step 1: Start a notebook session with one click:

Like any other run in Domino, this will spin up a new machine (on hardware of your choosing), and automatically load it with your project files.

Step 2: Use the notebook!

Any R command will work, including commands that load packages and calls to the system function. Since Domino lets you spin up these notebooks on ridiculously powerful machines (e.g., 32 cores and 240 GB of memory), let's show off a bit:

Easy sharing and collaboration

By interleaving code, comments, and graphics, the Notebook UI provides a great way to create and preserve a narrative about the analysis you're doing. The friendly UI also makes notebooks accessible to less technical users, letting you share your work with a broader audience.

Domino adds other nice features to your notebook sessions: each session is preserved as a snapshot, so you can get back to any past result and reproduce past work. And because Domino hosts all your notebooks (and data, and results) centrally, you can share your work with others just by sending a link.

Motivation

Our vision for Domino is to be a platform that accelerates work across the entire analytical lifecycle, from early exploration, all the way to packaging and deployment of analytical models. We think we're well on our way toward that goal, and this post is about a recent feature we added to fill a gap in our support for early stages of that lifecycle: interactive work in R.

The analytical lifecycle

Analytical ideas move through different phases:

  1. Exploration / Ideation. In the early stages of an idea, it's critical to be able to "play with data" interactively. You are trying different techniques, fixing issues quickly, to figure out what might work.

  2. Refinement. Eventually you have an approach that you want to invest in, and you must refine or "harden" a model. Often this requires many more intensive experiments: for example, running a model over your entire data set with several different parameters, to see what works best.

  3. Packaging and Deployment. Once you have something that works, typically it will be deployed for some ongoing use: either packaged into a UI for people to interact with, or deployed with some API (or web service) so software systems can consume it.

Domino offers solutions for all three phases, in multiple different languages, but we had a gap. For interactive exploratory work, we support IPython Notebooks for work in Python, but we didn't have a good solution for work in R.

Stage of the analytical lifecycle, with its requirements and our solutions:

  1. Explore / Ideate
     Requirements: interactive environment
     Our solution for R: gap to address
     Our solution for Python: IPython Notebooks

  2. Experiment / Refine
     Requirements: able to run many experiments in parallel, quickly, and track work and results
     Our solution for R and Python: our bread and butter; easily run your scripts on remote machines, as many as you want, and keep them all tracked

  3. Deploy / Operationalize
     Requirements: easily create a GUI or web service around your model
     Our solution for R: Launchers for UI, and Rserve powering API publishing
     Our solution for Python: Launchers for UI, and Pyro powering API publishing

Implementation details

Since we already had support for spinning up IPython Notebook servers inside Docker containers on arbitrary EC2 machines, we opted to use IPython Notebook for our R solution.

A little-known fact about IPython Notebook (likely because of its name) is that it can actually run code in a variety of other languages. In particular, its RMagic functionality lets you run R code inside IPython Notebook cells by prefixing a line with the %R magic, or an entire cell with %%R. We adapted this "hack" (thanks, fperez!) to prepend the RMagic modifier automatically to every cell.

The approach is to make a new IPython profile with a startup script that automatically prepends the %%R cell magic to every cell you evaluate. The result is an interactive R notebook.
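The core trick is ordinary monkeypatching: keep a reference to the original method, then install a wrapper that rewrites its input before delegating to it. The sketch below shows that pattern on a hypothetical stand-in class (`EchoShell` is not part of IPython) so it runs without IPython, rpy2, or R installed; the startup script further down patches `InteractiveShell.run_cell` in the same way.

```python
class EchoShell:
    """Hypothetical stand-in for IPython's InteractiveShell."""

    def run_cell(self, raw_cell, **kw):
        # A real shell would execute the cell; this one just returns the text it received.
        return raw_cell


# Keep a reference to the unpatched method so the wrapper can delegate to it.
_original_run_cell = EchoShell.run_cell


def run_cell_as_r(self, raw_cell, **kw):
    # Prepend the %%R cell magic plus a newline so the whole cell is handed to R.
    return _original_run_cell(self, "%%R\n" + raw_cell, **kw)


# Patch the class: every instance now routes its cells through the wrapper.
EchoShell.run_cell = run_cell_as_r

shell = EchoShell()
print(repr(shell.run_cell("summary(cars)")))  # prints '%%R\nsummary(cars)'
```

The newline matters: %%R is a cell magic, so it must sit alone on the cell's first line, with the R code starting on the next line.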

The exact steps were:

  1. pip install rpy2
  2. ipython profile create rkernel
  3. Copy rkernel.py into ~/.ipython/profile_rkernel/startup

Here, rkernel.py is a slightly modified version of fperez's script. We only had to change the rmagic extension on line 15 to the rpy2.ipython extension, to be compatible with IPython Notebook 2.

"""A "native" IPython R kernel in 15 lines of code.

This isn't a real native R kernel, just a quick and dirty hack to get the  
basics running in a few lines of code.

Put this into your startup directory for a profile named 'rkernel' or somesuch,  
and upon startup, the kernel will imitate an R one by simply prepending `%%R`  
to every cell.  
"""

from IPython.core.interactiveshell import InteractiveShell

print('*** Initializing R Kernel ***')
ip = get_ipython()
ip.run_line_magic('load_ext', 'rpy2.ipython')  # provides the %R / %%R magics
ip.run_line_magic('config', 'Application.verbose_crash=True')

# Keep the original method so the wrapper below can delegate to it.
old_run_cell = InteractiveShell.run_cell

def run_cell(self, raw_cell, *args, **kw):
    # Put the %%R cell magic on its own first line so the whole cell runs as R.
    return old_run_cell(self, '%%R\n' + raw_cell, *args, **kw)

InteractiveShell.run_cell = run_cell
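If you provision many machines, step 3 (copying rkernel.py into the profile's startup directory) is easy to script. The sketch below assumes IPython's default layout, where a profile named rkernel lives in `profile_rkernel/` under the IPython directory (`$IPYTHONDIR` if set, otherwise `~/.ipython`) and `.py` files in its `startup/` subdirectory run when the kernel starts. The helper names `ipython_dir` and `install_startup_script` are hypothetical, not part of IPython.

```python
import os
import shutil
from pathlib import Path


def ipython_dir():
    """Return the IPython base directory: $IPYTHONDIR if set, else ~/.ipython."""
    return Path(os.environ.get("IPYTHONDIR", str(Path.home() / ".ipython")))


def install_startup_script(script, profile="rkernel", base=None):
    """Hypothetical helper: copy `script` into <base>/profile_<profile>/startup/.

    `base` defaults to ipython_dir(). Returns the path of the copied file.
    """
    base_dir = Path(base) if base is not None else ipython_dir()
    startup = base_dir / f"profile_{profile}" / "startup"
    # Create the directory tree if `ipython profile create` has not been run yet.
    startup.mkdir(parents=True, exist_ok=True)
    target = startup / Path(script).name
    # copy2 copies the contents and also preserves timestamps and permission bits.
    shutil.copy2(script, target)
    return target
```

For example, `install_startup_script("rkernel.py")` would copy the file into `~/.ipython/profile_rkernel/startup/`. Running `ipython profile create rkernel` first is still worthwhile, since it also writes the profile's default configuration files. With IPython Notebook 2, the notebook server can then be started with `ipython notebook --profile=rkernel`.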

What about RStudio Server?

Some folks who have used this have asked why we didn't just integrate RStudio Server, so you could spin up an RStudio session in the browser. The honest answer is that using IPython Notebook was much easier, since we already supported it. We are exploring an integration with RStudio Server, though. Please let us know if you would use it.

In the meantime, please try out our new R Notebook functionality and let us know what you think!
