My First Few Days with RStudio

Posted on March 9, 2011 by Ryan Rosario in R bloggers | 0 Comments

[This article was first published on Byte Mining » R, and kindly contributed to R-bloggers.]

As most readers are probably aware, the free IDE for R, called RStudio, was recently released for general use and it immediately made huge waves within the R community. IDE stands for Integrated Development Environment. An IDE typically provides a rich set of tools for developing in some target language. For standard programming languages like C++ (Visual Studio) and Java (Eclipse or NetBeans), IDEs contain:

an editor tailored to the target language. The editor typically has tab/auto-complete for variable names, functions and class methods and properties and also features syntax highlighting.
a multiple document interface (MDI) where there may be several documents opened in different tabs.
a window that interacts with the compiler, or a panel containing the console to the language, a la MATLAB, and even vanilla R’s GUI.
a debugger
a file browser and language reference.

RStudio plays to this analogy very well, and makes modifications where appropriate. RStudio provides many features that are lacking in the standard R GUI, and improves on features that do not work properly in the Windows R GUI. Over the past few days, I have been doing all of my R analysis within RStudio, briefly with the Desktop version, and mostly with the Server version. I will mostly discuss the Server version since that is what I have been using. It is identical (AFAIK) to the Desktop version, so you are not missing anything whichever version you use.

RStudio Server

The biggest win for me with RStudio is the Server edition. I can access my work on any system that can communicate with the server. The interface always looks the same, and all I need is a web browser to access it. Before RStudio Server Edition, I had to run two versions of R: R GUI on my local machine for graphics and presentation, and a headless R on a research server for processing, where the server contained my data and the rest of my workflow. I no longer need to run multiple versions of R in my workflow.

First, installation is miraculously easy. I only had a few very minor glitches to deal with. Armed with sudo access to a machine on a research cluster at work, I was able to simply download the RPM and install it using the instructions provided on the web site. Then, all I had to do was fire up a browser and go to

http://servername.com:8787

and I was asked for my login credentials. But I couldn’t get in. This server authenticates using LDAP, but all I had to do was replace the contents of /etc/pam.d/rstudio with the contents of /etc/pam.d/login and I was able to log in. But then there was an “unknown error.” Oh, the version of R that was installed was too old (2.8). I just did a yum upgrade R, and RStudio logged me in with no problems. What showed up on my browser screen was beautiful! It looked identical to the desktop version of RStudio.
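A quick way to rule out the “R is too old” problem before logging in is to check the version from any R prompt on the server. This is only a sketch: the minimum version used below is an assumption for illustration, not a figure from this post, so check the RStudio documentation for the real requirement.

```r
# ASSUMPTION: illustrative minimum version; look up the real requirement
# in the RStudio documentation before relying on it.
min_required <- "2.11.1"

# getRversion() returns the running R version as a comparable object.
installed <- getRversion()
is_new_enough <- installed >= min_required

if (is_new_enough) {
  message("R ", as.character(installed),
          " meets the assumed minimum of ", min_required)
} else {
  message("R ", as.character(installed),
          " is older than ", min_required, "; consider upgrading")
}
```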

Once logged in, I somehow have access to ALL of my files on the remote server. I can load my data (typically produced by Hadoop) already residing on the server, and I can save output, graphs, data and even the R session itself on the remote server! All while just clicking buttons. No commands to remember, no screwed up PDF files, and most importantly…. no scping files back and forth from the server just to create a plot (X worked well, but had limitations)!
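For readers who prefer the prompt over the buttons, loading a file that already sits on the server and saving a result beside it might look roughly like this. The file, its tab-delimited layout and the column names are all made up for illustration; a temporary file stands in for real Hadoop output.

```r
# Hypothetical stand-in for a Hadoop output file already on the server.
data_file <- tempfile(fileext = ".tsv")
writeLines(c("alice\t3", "bob\t5", "carol\t2"), data_file)

# ASSUMPTION: tab-delimited text with no header row; column names are invented.
counts <- read.delim(data_file, header = FALSE,
                     col.names = c("user", "n"),
                     stringsAsFactors = FALSE)

# Save a derived result on the server, then read it back.
out_file <- tempfile(fileext = ".rds")
saveRDS(counts[counts$n >= 3, ], out_file)
restored <- readRDS(out_file)
```

Nothing here leaves the server, which is the point: the data, the session and the output all live in one place.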

Things I Love about RStudio

I will have to go panel by panel, but even then I will have missed cool features. I also will not discuss features that are already present in the MacOS X R GUI and are repeated and beautified in RStudio:

The R command prompt still looks the same. At first, my reaction was “Damn, what am I supposed to do?” But when the GUI finished loading, the familiar R command prompt appeared in all its 1970ish glory. I immediately started typing commands and seeing fields in the other panes populate and change to display different usages. It left me with an “oh, I see” feeling.

Saves R sessions correctly, and when I return to RStudio, ALL of my work is there! I could never get the save session/image function to work in R GUI. I gave up several years ago. In RStudio, it works properly, but you don’t even need it because… when you leave RStudio and then return, everything is there! The workspace (variables, functions, data, etc), the scripts you were working on, the plots, even the last dang help screen you looked at!
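For comparison, this is roughly what saving and restoring a workspace by hand involves in plain R. The sketch uses a scratch environment and a temporary file so that it does not touch a real workspace; the object names are invented.

```r
session_file <- tempfile(fileext = ".RData")

# Scratch environment standing in for the global workspace (names are made up).
ws <- new.env()
assign("fit_scores", c(0.81, 0.79, 0.84), envir = ws)
assign("label", "run-1", envir = ws)

# save.image(session_file) would do the equivalent for the real global workspace.
save(list = ls(ws), file = session_file, envir = ws)

# Later, for example in a fresh session: restore into an environment.
restored <- new.env()
loaded_names <- load(session_file, envir = restored)
```

load() returns the names of the objects it restored, which is a handy check that the file held what you expected.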

The Stop Execution button in the console actually works. When executing a long running computation in R GUI (that’s the first mistake), it is sometimes necessary to cancel the computation either because I made an error, or because the computation is killing my system’s performance. In R GUI, particularly on MacOS X, the Stop Execution button did absolutely nothing, because there was typically a spinning beachball preventing me from clicking it. Hitting ESC also did not work. In RStudio, clicking Stop actually seems to break out of the madness.

Workspace panel. The workspace panel displays the variables, functions, data frames and other objects that reside in the current workspace, a la MATLAB. From this panel, one can also switch or save workspaces. The user can also import a dataset from a text file using a trivial wizard (a la SPSS, etc.), or from a web URL. The user can also clear the workspace. A frequently overlooked command to do the same from the command line is rm(list=ls()), but that is no longer necessary to remember!
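For reference, here is what the clear command does, shown on a scratch environment so that running it does not wipe a real session. The object names are invented for illustration.

```r
# Scratch environment standing in for the workspace.
ws <- new.env()
assign("x", 1:10, envir = ws)
assign("f", function(v) v + 1, envir = ws)

before <- ls(envir = ws)

# Same idea as rm(list = ls()) typed at the prompt, but scoped to `ws`.
rm(list = ls(envir = ws), envir = ws)

after <- ls(envir = ws)
```

Note that ls() skips names beginning with a dot by default, so rm(list = ls()) leaves such hidden objects in place.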

Clicking on a data frame object in the workspace pane causes it to be displayed in a nice tabular format. It can also be printed to a local printer, or opened in a new window.

Clicking on a numerical value allows the user to change it by opening an in-place edit box. Clicking on other objects like lists, vectors and functions opens an edit window displaying the definition that created it.

Files panel. There is nothing really exciting to see here, except that by clicking the Upload button, I can upload files directly to the remote server just by selecting the file, without having to SCP!

Scripting panel. This is the second best feature of RStudio and has the same feeling as the stock script editor that ships with R. The largest difference is that the editor in RStudio is stable. On MacOS X, the editor tends to garble 2-3 rows of code together on every single scroll. This editor does a better job of indentation than R GUI. When opening a function, R GUI tends to indent the body properly, but insert a closing } prematurely. RStudio’s editor also features auto-completion, a feature present in the command-line of R GUI and R, but not in the editor of R GUI. The user can also save their script on the remote server, print code to a local printer and search. Similar to MATLAB, the user can select one or more lines of code and run them by clicking the “Run Line(s)” button, rather than having to copy and paste lines. “Run All” is a point-and-click replacement for source.
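As a reminder of what the button replaces, this is source() run by hand. The two-line script is a made-up example written to a temporary file, and it is evaluated in its own environment so the demonstration leaves the workspace alone.

```r
# A tiny hypothetical script written to a temporary file.
script_file <- tempfile(fileext = ".R")
writeLines(c("a <- 2", "b <- a * 21"), script_file)

# Evaluate the whole file; `local` may be an environment to evaluate in.
run_env <- new.env()
source(script_file, local = run_env)

result <- get("b", envir = run_env)
```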

The “Source on Save” function is interesting. If enabled, RStudio will run/source the script each time the script is saved. Honestly, I do not find this feature to be all that useful unless in the middle of debugging, and dangerous if not debugging. Suppose after a long 10-fold-cross-validation computation there is an error that we want to fix. We fix the error and save the script. Do we really want to run the computation again? If R were a compiled language, then yes. Since R is not a compiled language, this feature is not entirely useful in concept.
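One way to make re-sourcing less painful is to cache the expensive result on disk and recompute only when the cache is missing. This is a sketch of that general idea rather than anything RStudio does for you; the function names, the cache path and the fake cross-validation are all hypothetical.

```r
# Hypothetical cache location for the expensive result.
cache_file <- file.path(tempdir(), "cv_results.rds")

# Stand-in for a long 10-fold cross-validation run (invented numbers).
run_cross_validation <- function() {
  Sys.sleep(0.1)
  data.frame(fold = 1:10, error = 0.20 - 0.01 * (0:9))
}

# Return cached results if present; otherwise compute and cache them.
get_cv_results <- function() {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  results <- run_cross_validation()
  saveRDS(results, cache_file)
  results
}

first  <- get_cv_results()  # computes and writes the cache
second <- get_cv_results()  # reads the cache, skipping the computation
```

With a guard like this, sourcing the script after a small fix costs a file read rather than another full run. Delete the cache file when the computation itself changes.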

The “magic wand” icon contains what I suspect to be a growing collection of coding tools. Currently, the user can comment and uncomment a bunch of lines at once. This is particularly useful since, for some reason, there is no multiline comment flag in R. The user can also select a series of lines and wrap a function around them. This feature could be dangerous for those not familiar with coding but provides a very nice way to put a bunch of code into a function as an afterthought.
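Without the magic wand, a common workaround for the missing block comment is an if (FALSE) block, and wrapping loose lines in a function is easy enough to do by hand. Both are sketched below with invented names.

```r
x <- c(4, 8, 15, 16, 23, 42)

# Workaround for block comments: code inside is never evaluated,
# although it must still parse as valid R.
if (FALSE) {
  x <- x * 1000
  stop("never reached")
}

# Loose lines wrapped into a function as an afterthought (hypothetical name).
spread_of <- function(values) {
  centered <- values - mean(values)
  sqrt(mean(centered^2))
}

spread <- spread_of(x)
```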

Plot panel. By far my favorite part of RStudio is the plot panel! All plots are saved in this panel, and the user can move back and forth among plots that were already constructed. The Export button allows exporting a plot at user-defined dimensions and saving it to the local machine as a PNG, or even copying it to the local machine’s clipboard! Of course, the PDF button produces a PDF file of the plot that can be saved on the local machine. If the plots are all too much, we can click “Clear All” and start again with a clean slate.

But is it possible to create plots of larger size? I am sure it is, but I did not spend much time looking.
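Independently of whatever the plot panel offers, base R can always draw to a file device at an explicit size. A minimal sketch, with a temporary file and arbitrary dimensions:

```r
plot_file <- tempfile(fileext = ".pdf")

# Open a PDF device at an explicit size (in inches), draw, then close it.
pdf(plot_file, width = 12, height = 8)
plot(1:20, (1:20)^2, type = "b", main = "A larger plot")
dev.off()

plot_size_bytes <- file.info(plot_file)$size
```

png() accepts width and height in the same way, measured in pixels by default.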

LaTeX and Sweave documents. From the File menu the user can create new documents including LaTeX and Sweave. Unfortunately, I cannot experiment more with these features because there is something amiss in my configuration. For students and researchers, having Sweave and LaTeX integrated with RStudio is a huge, huge, huge advantage. No longer must we copy/paste among different programs. To make the integration complete, BibTeX, Asymptote/TikZ/gnuplot whatever should be easily included by the user.

At any point if the user interface shows stale data, there is a Reload button to help you out by refreshing the entire RStudio interface.

Things that Need Improvement

I do not really have any complaints about RStudio, quite the opposite actually. However, there are some things that do not seem to work. I should note however, that I have not spent much (well, any) time debugging them. The developers are probably already working on some of them. Some of them are probably problems in my configuration and others are probably settings that I need to tweak.

No auto-completion of parentheses or quotation marks. This is a bummer, but not a deal breaker. On the other hand, as you type closing marks, RStudio highlights the matching mark.

The dataset view needs work. Columns can’t be resized. Other natural functionalities that seem to be missing are: column renaming (a call to names), sorting or ordering values by a column, and data manipulation (I didn’t say that). These missing features are a tad disappointing, but a hell of a lot better than displaying in the terminal.
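Until the viewer grows these features, renaming and sorting are one-liners at the prompt. The data frame below is invented for illustration.

```r
df <- data.frame(V1 = c("b", "c", "a"),
                 V2 = c(20, 5, 12),
                 stringsAsFactors = FALSE)

# Rename all columns at once, or just one by name.
names(df) <- c("id", "score")
names(df)[names(df) == "score"] <- "points"

# Sort rows by a column, ascending and descending.
by_points      <- df[order(df$points), ]
by_points_desc <- df[order(-df$points), ]
```

Both sorted results are new data frames; df itself keeps its original row order.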

Install packages in the packages panel does not work on our server’s configuration.

LaTeX cannot be found. Upon attempting to create a new LaTeX or Sweave document, I got a friendly notice (instead of a bizarre error message) saying that LaTeX is not installed. The problem is, it is installed and there does not seem to be anywhere in the GUI to configure its location. Additionally, some LaTeX templates would be useful.
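A first diagnostic, offered as a guess rather than a known cause: the R session started by the server may see a different PATH from the one in a login shell. The sketch below asks R itself where it finds pdflatex (the executable name is an assumption about the local TeX installation).

```r
# Where, if anywhere, does this R session find pdflatex?
latex_path <- Sys.which("pdflatex")

if (nzchar(latex_path)) {
  message("pdflatex found at: ", latex_path)
} else {
  message("pdflatex is not on this session's PATH")
}

# The directories this session actually searches.
path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
```

If the directory containing the TeX binaries is missing from path_dirs, that would at least explain the notice.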

In Conclusion…

[Figures: My Workflow Before and After RStudio. Panels: Before RStudio; After RStudio]

All in all, the biggest win for me with RStudio is the Server edition. I can access my work on any system that can communicate with the server. The interface always looks the same, and all I need is a web browser to access it. I no longer need to run multiple versions of R in my workflow.

The developers of this open source project seemed to get it right on the first try. How the hell is that possible??? So has anyone switched from the big R to the big blue ball?
