Software for Research, Part 3: [R], RStudio and ggplot2 for Statistics

[This article was first published on Francesco Giorlando, and kindly contributed to R-bloggers].

[R] is an excellent open-source statistics language. It’s cross-platform and free, and I think it will eventually displace proprietary statistics packages thanks to its rapid development, speed and ease of use. So there’s no time like the present to get used to it.

This figure from the site r4stats.com would appear to support that view (number of posts per month in the main discussion groups):

[Figure: InternetDiscussion_pop.png]

Like all new programs, it has a bit of a learning curve, especially if you’re not used to the command line. But don’t let that put you off: for any sort of statistics beyond the most basic, you’re going to end up working with scripts anyway, because it’s simply the most efficient way to run analyses. Graphical menus, while useful to begin with, quickly become a hindrance.

There are a few tricks for making your first analysis with [R] a bit simpler. This post covers some of them (and will be expanded over time to cover more).
But first, here’s the reason most people use [R]: very pretty plots. Note too the very unusual scale on this plot (a reciprobit plot), something that many programs just can’t do.
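
A reciprobit plot puts cumulative probability on a probit scale against the negative reciprocal of a latency. What follows is only a minimal sketch of that idea, not the code behind the author's plot: the data are simulated, the names latency and recip are made up for illustration, and the plotting step assumes the ggplot2 package is installed.

```r
# Simulated latencies in seconds (illustrative data only, not from the article).
# Their reciprocals are drawn from a normal distribution, so the points
# should fall close to a straight line on reciprobit axes.
set.seed(1)
latency <- 1 / rnorm(200, mean = 5, sd = 1)
latency <- latency[latency > 0]          # guard against non-positive draws

sorted   <- sort(latency)
n        <- length(sorted)
cum_prob <- (seq_len(n) - 0.5) / n       # plotting positions strictly inside (0, 1)

recip <- data.frame(
  neg_reciprocal = -1 / sorted,          # reciprocal axis, still increasing with latency
  probit         = qnorm(cum_prob)       # probit (normal quantile) axis
)

# Build the plot only if ggplot2 is available; run print(p) to draw it.
p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(recip, ggplot2::aes(x = neg_reciprocal, y = probit)) +
    ggplot2::geom_point() +
    ggplot2::labs(x = "-1 / latency (1/s)",
                  y = "Cumulative probability (probit units)")
}
```

Computing the two transformed columns by hand keeps the sketch independent of any particular axis-transformation helper; the tick labels are therefore in transformed units rather than seconds and percentages.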

Contents

Installing [R]
Using [R] and ggplot
More about RStudio
Getting help – Stackoverflow
Links
Useful commands


Installing [R]
The installation procedure varies a bit between operating systems, but it’s clearly described on the main [R] site: pick a local mirror and download the version for Linux, Mac or Windows from there.
It’s also well worth downloading RStudio, a nice user interface for [R], from its own site.
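
Once [R] itself is installed, add-on packages such as ggplot2 are installed from inside [R]. The snippet below is a rough sketch of one way to do that, assuming an internet connection and a CRAN mirror; missing_packages is a hypothetical helper written for this example, not part of [R].

```r
# Hypothetical helper: return the names in `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Check for ggplot2 and install it only when it is missing.
# The interactive() guard avoids surprise downloads in batch runs.
needed <- missing_packages(c("ggplot2"))
if (length(needed) > 0 && interactive()) {
  install.packages(needed)
}

# Confirm which version of R is running.
R.version.string
```

After this, library(ggplot2) at the top of a script makes the plotting functions available.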

Using [R] and ggplot
There are a number of ways to work with [R]. If you’re familiar with other statistics programs, you may have used their scripting options before. [R] is much more powerful than just running sequential scripts: it is a programming environment in its own right.
While you can work in the [R] console directly, most people tend to use a type of ‘scrapbook workflow’ in which a text editor is used to write a ‘source file’. This is then run by [R], either by copying it into the [R] console or by using a plugin that links your text editor to [R].
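
As a small illustration of the source-file workflow, the sketch below writes a three-line script to a temporary file and then runs it with source(). The file name analysis.R and the numbers in it are invented for the example; in practice the script would be written in your text editor.

```r
# Create a tiny source file (normally you would write this in an editor).
script_path <- file.path(tempdir(), "analysis.R")
writeLines(c(
  "x <- c(2.1, 3.4, 1.9, 5.6, 4.2)",
  "x_mean <- mean(x)",
  "x_sd <- sd(x)"
), script_path)

# Run the whole file, keeping its variables in a separate environment
# so that they do not clutter the main workspace.
analysis <- new.env()
source(script_path, local = analysis)

# Inspect the results the script produced.
print(analysis$x_mean)
print(analysis$x_sd)
```

Typing source("analysis.R") at the console does the same thing with the variables placed in your workspace, which is closer to what an editor plugin does when it sends a file to [R].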

[R] works nicely with a text editor that does code colouring. My favourite on OS X is TextMate, which has a bundle specifically for interacting with [R], here.

Here is a full list of text editors that play nicely with [R] on all platforms.

If you haven’t worked in this way before, it can be a little daunting at first, as you need to keep track of files and graphical windows while also typing commands. Eventually you end up with a workflow that suits you best, but RStudio does all of this for you, so it’s a great place to start.

I’ve been using RStudio since the start of October (2011) and have been very impressed with its clean workflow and the way in which the source file and console are integrated. It also has a web interface which runs on a Linux server; try the one running on my test server if you’d like to try [R] without installing it yourself:

http://chronop.ath.cx:8787
username: rstudio_test
password: testing
[Figure: r-studio.png]

More about RStudio
It’s rare that a software project makes a large impact early on, but the concept, programming and support which accompany RStudio are superlative. I’ve had a few questions along the way and my thanks go to Josh Paulson, one of the RStudio developers, who has been extremely helpful. The online support docs are also in good order.

A software project also needs recognition and I was pleasantly surprised to hear a colleague say the other day over coffee, “Have you tried RStudio?”. Clearly the developers are filling a niche with a lot of demand!

So to get back to the advantages of running RStudio server side:

The best thing about the server version is that you can pick up your session from where you left off no matter which computer you are using. In fact, it’s such a comfortable way of working with [R] that I would recommend it for routine use. It also opens up the possibility of sharing analyses easily (although user management needs to be done manually at the moment).

RStudio.org will eventually offer a hosted solution but if you’d like an account on my server, drop me a message in the contact form.

Getting help – Stackoverflow
There are a number of [R] mailing lists, but I have found one of the most useful resources to be Stackoverflow. Responses are usually very quick, and the format of the site makes it a great reference too.

Links

  • Quick-R – a great introductory site with examples
  • R cookbook – a wiki of R techniques
  • R graph gallery – thumbnails and methods for hundreds of plots
  • R bloggers – aggregated blog posts about R
  • A video by Hadley Wickham, the creator of ggplot, showing “The Future of Interactive Graphics in R”
