Benchmarking R/RRO in OSX and Ubuntu on the cloud

April 10, 2015

(This article was first published on Numbr Crunch » R, and kindly contributed to R-bloggers)

** This is a modified version of a previous R benchmark that was done back in 2011. Click this link to see the original post.

After using R for quite some time, you get to know a little bit about its strengths and weaknesses. It structures data very well and has a huge library of statistical and data processing packages, which makes analysis a breeze. What it lacks is the ability to deal with really large data, and processing SPEED. We’re going to focus on the speed issue, especially since there are some easy ways to improve this.

I’m sure most people have heard of Revolution Analytics. They offer a free, enhanced version of R, called Revolution R Open (RRO), which allows multi-core processing (standard R is single-core) and is very easy to set up. There’s definitely some debate about whether or not RRO really does improve upon R. As you’ll see from the data below, the answer depends on the test: in some cases RRO is clearly faster, and in others the difference is small or even goes the other way. We’re also going to look at the difference between running R/RRO locally on Mac OS X and on the cloud through Ubuntu.

My notebook setup:
  • Mac OS X Yosemite 10.10.2
  • 7 GHz Intel Core i5 (dual-core)
  • 4 GB RAM

Cloud server setup:
  • Ubuntu 14.04
  • Dual-core CPU
  • 4 GB RAM
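
When results from several machines end up side by side, it helps to store a short description of each setup next to its timings. The following is a minimal sketch using only base R and the bundled parallel package; describe_setup is a hypothetical helper name, not something from the original benchmark.

```r
# Hypothetical helper: collect a few facts that identify the machine and R build.
describe_setup <- function() {
  info <- Sys.info()
  list(
    os        = info[["sysname"]],        # e.g. "Darwin" or "Linux"
    release   = info[["release"]],        # OS release string
    r_version = R.version.string,         # full R version string
    cores     = parallel::detectCores()   # logical cores (may be NA on some systems)
  )
}

setup <- describe_setup()
str(setup)
```

Saving this list together with the benchmark output makes it easier to tell later which of the four configurations a given row came from.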

For both the notebook and the cloud setup, I ran benchmarks for both R and RRO, so four configurations in total. The benchmark code that I used is a modification of the benchmark code provided in the link at the top. I added a section for matrix operations, since that is one of the categories in which RRO really shines according to their website. See the code below.

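# clear workspace
rm(list = ls())

# print system information
print(R.version.string)
print(Sys.info())

# install non-core packages
install.packages(c('party', 'rbenchmark', 'earth'))

# load packages
library(rbenchmark)  # benchmark()
library(party)       # ctree()
library(earth)       # earth()
library(rpart)       # rpart()

# function from
k <- function(n, x=1) for (i in 1:n) x=1/{1+x}

# create random matrix
# (the 200 random draws are recycled to fill each 3000 x 3000 matrix)
mat1 <- matrix(data = rexp(200, rate = 10), nrow = 3000, ncol = 3000)
mat2 <- matrix(data = rexp(200, rate = 10), nrow = 3000, ncol = 3000)

# prepare data set from UCI Repository
# see:
# `url` must hold the address of the data file before this line runs
# stringsAsFactors = TRUE keeps V16 a factor on R >= 4.0, as it was by default in 2015
mydata <- read.csv(url, header = FALSE, stringsAsFactors = TRUE)

# run benchmark
results <- benchmark(ct=ctree(V16 ~ .,data=mydata),
 e=earth(V16 ~ ., data=mydata),
 rp=rpart(V16 ~ ., data=mydata),
 # assumed: the mm column in the table is a matrix multiplication of mat1 and mat2
 mm=mat1 %*% mat2,
 k(1e6, x=1))
print(results)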
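
The matrix test can also be tried in isolation with base R's system.time(), without installing any packages. This is only a sketch under stated assumptions: time_matmul is a hypothetical helper, the matrices are 500 x 500 rather than 3000 x 3000 so that it finishes quickly, and each matrix is filled with n * n fresh draws rather than 200 recycled ones.

```r
# Hypothetical helper: time `reps` multiplications of two n x n random matrices.
time_matmul <- function(n = 500, reps = 5, seed = 42) {
  set.seed(seed)  # make the random input reproducible
  a <- matrix(rexp(n * n, rate = 10), nrow = n, ncol = n)
  b <- matrix(rexp(n * n, rate = 10), nrow = n, ncol = n)
  timing <- system.time(for (i in seq_len(reps)) a %*% b)
  unname(timing["elapsed"])  # wall-clock seconds for all repetitions
}

elapsed <- time_matmul(n = 500, reps = 5)
cat("elapsed seconds for 5 multiplications:", elapsed, "\n")
```

Running the same call under standard R and under RRO on one machine gives a quick feel for how much of any difference comes from the linear algebra library alone.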

Benchmarks – Table
                  ctree (s)  earth (s)  mm (s)  k (s)  rpart (s)
R_OSX_3.1.3             284        155     614      8       0.51
RRO_OSX_3.1.2           297        147      39     10       0.47
R_Ubuntu_3.0.2          182        127     810     15       0.45
RRO_Ubuntu_3.1.2        130        119      28      8       0.42

Benchmarks – Graph

[Graph of the timings in the table above]

For the most part, RRO is faster than standard R, both locally and on the server. RRO does especially well on the matrix operations in the mm column (over 90% less run time than standard R on both machines); this is probably due to the addition of the Intel Math Kernel Library. Standard R actually did better than RRO on the local machine for the ctree and k functions, which is unexpected after all of the lofty claims made by Revolution Analytics. The differences are small (284 s vs. 297 s for ctree, 8 s vs. 10 s for k), so they may just reflect run-to-run noise in a small sample. RRO was faster on the Ubuntu server than on the notebook in every test. Standard R was faster there on ctree, earth and rpart but slower on mm and k; note that it was also an older version on the server (3.0.2 vs. 3.1.3). Where the server is faster, it is most likely because its operating system doesn’t carry all the extra bloatware that a desktop operating system has. RRO beat standard R in every test I ran on the server, making it the clear winner on the server side.
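
The percentages behind these comparisons can be recomputed directly from the table. The sketch below copies the published timings into a data frame and reports, for each test, the reduction in run time when moving from standard R to RRO on the same machine; pct_reduction is a hypothetical helper name.

```r
# Timings in seconds, copied from the benchmark table.
timings <- data.frame(
  setup = c("R_OSX_3.1.3", "RRO_OSX_3.1.2", "R_Ubuntu_3.0.2", "RRO_Ubuntu_3.1.2"),
  ctree = c(284, 297, 182, 130),
  earth = c(155, 147, 127, 119),
  mm    = c(614, 39, 810, 28),
  k     = c(8, 10, 15, 8),
  rpart = c(0.51, 0.47, 0.45, 0.42)
)

# Hypothetical helper: percentage reduction in run time from R to RRO.
# Positive values mean RRO was faster, negative values mean it was slower.
pct_reduction <- function(r_row, rro_row) {
  tests <- c("ctree", "earth", "mm", "k", "rpart")
  r   <- unlist(timings[timings$setup == r_row, tests])
  rro <- unlist(timings[timings$setup == rro_row, tests])
  round(100 * (r - rro) / r, 1)
}

osx    <- pct_reduction("R_OSX_3.1.3", "RRO_OSX_3.1.2")
ubuntu <- pct_reduction("R_Ubuntu_3.0.2", "RRO_Ubuntu_3.1.2")
print(rbind(osx, ubuntu))
```

Each timing is a single measurement per configuration, so small positive or negative percentages should be read with caution.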

Overall, it looks like cloud computing with a little help from RRO is the way to go. Unfortunately, this setup is not the easiest for the average person to achieve. Good thing I’m working on a little side project to help solve this issue :) More to come about that in a future post.
