Speeding up R code: A case study

February 22, 2010

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

On his Psychology and Statistics blog, Jeromy Anglim tells how he was analyzing some data from a skill acquisition experiment. Needing to run a custom R function across 1.3 million data points, Jeromy estimated it would take several hours for the computation to complete. So, Jeromy set out to optimise the code.

First, he used the Rprof function, which samples the call stack at regular intervals while your R code runs and so estimates how much time is spent in each function. This is a useful tool for identifying the parts of your code that are ripe for optimisation. In this case (with some help from the system.time function to time a specific section of the code), he learned that most of the time wasn't spent on the actual calculations: it was spent selecting the subset of the data to analyze!
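To make the profiling step concrete, here is a minimal sketch of how system.time and Rprof might be applied to a loop that subsets a data frame repeatedly. The data frame `trials`, its columns, and the function `slow_analysis` are hypothetical stand-ins invented for illustration; they are not Jeromy's actual data or code.

```r
# Hypothetical stand-in data: one row per trial, 500 subjects x 100 trials.
set.seed(1)
n_subjects <- 500
trials <- data.frame(
  subject = rep(seq_len(n_subjects), each = 100),
  rt      = runif(n_subjects * 100)
)

# Hypothetical analysis that re-selects rows from the full data frame
# on every iteration (the pattern the article describes as slow).
slow_analysis <- function(d) {
  ids <- unique(d$subject)
  vapply(ids, function(id) mean(d[d$subject == id, "rt"]), numeric(1))
}

# system.time() reports user, system and elapsed seconds for one expression.
timing <- system.time(result <- slow_analysis(trials))
print(timing["elapsed"])

# Rprof() writes stack samples to a file until Rprof(NULL) stops it;
# summaryRprof() then tabulates the time attributed to each function.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
for (i in 1:5) slow_analysis(trials)
Rprof(NULL)
print(head(summaryRprof(prof_file)$by.self))
```

In a run like this, the `by.self` table would typically be dominated by data frame indexing functions rather than by `mean`, which is the kind of signal that points to subsetting as the bottleneck. Exact timings depend on the machine and R version.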

And thus a solution was born: rather than repeatedly selecting from the large data frame in an iterative loop, he instead split the data frame into its constituent parts once, and then looped over the parts. This reduced the analysis time from hours down to just a couple of minutes. At the end of his case study, Jeromy shares some valuable lessons learned about optimising R functions:
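The idea of splitting once and then looping over the parts can be sketched as follows. This is an illustration under assumptions, not Jeromy's code: the data frame `trials`, the grouping column `subject`, and the function `summarise_part` are all hypothetical, and base R's split function is used here as one plausible way to do the one-off partition.

```r
# Hypothetical stand-in data: 300 subjects x 100 trials each.
set.seed(42)
trials <- data.frame(
  subject = rep(seq_len(300), each = 100),
  rt      = runif(300 * 100)
)

# Hypothetical per-group calculation standing in for the custom function.
summarise_part <- function(part) mean(part$rt)

# Approach 1: select from the full data frame inside the loop.
by_subsetting <- function(d) {
  ids <- unique(d$subject)
  out <- vapply(ids,
                function(id) summarise_part(d[d$subject == id, ]),
                numeric(1))
  names(out) <- ids
  out
}

# Approach 2: split the data frame once, then loop over the pieces.
by_splitting <- function(d) {
  parts <- split(d, d$subject)  # named list of data frames, one per subject
  vapply(parts, summarise_part, numeric(1))
}

t_subset <- system.time(res_subset <- by_subsetting(trials))
t_split  <- system.time(res_split  <- by_splitting(trials))

# Both approaches should give the same answers; only the cost differs.
stopifnot(isTRUE(all.equal(res_subset, res_split)))
print(rbind(subsetting = t_subset, splitting = t_split)[, "elapsed"])
```

How much time the split saves depends on the size of the data and the number of groups, so the two timings are worth comparing on your own data before committing to either form.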

  • R is very fast most of the time.
  • A single slow command can be the cause of a slow analysis.
  • system.time is a very useful function.
  • Optimisation can proceed from theory or from experimentation. 
  • Optimisation proceeds from diagnosing the cause of the problem to exploring solutions.
  • Optimisation is about orders of magnitude. Focus on saving hours before tackling saving seconds.

Jeromy Anglim’s Blog: Psychology & Statistics: A Case Study in Optimising Code in R
