Base R can be Fast

January 15, 2018
By

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

“Base R” (call it “Pure R”, “Good Old R”, just don’t call it “Old R” or late for dinner) can be fast for in-memory tasks. This is despite the commonly repeated claim that: “packages written in C/C++ are faster than R code.”

The benchmark results of “rquery: Fast Data Manipulation in R” really called out for follow-up timing experiments. This note is one such set of experiments, this time concentrating on in-memory (non-database) solutions.

Below is a graph summarizing our new results for a number of in-memory implementations, a range of data sizes, and two different machine types.

Unnamed chunk 2 1

The graph summarizes the performance of four solutions to the “scoring logistic regression by hand” problem:

  • Optimized Base R: a specialized “pre allocate and work with vectorized indices” method. This is fast as it is able to express our particular task in a small number of purely base R vectorized operations. We are hoping to build some teaching materials about this methodology.
  • Idiomatic Base R (shown dashed): an idiomatic R method using stats::aggregate() to solve the problem. This method is re-plotted in both graphs as a dashed line and works as a good division between what is fast versus what is slow.
  • data.table: a straightforward data.table solution (another possible demarcation between fast and slow).
  • dplyr (no grouped filter): a dplyr solution (tuned to work around some known issues).

This benchmarking series reveals a number of surprises. It says: trust conventional wisdom a bit less, and to budget more time for benchmarking pilot experiments in future R projects. Contrary to claims otherwise: base R code can be good code, with some care it can sometimes perform better than package alternatives. There is no need to apologize for writing R code when using R.

Benchmarking details can be found here and here, and plotting details here.

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)