Timing the Same Algorithm in R, Python, and C++

Posted on January 6, 2019 by John Mount in R bloggers | 0 Comments

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

While developing the RcppDynProg R package I took a little extra time to port the core algorithm from C++ to both R and Python.

This means I can time the exact same algorithm implemented nearly identically in each of these three languages. So I can extract some comparative “apples to apples” timings. Please read on for a summary of the results.

The algorithm in question is the general dynamic programming solution to the “minimum cost partition into intervals” problem. As coded in C++ it uses one-time allocation of large tables and then for-loops and index chasing to fill in the dynamic programming table solution. The C++ code is given here.

I then directly transliterated (or line-for line translated) this code into R (code here) and Python (code here). Both of these implementations are very direct translations of the C++ solution, so they are possibly not what somebody starting in R or Python would design. So really we are coding in an an imperative C style in R and Python. To emphasize the shallowness of the port I deliberately left the semi-colons from the C++ in the R port. The Python can be taken to be equally “un-Pythonic” (for example, we are using for loops and not list comprehensions).

That being said we now have very similar code to compare in all three languages. We can summarize the timings (details here and here) as follows.

problem	solution language	time in seconds
500 point partition into intervals dynamic program	R	21
500 point partition into intervals dynamic program	C++ (from R via Rcpp)	0.088
500 point partition into intervals dynamic program	Python	39

Notice for this example C++ is 240 times faster than R, and R is almost twice as fast as Python

Neither R nor Python is optimized for the type of index-chasing this dynamic programming solution depends on. So we also took a look at a simpler problem: computing the PRESS statistic, which is easy to vectorize (a preferred writing efficient code in R nor Python). When we compare all three languages on this problem we see the following.

problem	solution method	time in seconds
3,000,000 point PRESS statistic calculation	R scalar code	3.4
3,000,000 point PRESS statistic calculation	Rcpp scalar code	0.26
3,000,000 point PRESS statistic calculation	R vectorized code	0.35
3,000,000 point PRESS statistic calculation	Python vectorized (`numpy) 0.21 The timing details can be found here and here. Ignoring the R` scalar solution (which is too direct a translation from `C++` to `R`, but a stepping stone to the `R` vectorized solution as we discuss here). We see: vectorized `Python` is now about 1.6 times faster than the vectorized `R` and even 1.2 times faster than the `C++` (probably not due to `Rcpp`, but instead driven by my choice of container class in the `C++` code). Obviously different code (and per-language tuning and optimization) will give different results. But the above is consistent with our general experience with `R`, `Python`, and `C++` in production. In conclusion: `R` and `Python` are in fact much slower than `C++` for direct scalar manipulation (single values, indexes, and pointers). However, `R` and `Python` are effective glue languages that can be fast when they are orchestrating operations over higher level abstractions (vectors, databases, data frames, and Spark). Related Share Tweet To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job. Want to share your content on R-bloggers? click here if you have a blog, or here if you don't. ← Previous post Next post → R bloggers Facebook page Most viewed posts (weekly) PCA vs Autoencoders for Dimensionality Reduction 5 Ways to Subset a Data Frame in R How to write the first for loop in R How to Calculate a Cumulative Average in R Date Formats in R Complete tutorial on using 'apply' functions in R R – Sorting a data frame by the contents of a column Sponsors Recent Posts Something to note when using the merge function in R Better Sentiment Analysis with sentiment.ai Self-documenting plots in ggplot2 Data Challenges for R Users simplevis: new & improved! Checking the inputs of your R functions Imputing missing values in R Creating a Dashboard Framework with AWS (Part 1) BensstatsTalks#3: 5 Tips for Landing a Data Professional Role Live COVID-19 Swiss vaccination analysis Complete tutorial on using ‘apply’ functions in R Getting to know Julia Bootstraps & Bandings How to Calculate a Cumulative Average in R Some thoughts about the use of cloud services and web APIs in social science research Jobs for R-users Junior Data Scientist / Quantitative economist Senior Quantitative Analyst R programmer Data Scientist – CGIAR Excellence in Agronomy (Ref No: DDG-R4D/DS/1/CG/EA/06/20) Data Analytics Auditor, Future of Audit Lead @ London or Newcastle python-bloggers.com (python/data-science news) Dunn Index for K-Means Clustering Evaluation Installing Python and Tensorflow with Jupyter Notebook Configurations How to Get Twitter Data using Python Visualizations with Altair Spelling Corrector Program in Python Spelling Checker Program in Python Streamlit Tutorial: How to Deploy Streamlit Apps on RStudio Connect Full list of contributing R-bloggers Archives Archives Other sites SAS blogs Jobs for R-users Copyright © 2022 \| MH Corporate basic by MH Themes Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.) Click here to close (This popup will not appear again)

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)