Understanding Overhead Issues in Parallel Computation

[This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers.]

In my talk at useR! earlier this month, I emphasized that a major impediment to obtaining good speedup when parallelizing an algorithm is system overhead of various kinds, including:

  • Contention for memory/network.
  • Bandwidth limits — CPU/memory, CPU/network, CPU/GPU.
  • Cache coherency problems.
  • Contention for I/O ports.
  • OS and/or R limits on the number of sockets (network connections).
  • Serialization.
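Some of these costs, serialization and per-message communication in particular, can be observed directly from R. The sketch below assumes only base R's parallel package (`makeCluster()`, `clusterApply()`, `parLapply()`) plus base `serialize()`/`unserialize()`. The function `f`, the problem sizes, and the two-worker cluster are arbitrary illustrative choices, not taken from the talk or the book, and timings will vary from machine to machine. It contrasts sending each tiny task to a worker as its own message with sending one chunk per worker.

```r
# Sketch: observing serialization and task-dispatch overhead with base R's
# 'parallel' package. Timings vary by machine; no specific numbers are claimed.
library(parallel)

# 1. Serialization: objects sent to socket-cluster workers are first turned
#    into a byte stream. serialize(x, NULL) returns that stream as a raw vector.
x <- runif(1e6)                        # one million doubles, about 8 MB in memory
bytes <- serialize(x, connection = NULL)
cat("serialized size (MB):", length(bytes) / 1e6, "\n")
t_unser <- system.time(y <- unserialize(bytes))[["elapsed"]]
cat("unserialize time (s):", t_unser, "\n")

# 2. Dispatch overhead: a trivially cheap function applied element by element.
f <- function(i) sqrt(i)               # hypothetical tiny task
n <- 2000

cl <- makeCluster(2)                   # two local socket (PSOCK) workers

# Sequential baseline, no communication at all
t_seq <- system.time(r_seq <- lapply(1:n, f))[["elapsed"]]

# clusterApply() sends each element to a worker as a separate call, so every
# element pays a serialize/send/receive/unserialize round trip.
t_fine <- system.time(r_fine <- clusterApply(cl, 1:n, f))[["elapsed"]]

# parLapply() by default splits 1:n into one chunk per worker, so only a few
# messages are exchanged and the per-message cost is amortized.
t_chunk <- system.time(r_chunk <- parLapply(cl, 1:n, f))[["elapsed"]]

stopCluster(cl)

cat(sprintf("sequential: %.3f s  one task per element: %.3f s  chunked: %.3f s\n",
            t_seq, t_fine, t_chunk))
```

With a task as cheap as `sqrt()`, one would typically expect the one-task-per-element run to be the slowest, since the actual computation is dwarfed by the per-message cost; chunking recovers much of that, though for work this small even the chunked run may not beat the sequential loop.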

During the Q&A at the end, one audience member asked how R programmers without a computer science background might acquire this information. A reader of this blog posed a similar question today, to which I replied:

That question was asked in my talk. I answered that my book has an introduction to such things, but that this is not enough. One builds this knowledge in haphazard ways, e.g. by searching the Web for terms like “cache miss” and “network latency”, and above all by giving it careful thought and reasoning things out. (When Nobel laureate Richard Feynman was a kid, someone said in awe, “He fixes radios by thinking!”)

Join an R users group if there is one in your area. (And if not, start one!) Talk about these things with the members (though if you follow the advice above, you may soon find that you know more than they do).

The book I was referring to is Parallel Computing for Data Science: With Examples in R, C++ and CUDA (Chapman & Hall/CRC, The R Series, June 4, 2015).

I have decided that system overhead in parallel computation is an important enough topic to justify placing Chapter 2 of the book on the Web, which I have now done. Enjoy. I’d be happy to answer your questions (of a general nature, not about your specific code).

We are continuing to add features to partools, our R package for parallel computation. Watch this space for news!

By the way, the useR! 2017 videos are now on the Web, including my talk on parallel computing.
