Lee Edlefsen on Big Data in R

Posted on December 3, 2014 by Christopher Bare in R bloggers

[This article was first published on Digithead's Lab Notebook, and kindly contributed to R-bloggers.]

Lee Edlefsen, Chief Scientist at Revolution Analytics, spoke about Big Data in R at the FHCRC a week or two back. He introduced the parallel external memory algorithm, or PEMA.

“Parallel external memory algorithms (PEMA’s) allow solution of both capacity and speed problems, and can deal with distributed and streaming data.”

When a problem is too big to fit in memory, external memory algorithms come into play. The data is split into chunks and loaded into memory one chunk at a time, and the partial results from each chunk are combined into a final result:

initialize
process chunk
update results
process results
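
To make the four steps concrete, here is a minimal sketch in Python that computes a mean one chunk at a time. It is not Edlefsen's code: the names read_chunks and chunked_mean are made up for illustration, and an in-memory iterable stands in for data that would really be read from disk in blocks.

```python
from typing import Iterable, Iterator, List


def read_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive chunks; a stand-in for reading blocks from disk."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # last, possibly shorter, chunk
        yield chunk


def chunked_mean(values: Iterable[float], chunk_size: int = 1000) -> float:
    # initialize: the running totals are the only state kept across chunks
    total, count = 0.0, 0
    for chunk in read_chunks(values, chunk_size):
        # process chunk: partial results computed from this chunk alone
        chunk_sum, chunk_n = sum(chunk), len(chunk)
        # update results: fold the partial results into the running totals
        total += chunk_sum
        count += chunk_n
    # process results: turn the accumulated totals into the final answer
    if count == 0:
        raise ValueError("no data")
    return total / count
```

Only one chunk plus two numbers is ever held in memory, so chunked_mean(range(1, 101), chunk_size=7) gives the same 50.5 as averaging the whole range at once.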

Edlefsen made a couple of nice observations about these steps. Processing an individual chunk can often be done independently of other chunks. In this case, it’s possible to parallelize. If updating results can be done as new data arrives, you get streaming.
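
Both observations can be sketched in a few lines of Python, again using a mean as the statistic. This is an illustration under assumptions, not anything shown in the talk: the function names are hypothetical, and a thread pool stands in for whatever does the real parallel work (for CPU-bound work in Python a process pool or a cluster would be the more likely choice).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Iterable, Iterator, List, Tuple

Partial = Tuple[float, int]  # (sum, count) for one or more chunks


def process_chunk(chunk: List[float]) -> Partial:
    # Depends only on its own chunk, so chunks can be processed independently.
    return (float(sum(chunk)), len(chunk))


def merge(a: Partial, b: Partial) -> Partial:
    # Combining partial results is just element-wise addition here.
    return (a[0] + b[0], a[1] + b[1])


def parallel_mean(chunks: Iterable[List[float]], workers: int = 4) -> float:
    # Parallel: hand chunks to a pool of workers, then merge their results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    total, count = reduce(merge, partials, (0.0, 0))
    if count == 0:
        raise ValueError("no data")
    return total / count


def streaming_means(chunks: Iterable[List[float]]) -> Iterator[float]:
    # Streaming: update the state as each chunk arrives and report as we go.
    state: Partial = (0.0, 0)
    for chunk in chunks:
        state = merge(state, process_chunk(chunk))
        if state[1] > 0:
            yield state[0] / state[1]
```

The key design point is that the partial result is small and mergeable. The same process_chunk and merge functions serve the parallel and the streaming version; only the driver around them changes.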

Revolution has developed a framework for writing parallel external memory algorithms in R, RevoPemaR, making use of R reference classes.
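
To give a feel for what a class-based framework of this kind might look like, here is a rough Python analogue. It is emphatically not the RevoPemaR API: ChunkedAlgorithm, compute and Frequencies are hypothetical names, and the only thing taken from the talk is the idea that an algorithm is written as a class supplying the four steps while the framework owns the loop over chunks.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


class ChunkedAlgorithm:
    """Hypothetical base class: subclasses fill in the four steps."""

    def initialize(self) -> None:
        raise NotImplementedError

    def process_chunk(self, chunk: List[Any]) -> Any:
        raise NotImplementedError

    def update_results(self, partial: Any) -> None:
        raise NotImplementedError

    def process_results(self) -> Any:
        raise NotImplementedError


def compute(algorithm: ChunkedAlgorithm, chunks: Iterable[List[Any]]) -> Any:
    """Hypothetical driver: runs the loop so the algorithm sees one chunk at a time."""
    algorithm.initialize()
    for chunk in chunks:
        algorithm.update_results(algorithm.process_chunk(chunk))
    return algorithm.process_results()


class Frequencies(ChunkedAlgorithm):
    """Example algorithm: a frequency table built chunk by chunk."""

    def initialize(self) -> None:
        self.counts: Counter = Counter()

    def process_chunk(self, chunk: List[Any]) -> Counter:
        return Counter(chunk)  # counts for this chunk only

    def update_results(self, partial: Counter) -> None:
        self.counts.update(partial)  # adds the chunk's counts to the totals

    def process_results(self) -> Dict[Any, int]:
        return dict(self.counts)
```

With this split, compute(Frequencies(), [["a", "b"], ["a"], ["c", "a"]]) returns {"a": 3, "b": 1, "c": 1}, and a different driver could run the same class in parallel or over a stream without the algorithm changing.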

I couldn’t find Edlefsen’s exact slides, but these decks on parallel external memory algorithms and another from UseR 2011 on Scalable data analysis in R seem to cover everything he talked about.
