“Parallel external memory algorithms (PEMA’s) allow solution of both capacity and speed problems, and can deal with distributed and streaming data.”
When a problem is too big to fit in memory, external memory algorithms come into play. The data is split into chunks that are loaded into memory one at a time, and the partial results from each chunk are combined into a final result:
- process chunk
- update results
- process results
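
To make the three steps concrete, here is a minimal Python sketch that computes a mean one chunk at a time. The choice of the mean as the statistic and every name in it (`read_chunks`, `process_chunk`, `update_results`, `process_results`, `chunked_mean`) are illustrative assumptions, not anything from Edlefsen's talk; `read_chunks` simply stands in for reading pieces of a file too large to load at once.

```python
from typing import Iterable, Iterator, List, Tuple

# Running state kept between chunks: (sum of values, number of values).
Results = Tuple[float, int]


def read_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive chunks; a stand-in for reading a large file piecewise."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter, chunk
        yield chunk


def process_chunk(chunk: List[float]) -> Results:
    """Step 1: reduce one chunk to a small partial result."""
    return float(sum(chunk)), len(chunk)


def update_results(results: Results, partial: Results) -> Results:
    """Step 2: fold a chunk's partial result into the running results."""
    return results[0] + partial[0], results[1] + partial[1]


def process_results(results: Results) -> float:
    """Step 3: turn the accumulated results into the final answer."""
    total, count = results
    if count == 0:
        raise ValueError("no data")
    return total / count


def chunked_mean(values: Iterable[float], chunk_size: int) -> float:
    results: Results = (0.0, 0)
    for chunk in read_chunks(values, chunk_size):
        results = update_results(results, process_chunk(chunk))
    return process_results(results)
```

Only one chunk and the small `(sum, count)` pair are held in memory at any time, which is what lets the approach scale past available RAM.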
Edlefsen made a couple of nice observations about these steps. Often, an individual chunk can be processed independently of the others, in which case the chunks can be processed in parallel. And if the results can be updated as new data arrives, you get streaming.
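
Both observations can be illustrated with a small Python sketch, again using a mean as the example statistic. The names `process_chunk`, `combine`, `parallel_mean`, and `StreamingMean` are hypothetical. A thread pool is used here only to keep the example simple; for CPU-bound work in CPython, a process pool would be the more usual choice.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Iterable, List, Tuple

Partial = Tuple[float, int]  # (sum of values, number of values)


def process_chunk(chunk: List[float]) -> Partial:
    """Depends only on its own chunk, so chunks can be handled in any order."""
    return float(sum(chunk)), len(chunk)


def combine(a: Partial, b: Partial) -> Partial:
    """Merge two partial results; associative, so grouping does not matter."""
    return a[0] + b[0], a[1] + b[1]


def parallel_mean(chunks: Iterable[List[float]], max_workers: int = 4) -> float:
    """Parallel case: process chunks concurrently, then combine the partials."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = pool.map(process_chunk, chunks)
        total, count = reduce(combine, partials, (0.0, 0))
    if count == 0:
        raise ValueError("no data")
    return total / count


class StreamingMean:
    """Streaming case: update the running result whenever a new chunk arrives."""

    def __init__(self) -> None:
        self.state: Partial = (0.0, 0)

    def update(self, chunk: List[float]) -> float:
        self.state = combine(self.state, process_chunk(chunk))
        total, count = self.state
        if count == 0:
            raise ValueError("no data")
        return total / count
```

The same `process_chunk` and `combine` functions serve both cases. Only the scheduling differs: all chunks at once across workers, or one chunk at a time as it arrives.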