More Snowdoop Coming

December 16, 2014

(This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers)

In spite of the banter between Yihui and me, I’m glad to hear that he may be interested in Snowdoop, as are some others. I’m quite busy this week (finishing my Parallel Computation for Data Science book, with a lot of Fall Quarter grading still to do 🙂 ), but you’ll definitely be hearing more from me on Snowdoop and partools, including the following:

  • Vignettes for Snowdoop and the debugging tool.
  • Code snippets for splitting and coalescing files, including dealing with header records.
  • Code snippet implementing a distributed version of subset().

And yes, I’ll likely break down and put it on GitHub. 🙂 [I’m not old-fashioned, just nuisance-averse. 🙂 ] Watch this space for news; the next installment should come in maybe 3-4 days.

