Snowdoop/partools Update

December 27, 2014

(This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers)

I’ve put together an updated version of my partools package, which includes Snowdoop, an alternative to MapReduce.  The new version, 1.0.1, is available for download.

To review:  The idea of Snowdoop is to do your own file chunking, rather than having something like Hadoop do it for you, and then to use ordinary R code to perform the parallel operations.  This avoids having to deal with new constructs, and with the complicated configuration issues that come with Hadoop and the R interfaces to it.
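As a rough illustration of that idea, here is a minimal sketch that uses only base R's parallel package, not the partools functions themselves. The helper split_file(), the chunk naming scheme (x.1, x.2, ...) and the toy data are all assumptions made for the example. Each worker reads only its own chunk and returns a partial result, and the manager combines the partials.

```r
library(parallel)

# Hypothetical helper (NOT the partools API): split a text file into
# nchunks contiguous pieces named <path>.1, <path>.2, ...
split_file <- function(path, nchunks) {
  lines <- readLines(path)
  # chunk id in 1..nchunks for each line
  idx <- ceiling(seq_along(lines) * nchunks / length(lines))
  for (i in seq_len(nchunks)) {
    writeLines(lines[idx == i], paste0(path, ".", i))
  }
  invisible(paste0(path, ".", seq_len(nchunks)))
}

# Toy data: the integers 1..1000, one per line
infile <- file.path(tempdir(), "x")
writeLines(as.character(1:1000), infile)

nworkers <- 2
chunk_files <- split_file(infile, nworkers)

cls <- makeCluster(nworkers)
# Each worker reads only its own chunk and returns a partial (sum, count)
partials <- clusterApply(cls, chunk_files, function(f) {
  x <- scan(f, quiet = TRUE)
  c(sum = sum(x), n = length(x))
})
stopCluster(cls)

# The manager combines the partial results into the overall mean
totals <- Reduce(`+`, partials)
overall_mean <- unname(totals["sum"] / totals["n"])
```

Nothing here goes beyond ordinary R: the "map" step is a plain function run on each worker, and the "reduce" step is a one-line combination on the manager.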

Major changes are as follows:

  • There is a k-means clustering example of Snowdoop in the examples/ directory.  Among other things, it illustrates that the Snowdoop approach automatically gives you a “caching” effect that Hadoop lacks: each worker keeps its chunk in memory across iterations, with no extra effort on your part.
  • There is a filesort() function, to sort a distributed file, keeping the result in memory in distributed form.  I don’t know yet how efficient it will be relative to Hadoop.
  • There are various new short utility functions, such as filesplit().
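The “caching” effect mentioned in the first item can be sketched as follows. This is not the k-means code from the examples/ directory, just a minimal one-dimensional version written with base R's parallel package; the variable name mychunk, the toy data and the starting centroids are assumptions made for the illustration. Each worker loads its chunk once, and every later iteration reuses that in-memory copy, so only the current centroids travel between manager and workers.

```r
library(parallel)

set.seed(1)
# Toy data: two well-separated 1-D groups, shuffled and cut into two chunks
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 10))
chunks <- split(sample(x), rep(1:2, each = 100))

cls <- makeCluster(2)
# Load step, done ONCE: each worker stores its chunk in its global environment
invisible(clusterApply(cls, chunks, function(ch) {
  assign("mychunk", ch, envir = .GlobalEnv)
  NULL
}))

centers <- c(1, 2)  # deliberately poor starting centroids
for (iter in 1:10) {
  # Each worker reuses its cached chunk; only the centroids are sent over
  parts <- clusterCall(cls, function(ctrs) {
    ch <- get("mychunk", envir = .GlobalEnv)
    d <- abs(outer(ch, ctrs, "-"))      # distance of each point to each centroid
    nearest <- apply(d, 1, which.min)   # index of the closest centroid
    k <- length(ctrs)
    # Per-centroid partial sums and counts for this chunk
    cbind(sum = vapply(seq_len(k), function(j) sum(ch[nearest == j]), numeric(1)),
          n   = vapply(seq_len(k), function(j) sum(nearest == j), numeric(1)))
  }, centers)
  # The manager adds up the partials and updates the centroids
  tot <- Reduce(`+`, parts)
  centers <- tot[, "sum"] / tot[, "n"]
}
stopCluster(cls)

centers <- sort(unname(centers))
```

The sketch does no handling of empty clusters and runs a fixed number of iterations; a real implementation would need both a convergence check and a guard against dividing by a zero count.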

The package is still not on GitHub, but Yihui should be happy that I converted the Snowdoop vignette to use knitr. 🙂

All of this is still preliminary, of course.  It remains to be seen to what scale this approach will work well.
