(This article was first published on **Mad (Data) Scientist**, and kindly contributed to R-bloggers)

I’ve put together an updated version of my **partools** package, including Snowdoop, an alternative to MapReduce algorithms. You can download it here, version 1.0.1.

To review: The idea of Snowdoop is to create your own file chunking, rather than having something like Hadoop do it for you, and then to use ordinary R code to perform parallel operations on the chunks. This avoids the need to deal with new constructs and with the complicated configuration issues of Hadoop and the R interfaces to it.
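To make the idea concrete, here is a minimal sketch that uses only base R's **parallel** package, not the actual **partools** functions. The input file, the chunk naming scheme (`x.txt.1`, `x.txt.2`) and all variable names are assumptions made up for illustration.

```r
library(parallel)

# Hypothetical input: a small text file of numbers, one per line.
infile <- file.path(tempdir(), "x.txt")
writeLines(as.character(1:100), infile)

# Step 1: do the chunking ourselves, writing x.txt.1, x.txt.2, ...
nchunks <- 2
lines <- readLines(infile)
chunkid <- ceiling(seq_along(lines) * nchunks / length(lines))
chunkfiles <- paste0(infile, ".", seq_len(nchunks))
for (i in seq_len(nchunks)) {
  writeLines(lines[chunkid == i], chunkfiles[i])
}

# Step 2: ordinary R code on a cluster; each worker reads only its own chunk.
cl <- makeCluster(nchunks)
partial <- clusterApply(cl, chunkfiles, function(f) {
  x <- as.numeric(readLines(f))
  c(sum = sum(x), n = length(x))
})
stopCluster(cl)

# Step 3: combine the per-chunk results at the manager.
totals <- Reduce(`+`, partial)
overall_mean <- totals[["sum"]] / totals[["n"]]
```

Nothing here goes beyond plain R: the "map" step is a function applied to each chunk file, and the "reduce" step is an ordinary `Reduce()` call at the manager.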

Major changes are as follows:

- There is a k-means clustering example of Snowdoop in the **examples/** directory. Among other things, it illustrates the fact that with the Snowdoop approach, one automatically achieves a “caching” effect lacking in Hadoop, trivially by default.
- There is a **filesort()** function, to sort a distributed file, keeping the result in memory in distributed form. I don’t know yet how efficient it will be relative to Hadoop.
- There are various new short utility functions, such as **filesplit()**.
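The “caching” point in the k-means item can be sketched as follows. This is not the code from the **examples/** directory; it is a simplified illustration using base R's **parallel** package, with made-up data, hypothetical names (`mychunk`, `step`) and hand-picked starting centroids. Each worker loads its chunk once and keeps it in memory, so every later iteration reuses it instead of rereading anything from disk.

```r
library(parallel)

set.seed(1)
# Hypothetical data: two well-separated 2-D clusters, split into 2 chunks.
x <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
           matrix(rnorm(200, mean = 10), ncol = 2))
chunks <- list(x[seq(1, 200, 2), ], x[seq(2, 200, 2), ])

cl <- makeCluster(2)

# Each worker stores its chunk ONCE in its own global environment.
# In a real Snowdoop setting the worker would read its chunk file here.
invisible(clusterApply(cl, chunks, function(ch) {
  assign("mychunk", ch, envir = .GlobalEnv)
  NULL
}))

# Work done per iteration on each worker: assign rows to the nearest
# centroid, then return per-cluster column sums and counts.
step <- function(ctrs) {
  ch <- get("mychunk", envir = .GlobalEnv)  # already in memory
  k <- nrow(ctrs)
  d <- sapply(seq_len(k), function(j) rowSums(sweep(ch, 2, ctrs[j, ])^2))
  lab <- max.col(-d, ties.method = "first")
  sums <- t(sapply(seq_len(k),
                   function(j) colSums(ch[lab == j, , drop = FALSE])))
  list(sums = sums, cnts = tabulate(lab, nbins = k))
}

# Assumed starting centroids, chosen by hand for this toy data.
ctrs <- rbind(c(1, 1), c(9, 9))
for (iter in 1:5) {
  res <- clusterCall(cl, step, ctrs)            # only centroids are sent
  sums <- Reduce(`+`, lapply(res, `[[`, "sums"))
  cnts <- Reduce(`+`, lapply(res, `[[`, "cnts"))
  ctrs <- sums / cnts                           # new centroids
}
stopCluster(cl)
```

Only the small centroid matrix travels between manager and workers on each iteration; the data chunks stay where they were first loaded.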

Still not on GitHub yet, but Yihui should be happy that I converted the Snowdoop vignette to use **knitr**. 🙂

All of this is still preliminary, of course. It remains to be seen to what scale this approach will work well.
