New Release of partools Package

[This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers.]

My new release of partools is now on CRAN.

The package is aimed at doing parallel data science in what I call an “un-MapReduce” manner. It takes the point of view that MapReduce-based frameworks such as Hadoop and Spark are fine for the types of applications their designers had in mind, namely rather simple SQL actions. However, they have fundamental handicaps that prevent them from performing well on many, if not most, of the types of computation that typical users need for large data sets and/or highly compute-bound applications. In partools, the distributed file/object nature of those MapReduce systems is retained, but the confining MapReduce computational paradigm is avoided.
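To make the "distributed objects without MapReduce" idea concrete, here is a minimal sketch using only base R's parallel package. It is not the partools API: the helpers store_chunk() and distribute_rows() and the worker-side variable name chunk are hypothetical, chosen only for illustration. The sketch assumes every group appears in every chunk, so that the per-chunk results line up.

```r
library(parallel)

# Hypothetical worker-side helper: keep the received rows in the worker's
# global environment under the name `chunk`.
store_chunk <- function(ch) {
  assign("chunk", ch, envir = globalenv())
  NULL
}

# Hypothetical helper: split a data frame by rows, one chunk per worker.
distribute_rows <- function(cls, data) {
  idx <- splitIndices(nrow(data), length(cls))
  chunks <- lapply(idx, function(i) data[i, , drop = FALSE])
  invisible(clusterApply(cls, chunks, store_chunk))
}

cls <- makeCluster(2)
d <- data.frame(g = rep(c("a", "b"), times = 50), x = 1:100)
distribute_rows(cls, d)

# Later operations reuse the chunks already resident on the workers;
# the data are not sent again.
rows_per_worker <- unlist(clusterEvalQ(cls, nrow(chunk)))
sums   <- clusterEvalQ(cls, tapply(chunk$x, chunk$g, sum))
counts <- clusterEvalQ(cls, tapply(chunk$x, chunk$g, length))

# Combine per-chunk sums and counts into overall group means.
group_means <- Reduce(`+`, sums) / Reduce(`+`, counts)

stopCluster(cls)
```

The point of the sketch is that the data are distributed once and then stay put, while each subsequent step is ordinary R code run on the chunks, with no map/shuffle/reduce cycle between steps.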

The package now contains about 30 functions, ranging from infrastructure support to summary and aggregation to statistical/machine learning applications. See the vignette for a fairly detailed introduction. Two new capabilities that I wish to highlight are:

  • Aggregation and related operations on objects of class “data.table”.
  • Parallel computation for some modern statistical/machine learning algorithms (they are statistics to me, but you may call them machine learning if you prefer).

The core of that second highlighted set of functions makes use of what I call Software Alchemy, which I have explained in previous blog posts. See for instance the example on random forests in the vignette.
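For readers who have not seen the earlier posts, the basic idea behind Software Alchemy is chunk averaging: split the rows into chunks, apply the same estimator to each chunk, and average the results. The following is a rough sketch of that idea for a linear model, again using only base R's parallel package rather than the functions partools itself provides. The names fit_chunk() and chunk_avg_lm() and the simulated data are assumptions made for the example.

```r
library(parallel)

# Hypothetical worker-side helper: fit the model on one chunk and
# return its coefficient vector.
fit_chunk <- function(chunk, f) {
  coef(lm(f, data = chunk))
}

# Hypothetical helper: chunk-averaged linear model coefficients.
chunk_avg_lm <- function(cls, f, data) {
  idx <- splitIndices(nrow(data), length(cls))
  chunks <- lapply(idx, function(i) data[i, , drop = FALSE])
  fits <- clusterApply(cls, chunks, fit_chunk, f = f)
  # Average the per-chunk coefficient vectors.
  Reduce(`+`, fits) / length(fits)
}

# Simulated data, for illustration only.
set.seed(1)
n <- 10000
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)

cls <- makeCluster(2)
est <- chunk_avg_lm(cls, y ~ x, d)
stopCluster(cls)

# Full-data fit, for comparison with the chunk-averaged estimate.
full <- coef(lm(y ~ x, data = d))
```

With chunks of this size, est and full should agree closely. Each chunk is fitted independently, so the work is embarrassingly parallel.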

Happy paralleling! :-)

