RHadoop updated: improved performance and more control

February 27, 2012

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Revolution Analytics' open-source RHadoop project, which provides integration between R and Hadoop, has been updated with the release of version 1.2 of the "rmr" package. New in this version: support for binary I/O formats, which improves on the text-only interface by allowing the use of faster and more space-efficient data formats such as R's native serialization format. This version also improves the performance of the reduce step (working around the fact that list appends in R are not constant-time operations) and gives Hadoop users more control, such as setting the number of reducers on a per-job basis.
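To make two of these points concrete, here is a minimal base-R sketch. It does not use rmr or Hadoop and is not taken from the rmr source. The function names `append_grow` and `append_prealloc` are hypothetical, chosen only for illustration. The first part contrasts growing a list one element at a time with filling a preallocated list. The second part round-trips an object through R's native serialization format with `serialize()` and `unserialize()`.

```r
# Illustration only: base R, no rmr or Hadoop calls.
# append_grow and append_prealloc are hypothetical helper names.

# Grow a list one element at a time. Each append may force R to
# reallocate and copy the list, so the total cost can grow faster
# than linearly in n.
append_grow <- function(n) {
  out <- list()
  for (i in seq_len(n)) {
    out[[length(out) + 1L]] <- i
  }
  out
}

# Allocate the list once at its final length, then fill it in place.
append_prealloc <- function(n) {
  out <- vector("list", n)
  for (i in seq_len(n)) {
    out[[i]] <- i
  }
  out
}

n <- 20000L
grown  <- append_grow(n)
filled <- append_prealloc(n)

# Both approaches build the same list; only the cost differs.
same_result <- identical(grown, filled)

# Timings depend on the machine and the R version, so none are quoted here.
print(system.time(append_grow(n)))
print(system.time(append_prealloc(n)))

# Native serialization: a binary round trip that preserves types and
# full numeric precision, with no text parsing involved.
x <- list(key = "a", value = c(0.1, 1 / 3, pi))
bin <- serialize(x, connection = NULL)  # returns a raw vector
restored <- unserialize(bin)
round_trip_ok <- identical(x, restored)
```

The size of the gap between the two loops depends on the R version; newer R releases may narrow it. The general idea is the same either way: allocating once avoids repeated copying when accumulating many values.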

Find more details about these and other updates in rmr 1.2 (available now) at the link below.

RHadoop: Overview of rmr v1.2
