Faster R in Hadoop: rmr 1.3 now available

July 23, 2012

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The RHadoop project continues the Big Data integration of R and Hadoop, with a new update to its rmr package. Version 1.3 of rmr improves the performance of map-reduce jobs for Hadoop written in R. New features include:
  • An optional vectorized API for efficient R programming when dealing with small records.
  • Fast C implementations of serialization to, and deserialization from, the typedbytes format.
  • Other readers and writers, namely csv and text, work much better in vectorized mode.
  • Additional steps toward better support for structured data (more data frames and fewer lists in the API).
  • More forgiving behavior for package loading, plus bug fixes.
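
To give a feel for why a vectorized style helps with small records, here is a minimal sketch in base R. It does not use the rmr API at all; the data and the function names (square_one, square_batch) are made up for illustration. It only contrasts calling a function once per record with calling it once on a whole batch.

```r
# Hypothetical data: many small records, each holding a single number.
records <- as.list(seq_len(10000))

# Record-at-a-time style: the function is invoked once per record.
square_one <- function(x) x^2
per_record <- vapply(records, square_one, numeric(1))

# Vectorized style: the function is invoked once on the whole batch.
square_batch <- function(xs) xs^2
batched <- square_batch(unlist(records))

# Both styles produce the same values; only the number of calls differs.
stopifnot(identical(per_record, batched))
```

With tiny records, the per-call overhead of the first style tends to dominate the actual work, which is the kind of cost a vectorized interface is meant to avoid.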

Also, the documentation has gotten a major overhaul in this version, with pages of combined text, code and graphics generated automatically using the knitr package. (RHadoop lead developer Antonio Piccolboni provides some background on how knitr is used in these documentation guidelines.)

If you haven't taken a look at rmr before, this tutorial by Jeffrey Breen is a great place to get started. Otherwise, check out the wiki pages on the RHadoop github site, linked below.

github: RevolutionAnalytics / RHadoop 
