Faster R in Hadoop: rmr 1.3 now available

[This article was first published on Revolutions, and kindly contributed to R-bloggers]

The RHadoop project continues the Big Data integration of R and Hadoop with a new update to its rmr package. Version 1.3 of rmr improves the performance of Hadoop map-reduce jobs written in R. New features include:
  • An optional vectorized API for efficient R programming when dealing with small records.
  • Fast C implementations of serialization to and deserialization from typedbytes.
  • Much better performance in vectorized mode for the other readers and writers, namely csv and text.
  • Additional steps toward better support for structured data (more data frames and fewer lists in the API).
  • More forgiving behavior for package loading, plus bug fixes.
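A vectorized API pays off because a per-record API invokes the user's map function once for every record. With small records, that per-call overhead dominates the actual work. The Python sketch below shows the idea in miniature, and it is only an illustration. The drivers `run_per_record` and `run_vectorized`, the map functions, and the `chunk_size` parameter are all hypothetical names invented here. They are not rmr's API, which is in R and documented on the RHadoop wiki.

```python
def run_per_record(records, map_fn):
    """Hypothetical driver: call map_fn(key, value) once per record."""
    pairs, calls = [], 0
    for key, value in records:
        pairs.extend(map_fn(key, value))
        calls += 1
    return pairs, calls


def run_vectorized(records, map_fn, chunk_size=1000):
    """Hypothetical driver: call map_fn(keys, values) once per chunk of records."""
    pairs, calls = [], 0
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        keys = [k for k, _ in chunk]
        values = [v for _, v in chunk]
        pairs.extend(map_fn(keys, values))
        calls += 1
    return pairs, calls


def line_length_map(key, value):
    """Per-record map: emit (length of the line, 1)."""
    return [(len(value), 1)]


def line_length_map_vec(keys, values):
    """Vectorized map: the same output, computed for a whole chunk in one call."""
    return [(len(v), 1) for v in values]


# 2,500 small records: (line number, line text)
records = [(i, "x" * (i % 7)) for i in range(2500)]
per_record_pairs, per_record_calls = run_per_record(records, line_length_map)
vectorized_pairs, vectorized_calls = run_vectorized(records, line_length_map_vec)
```

Both drivers emit identical key-value pairs. The per-record driver makes 2,500 calls to the map function, while the vectorized one makes 3. In Python the comprehension inside `line_length_map_vec` is still an interpreted loop, so only the call overhead is saved. In R, a vectorized map function can additionally hand the per-element work to R's built-in vector operations.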
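Typedbytes is Hadoop's compact binary format for exchanging records with streaming jobs. Each value is written as a one-byte type code followed by its payload. The Python sketch below illustrates that wire format; it is not rmr's C code. It assumes the type codes from Hadoop's `typedbytes` documentation (0 bytes, 1 byte, 2 boolean, 3 int, 4 long, 5 float, 6 double, 7 string, 8 vector, 9 list, 10 map) and big-endian numbers, as written by Java's `DataOutput`. The names `dumps` and `loads` are hypothetical, and only a subset of Python types is mapped.

```python
import struct

# Type codes from Hadoop's typedbytes format (assumed; subset handled here).
BYTES, BYTE, BOOL, INT, LONG, FLOAT, DOUBLE, STRING, VECTOR, LIST, MAP = range(11)
LIST_END = 255  # terminator byte for the variable-length LIST type

# Fixed-width scalar types: type code -> (struct format, payload size in bytes)
_FIXED = {BYTE: (">b", 1), INT: (">i", 4), LONG: (">q", 8),
          FLOAT: (">f", 4), DOUBLE: (">d", 8)}


def dumps(obj):
    """Serialize a Python value to typedbytes: a type-code byte, then the payload."""
    if isinstance(obj, bool):  # test bool first: bool is a subclass of int
        return struct.pack(">B?", BOOL, obj)
    if isinstance(obj, int):
        if -2**31 <= obj < 2**31:
            return struct.pack(">Bi", INT, obj)
        return struct.pack(">Bq", LONG, obj)
    if isinstance(obj, float):
        return struct.pack(">Bd", DOUBLE, obj)
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return struct.pack(">Bi", STRING, len(data)) + data
    if isinstance(obj, bytes):
        return struct.pack(">Bi", BYTES, len(obj)) + obj
    if isinstance(obj, list):
        # VECTOR: element count, then each element with its own type code
        return struct.pack(">Bi", VECTOR, len(obj)) + b"".join(dumps(x) for x in obj)
    if isinstance(obj, dict):
        # MAP: pair count, then alternating keys and values
        parts = [struct.pack(">Bi", MAP, len(obj))]
        for key, value in obj.items():
            parts.append(dumps(key))
            parts.append(dumps(value))
        return b"".join(parts)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def loads(buf, pos=0):
    """Deserialize one typedbytes value starting at buf[pos]; return (value, next_pos)."""
    code = buf[pos]
    pos += 1
    if code == BOOL:
        return buf[pos] != 0, pos + 1
    if code in _FIXED:
        fmt, size = _FIXED[code]
        return struct.unpack_from(fmt, buf, pos)[0], pos + size
    if code in (BYTES, STRING):
        (n,) = struct.unpack_from(">i", buf, pos)
        data = bytes(buf[pos + 4:pos + 4 + n])
        return (data.decode("utf-8") if code == STRING else data), pos + 4 + n
    if code == VECTOR:
        (n,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        items = []
        for _ in range(n):
            item, pos = loads(buf, pos)
            items.append(item)
        return items, pos
    if code == LIST:
        items = []
        while buf[pos] != LIST_END:
            item, pos = loads(buf, pos)
            items.append(item)
        return items, pos + 1  # skip the terminator
    if code == MAP:
        (n,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        result = {}
        for _ in range(n):
            key, pos = loads(buf, pos)
            value, pos = loads(buf, pos)
            result[key] = value
        return result, pos
    raise ValueError(f"unknown type code {code}")


record = {"word": "hadoop", "counts": [1, 2, 3], "score": 0.5, "big": 2**40}
encoded = dumps(record)
decoded, end = loads(encoded)
```

A round trip should reproduce `record` exactly and consume every byte of `encoded`. This per-value type dispatch is run for every key and value a job reads or writes. That helps explain why the release notes highlight moving serialization and deserialization into C.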

Also, the documentation has gotten a major overhaul in this version, with pages of combined text, code and graphics generated automatically using the knitr package. (RHadoop lead developer Antonio Piccolboni provides some background on how knitr is used in these documentation guidelines.)

If you haven't taken a look at rmr before, this tutorial by Jeffrey Breen is a great place to get started. Otherwise, check out the wiki pages on the RHadoop GitHub site, linked below.

GitHub: RevolutionAnalytics/RHadoop
