Making it easy to use RHadoop on HDInsight Hadoop clusters

September 25, 2015
(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The RHadoop packages make it easy to connect R to Hadoop data (rhdfs), and write map-reduce operations in the R language (rmr2) to process that data using the power of the nodes in a Hadoop cluster. But getting the Hadoop cluster configured, with R and all the necessary packages installed on each node, hasn't always been so easy.

With HDInsight, Microsoft's Apache Hadoop-in-the-cloud service, it's now much easier. As you configure your Hadoop cluster, you have the option of installing R and RHadoop as part of the setup process. It's simply a matter of setting an option to run a pre-prepared script on the cluster nodes, and complete instructions are provided for Linux-based and Windows-based Hadoop clusters.

With the cluster thus configured, you can then use simple R commands to create data in HDFS, and use the mapreduce function from rmr2 to perform calculations on that data using any R function, as shown in the toy example below:

[Figure: RH-HDInsights]
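For readers without access to the image, here is a rough sketch of what a toy example of this kind can look like. It is not a reproduction of the original screenshot. It assumes rmr2 is already installed, and that the HADOOP_CMD environment variable is set on a configured cluster; when that variable is empty, the sketch falls back to rmr2's local backend so it can be tried on a single machine. The variable names (ints, doubled, out) are made up for illustration.

```r
# Minimal sketch; assumes the rmr2 package is installed
# (for example by the HDInsight setup script described above).
library(rmr2)

# Assumption: if Hadoop is not configured in this session, use rmr2's
# local backend so the example can still be tried without a cluster.
if (Sys.getenv("HADOOP_CMD") == "") {
  rmr.options(backend = "local")
}

# Create data in the distributed file system: the integers 1 to 100.
ints <- to.dfs(1:100)

# Map-only job: emit each input value as the key and its double as the value.
# Any R function could be used inside the map step.
doubled <- mapreduce(
  input = ints,
  map = function(k, v) keyval(v, 2 * v)
)

# Read the result back into the R session and look at the first rows.
out <- from.dfs(doubled)
head(data.frame(key = keys(out), value = values(out)))
```

The map function here is deliberately trivial; a reduce argument can be added to mapreduce in the same style when values need to be aggregated per key.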

The script also installs a collection of R packages that will be useful for your mapreduce calls: rJava, Rcpp, RJSONIO, bitops, digest, functional, reshape2, stringr, plyr, caTools, and stringdist. And of course you can modify the setup script to install any other packages or tools you need on the nodes.
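One way to confirm that a node has the packages listed above is a quick check from an R session. This is only a sketch using base R; the helper name missing_packages is hypothetical and is not part of the setup script.

```r
# Packages the article says the setup script installs alongside RHadoop.
expected <- c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional",
              "reshape2", "stringr", "plyr", "caTools", "stringdist")

# Hypothetical helper: return the subset of `pkgs` that is not installed.
# `installed` defaults to the packages visible to this R session.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

not_installed <- missing_packages(expected)
if (length(not_installed) > 0) {
  message("Not installed on this node: ", paste(not_installed, collapse = ", "))
} else {
  message("All expected packages are installed.")
}
```

Any package you add to the setup script can be appended to the expected vector and checked the same way.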

HDInsight is available with your Microsoft Azure subscription, or you can try it with a free one-month trial of Azure. If you're new to HDInsight, you might also want to check out these tutorials on getting started with Linux and Windows Hadoop clusters.

Microsoft HDInsight: Install and use R on Linux and Windows HDInsight Hadoop clusters  
