SparkR with RStudio in Ubuntu 12.04

[This article was first published on Pingax » R, and kindly contributed to R-bloggers].

Welcome to the blog post! It’s been a long time since I wrote my last post. I was recently searching for ways to work with big data in R and found the SparkR package. I first heard about it a few months back, when it was a separate project on GitHub. Databricks is actively working on SparkR and has officially announced its integration with Apache Spark. In this post, I will discuss how to configure SparkR with RStudio on Ubuntu 12.04 and get started using it.

To use the SparkR package, we only need to follow a few steps. Make sure you have already set up the latest Spark distribution on your system.

Here we go.

Step:1 Generate the SparkR library from the source code that comes with the latest Spark distribution (1.4.0)

Open a terminal, navigate to “spark-1.4.0/R”, and run the command “./install-dev.sh”, as shown below

[Screenshot: Terminal]

This will generate a lib folder under the directory “spark-1.4.0/R”, as shown below

[Screenshot: libFolder]
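
Before moving on, it can help to confirm from R that the build produced the library. The following is a minimal sketch, not part of SparkR itself: the helper names `sparkr_lib_dir` and `sparkr_is_built` are hypothetical, and it assumes the `SPARK_HOME` environment variable (or the fallback path `spark-1.4.0`) points at your unpacked Spark directory.

```r
# Hypothetical helper (not part of SparkR): build the path where
# install-dev.sh is expected to place the generated library.
sparkr_lib_dir <- function(spark_home) {
  file.path(spark_home, "R", "lib")
}

# Hypothetical helper: TRUE if a SparkR package folder exists under that path.
sparkr_is_built <- function(spark_home) {
  file.exists(file.path(sparkr_lib_dir(spark_home), "SparkR"))
}

# Assumption: SPARK_HOME points at the unpacked spark-1.4.0 directory;
# otherwise fall back to a relative path.
spark_home <- Sys.getenv("SPARK_HOME", unset = "spark-1.4.0")
message("SparkR library directory: ", sparkr_lib_dir(spark_home))
message("SparkR built: ", sparkr_is_built(spark_home))
```

If the second message reports FALSE, the path is probably wrong or the install script did not finish, and loading the library in the next step will fail.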

Step:2 Open RStudio and load the SparkR library as shown below

[Screenshot: LoadLibrary]

Step:3 Initialize the sparkContext and create a SparkR data frame as shown below

[Screenshot: initializeSparkContext]

That’s it! The complete R code is shown below.

# Load libraries
library("rJava")
library(SparkR, lib.loc = "Path to library")
# In my case
# library(SparkR, lib.loc = "/home/amar/Downloads/spark-1.4.0/R/lib")

# Initialize the Spark context
sc <- sparkR.init(sparkHome = "Path to sparkHome")
# In my case
# sc <- sparkR.init(sparkHome = "/home/amar/Downloads/spark-1.4.0")

# Initialize the sqlContext
sqlContext <- sparkRSQL.init(sc)

# Create a SparkR DataFrame from an R data frame
SparkDf <- createDataFrame(sqlContext, faithful)
head(SparkDf)
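
Once the data frame is created, a few basic operations are a good way to check that everything works. The sketch below repeats the setup so it can run on its own; it assumes the `SPARK_HOME` environment variable is set to your Spark 1.4.0 directory, which is my assumption and not something the setup above requires. The variable names `short_waits`, `waiting_counts`, and `local_df` are only illustrative.

```r
# Assumption: SPARK_HOME is set to the Spark 1.4.0 directory.
spark_home <- Sys.getenv("SPARK_HOME")
library(SparkR, lib.loc = file.path(spark_home, "R", "lib"))

sc <- sparkR.init(sparkHome = spark_home)
sqlContext <- sparkRSQL.init(sc)
SparkDf <- createDataFrame(sqlContext, faithful)

# Inspect the column names and types
printSchema(SparkDf)

# Select a single column
head(select(SparkDf, SparkDf$eruptions))

# Keep only rows with a waiting time below 50 minutes
short_waits <- filter(SparkDf, SparkDf$waiting < 50)
head(short_waits)

# Count the rows for each waiting time
waiting_counts <- summarize(groupBy(SparkDf, SparkDf$waiting),
                            count = n(SparkDf$waiting))
head(waiting_counts)

# Bring a small result back as a local R data frame
local_df <- collect(short_waits)

# Shut down the Spark context when finished
sparkR.stop()
```

Note that `collect` pulls the whole result into local memory, so it is best used on small or already filtered data.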

Enjoy using SparkR. Write your comments below if you face any difficulties.
