Welcome back to the blog! It has been a long time since my last post. I was recently reading about big data with R and came across the SparkR package. I had heard about it a few months ago, when it was still a separate project on GitHub. Databricks is actively working on the SparkR package, and they have officially announced its integration with Apache Spark. In this post, I will discuss how to configure SparkR with RStudio on Ubuntu 12.04 and get started using it.
To use the SparkR package, we just need to follow a few steps. Make sure you have already set up the latest Spark distribution on your system.
Here we go..
Step:1 Generate the SparkR library from the source code that comes with the latest Spark distribution (1.4.0)
Open a terminal, navigate to “spark-1.4.0/R”, and run the command “./install-dev.sh”.
This will generate a lib folder under the directory “spark-1.4.0/R”.
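A quick way to confirm that the build worked is to look for the installed package from R. The sketch below assumes the script places the package in a SparkR folder inside R/lib; the helper name sparkr_lib_built is hypothetical and the path is only an example, so adjust both to your own setup.

```r
# Hypothetical helper: returns TRUE if an installed SparkR package
# (identified by its DESCRIPTION file) exists under <spark_home>/R/lib.
# The R/lib/SparkR location is an assumption about install-dev.sh output.
sparkr_lib_built <- function(spark_home) {
  pkg_dir <- file.path(spark_home, "R", "lib", "SparkR")
  file.exists(file.path(pkg_dir, "DESCRIPTION"))
}

# Example path only; replace it with your own Spark directory.
sparkr_lib_built("/home/amar/Downloads/spark-1.4.0")
```

If this returns FALSE, re-run the script from Step 1 and check its output for errors before moving on.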
Step:2 Open RStudio and load the SparkR library
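If you prefer not to hard-code two separate paths, the library location can be derived from the Spark directory. This is only a small sketch: the variable names spark_home and sparkr_lib are placeholders, the path is an example, and it assumes the lib folder from Step 1 sits at R/lib inside the Spark directory.

```r
# Example path only; point this at your own Spark directory.
spark_home <- "/home/amar/Downloads/spark-1.4.0"

# Build the path to the lib folder generated in Step 1.
sparkr_lib <- file.path(spark_home, "R", "lib")

# Load SparkR from that folder instead of the default R library paths.
library(SparkR, lib.loc = sparkr_lib)
```

The same spark_home value can then be reused when initializing the Spark context in the next step.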
Step:3 Initialize the SparkContext and create a SparkR data frame
That’s it! Complete R code is shown below.
    # Load libraries
    library("rJava")
    library(SparkR, lib.loc="Path to library")
    # In my case
    # library(SparkR, lib.loc="/home/amar/Downloads/spark-1.4.0/R/lib")

    # Initialize Spark context
    sc <- sparkR.init(sparkHome = "Path to sparkHome")
    # In my case
    # sc <- sparkR.init(sparkHome = "/home/amar/Downloads/spark-1.4.0")

    # Initialize sqlContext
    sqlContext <- sparkRSQL.init(sc)

    # Create SparkR data frame from R data frame
    SparkDf <- createDataFrame(sqlContext, faithful)
    head(SparkDf)
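Once the data frame exists, a few basic operations are a good way to check that everything works. The sketch below assumes the sc, sqlContext, and SparkDf objects from the code above are still in your session, and it uses the two columns of the built-in faithful data set (eruptions and waiting). The threshold of 50 is arbitrary.

```r
# Show the column names and types of the SparkR data frame.
printSchema(SparkDf)

# Select a single column.
head(select(SparkDf, SparkDf$eruptions))

# Keep only the rows with a waiting time shorter than 50 minutes.
head(filter(SparkDf, SparkDf$waiting < 50))

# Count how many rows share each waiting time.
head(summarize(groupBy(SparkDf, SparkDf$waiting), count = n(SparkDf$waiting)))

# Stop the Spark context when you are done.
sparkR.stop()
```

Each call is wrapped in head() so that only the first few rows are brought back into the R session.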
Enjoy using SparkR. Write your comments below if you face any difficulties.