How to use SparkR within RStudio?

June 14, 2015

(This article was first published on Shige's Research Blog, and kindly contributed to R-bloggers)

Setting up Spark and SparkR is quite easy (assuming you are running v1.4): just grab one of the pre-built binaries and unzip it to a folder. There is also a shell script to start SparkR from the command line. The documentation suggests putting the following lines

.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
sc <- sparkR.init(master="local")

into the .Rprofile file. This, however, has the undesirable side effect of adding yet another directory to which R packages can be installed.

My solution is:

1. Create a soft link to the SparkR directory in the directory where other R packages are installed (ln -s /home/shige/bin/spark/R/lib/SparkR /home/shige/R/x86_64-pc-linux-gnu-library/3.2)
2. Add only one line (Sys.setenv(SPARK_HOME="/home/shige/bin/spark")) to the .Rprofile file.

All set.
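
To check the setup from the RStudio console, something like the following sketch may help. It assumes the soft link and SPARK_HOME from the two steps above, and it uses the SparkR 1.4 function names (sparkR.init, sparkRSQL.init, createDataFrame); later Spark releases changed the entry point to sparkR.session, so adjust accordingly. The faithful data set is just a convenient built-in example, not something the setup requires.

```r
# Sketch only: assumes SparkR 1.4 is linked into the regular R library
# and SPARK_HOME is set in .Rprofile, as described in the steps above.
spark_home <- Sys.getenv("SPARK_HOME")
has_sparkr <- requireNamespace("SparkR", quietly = TRUE)

if (nzchar(spark_home) && has_sparkr) {
  library(SparkR)

  # Start a local Spark context and an SQL context (1.4-era API).
  sc <- sparkR.init(master = "local")
  sqlContext <- sparkRSQL.init(sc)

  # Turn a built-in R data frame into a Spark DataFrame and peek at it.
  df <- createDataFrame(sqlContext, faithful)
  print(head(df))

  # Shut the Spark context down when finished.
  sparkR.stop()
} else {
  message("SparkR not found or SPARK_HOME not set; check the two steps above.")
}
```

If the message branch runs, the usual suspects are a soft link that points to the wrong directory or an .Rprofile that RStudio did not read at startup.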
