Advent of 2021, Day 6 – Setting up IDE

Let’s look at the IDEs that can be used to run Spark.

Remember that Spark can be used with several languages (Scala, Java, R, and Python), and each one comes with its own choice of IDE and its own installation steps.

Jupyter Notebooks

Start Jupyter Notebook, create a new notebook, and connect it to your local Spark installation.

For testing purposes, you can add code like:

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("spark://tomazs-MacBook-Air.local:7077").getOrCreate()

And start working with the Spark code.
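
The master URL above is specific to one machine, so here is a minimal sketch of how a notebook could build that URL or fall back to local mode. The helper names `spark_master_url` and `build_session` and the application name `IdeSmokeTest` are hypothetical and not part of the original post; port 7077 is the one used in the snippet above.

```python
from typing import Optional


def spark_master_url(host: Optional[str] = None, port: int = 7077) -> str:
    """Return a standalone master URL, or local mode when no host is given."""
    if host is None:
        # "local[*]" runs Spark in-process, using all available cores
        return "local[*]"
    return f"spark://{host}:{port}"


def build_session(master: str, app_name: str = "IdeSmokeTest"):
    """Create (or reuse) a SparkSession for the given master URL."""
    # Imported lazily so the URL helper above works even without pyspark installed
    from pyspark.sql import SparkSession

    return SparkSession.builder.master(master).appName(app_name).getOrCreate()
```

In a notebook cell, `spark = build_session(spark_master_url())` would start a local session, and passing a host name to `spark_master_url` would point it at a standalone master instead.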


In Python, you can open PyCharm or Spyder and start working with Python code:

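import findspark
findspark.init()  # locate the local Spark installation and make pyspark importable

from pyspark import SparkContext

sc = SparkContext(appName="SampleLambda")
x = sc.parallelize([1, 2, 3, 4])
# keep only the even numbers
res = x.filter(lambda n: n % 2 == 0)
print(res.collect())  # [2, 4]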
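
If the script fails inside PyCharm or Spyder, the cause is often that the IDE's interpreter cannot see Spark. The following is a rough sketch of a pre-flight check, using only the standard library; the function name `check_spark_setup` and the report labels are made up for illustration.

```python
import importlib.util
import os


def check_spark_setup() -> dict:
    """Report which pieces of a local PySpark setup are visible to this interpreter."""
    return {
        # findspark typically relies on SPARK_HOME to locate the Spark installation
        "SPARK_HOME set": bool(os.environ.get("SPARK_HOME")),
        "pyspark importable": importlib.util.find_spec("pyspark") is not None,
        "findspark importable": importlib.util.find_spec("findspark") is not None,
    }


if __name__ == "__main__":
    for item, ok in check_spark_setup().items():
        print(f"{item}: {'yes' if ok else 'no'}")
```

Running this from the same interpreter the IDE is configured to use shows whether the problem lies with the environment rather than with the Spark code itself.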


Open RStudio, install the sparklyr package, create a connection, and run a simple R script:

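# install and load sparklyr (dplyr is loaded for copy_to)
install.packages("sparklyr")
library(sparklyr)
library(dplyr)

# install local version
spark_install(version = "2.2.0")

# Create a local Spark master
sc <- spark_connect(master = "local")

# copy the built-in iris data set into Spark
iris_tbl <- copy_to(sc, iris)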


There you go. This part was fairly short but crucial for coding.

Tomorrow we will start exploring Spark code. 🙂

The complete set of code, documents, notebooks, and all other materials will be available at the GitHub repository:

Happy Spark Advent of 2021! 🙂

