Advent of 2021, Day 25 – Spark literature, documentation, courses and books

Posted on December 25, 2021 by tomaztsql in R bloggers

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers.]

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Series of Apache Spark posts:

Dec 01: What is Apache Spark
Dec 02: Installing Apache Spark
Dec 03: Getting around CLI and WEB UI in Apache Spark
Dec 04: Spark Architecture – Local and cluster mode
Dec 05: Setting up Spark Cluster
Dec 06: Setting up IDE
Dec 07: Starting Spark with R and Python
Dec 08: Creating RDD files
Dec 09: RDD Operations
Dec 10: Working with data frames
Dec 11: Working with packages and spark DataFrames
Dec 12: Spark SQL
Dec 13: Spark SQL Bucketing and partitioning
Dec 14: Spark SQL query hints and executions
Dec 15: Introduction to Spark Streaming
Dec 16: Dataframe operations for Spark streaming
Dec 17: Watermarking and joins for Spark streaming
Dec 18: Time windows for Spark streaming
Dec 19: Data engineering for Spark streaming
Dec 20: Spark GraphX processing
Dec 21: Spark GraphX operators
Dec 22: Spark in Azure Databricks
Dec 23: Delta Live Tables with Azure Databricks
Dec 24: Data Visualisation with Spark

To wrap up this year's Advent of Spark 2021 series of blog posts, it is worth looking at a list of additional learning resources to help you continue this journey. Rather than dividing the list by type of resource (books, online documentation, online courses, articles, YouTube channels, Discord channels, and others), let's divide it by language flavour: Scala/Spark, R, and Python.

Spark – Scala

Spark Official Documentation – link
Spark: The Definitive Guide – link
Stream Processing with Apache Spark – link
Data Engineering with Apache Spark, Delta Lake, and Lakehouse – link
Programming Scala – 3rd edition – link
Scala & Spark – Master Big Data with Scala and Spark – link
Getting started with Apache Spark on Databricks – link to course
Apache Spark – link

R Language

Mastering Spark with R – link
SparkR documentation – link
Sparklyr: R interface for Apache Spark – link
R and Spark: How to Analyze Data Using RStudio’s Sparklyr and H2O’s Rsparkling Packages – link
Sparklyr in SQL Server Big Data cluster – link
Big data in R – Intro to Sparklyr – link

Python

Spark with PySpark – link
Spark and Python for Big Data with PySpark – link to course
PySpark intro – link
Apache Spark 3 for Data Engineering and Analytics with Python – link

That wraps up this year's Advent of Spark series! Merry Christmas and Happy New Year 2022!

The complete set of code, documents, notebooks, and other materials will be available in the GitHub repository: https://github.com/tomaztk/Spark-for-data-engineers

Happy Spark Advent of 2021!
