Series of Azure Databricks posts:
- Dec 01: What is Azure Databricks
- Dec 02: How to get started with Azure Databricks
- Dec 03: Getting to know the workspace and Azure Databricks platform
- Dec 04: Creating your first Azure Databricks cluster
- Dec 05: Understanding Azure Databricks cluster architecture, workers, drivers and jobs
- Dec 06: Importing and storing data to Azure Databricks
- Dec 07: Starting with Databricks notebooks and loading data to DBFS
- Dec 08: Using Databricks CLI and DBFS CLI for file upload
- Dec 09: Connect to Azure Blob storage using Notebooks in Azure Databricks
- Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks
- Dec 11: Using Azure Databricks Notebooks with R Language for data analytics
- Dec 12: Using Azure Databricks Notebooks with Python Language for data analytics
- Dec 13: Using Python Databricks Koalas with Azure Databricks
- Dec 14: From configuration to execution of Databricks jobs
- Dec 15: Databricks Spark UI, Event Logs, Driver logs and Metrics
- Dec 16: Databricks experiments, models and MLFlow
- Dec 17: End-to-End Machine learning project in Azure Databricks
- Dec 18: Using Azure Data Factory with Azure Databricks
- Dec 19: Using Azure Data Factory with Azure Databricks for merging CSV files
- Dec 20: Orchestrating multiple notebooks with Azure Databricks
- Dec 21: Using Scala with Spark Core API in Azure Databricks
- Dec 22: Using Spark SQL and DataFrames in Azure Databricks
- Dec 23: Using Spark Streaming in Azure Databricks
- Dec 24: Using Spark MLlib for Machine Learning in Azure Databricks
- Dec 25: Using Spark GraphFrames in Azure Databricks
- Dec 26: Connecting Azure Machine Learning Services Workspace and Azure Databricks
- Dec 27: Connecting Azure Databricks with on premise environment
- Dec 28: Infrastructure as Code and how to automate, script and deploy Azure Databricks with Powershell
- Dec 29: Performance tuning for Apache Spark
- Dec 30: Monitoring and troubleshooting of Apache Spark
In the last two days we have focused on understanding Apache Spark through performance tuning and troubleshooting. Both require a deeper understanding of Spark and Azure Databricks, but they also give great insight to anyone who needs to improve performance and work with Spark.
Today, I would like to list some additional learning materials, documentation, and other resources for further exploration of Azure Databricks.
Databricks / Azure Databricks
A good way to start your learning path is the vendor documentation: https://docs.databricks.com/.
Microsoft has created another great set of documentation for Azure Databricks: https://docs.microsoft.com/en-gb/azure/databricks/
Databricks is cloud-vendor agnostic, so it is also worth looking at the AWS offering and documentation: https://databricks.com/aws
Check GitHub for great examples and documentation on Databricks and all related content:
Apache Spark offers extensive documentation on the Apache Spark website: https://spark.apache.org/docs/latest/. There are also several great books on Spark:
– Spark: The Definitive Guide: Big Data Processing Made Simple
– Learning Spark: Lightning-Fast Big Data Analysis
– High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
Machine Learning (MLlib)
Great documentation can be found at: https://spark.apache.org/mllib/
Certifications and trainings:
– Microsoft – https://docs.microsoft.com/en-us/azure/databricks/getting-started/training-faq
– Databricks – great way to get yourself certified: https://academy.databricks.com/category/certifications
– Amazon – https://databricks.com/p/webinar/aws-databricks-training-series
Certification is also a good way to get to know the product and its features, and Databricks certifications are fun!
There are also many online courses worth checking out, as well as great courses from many training companies.
As always, the complete set of code and Notebooks is available in the GitHub repository.
Happy Coding and Stay Healthy! And Happy New year 2021! Wish you all the best!