Azure Databricks is a data analytics platform (PaaS), optimised specifically for the Microsoft Azure cloud platform. Databricks is an enterprise-grade platform service that unifies data lake architecture for large analytical operations.
Azure Databricks combines:
- large-scale data processing for batch loads and streaming data
- simpler and faster collaborative work among data scientists, data engineers and machine learning engineers
- a complete set of analytics and machine learning algorithms and languages
- a complete ML DevOps model life cycle, from experimentation to production
- a foundation built on Apache Spark, with support for Delta Lake and MLflow
Azure Databricks is optimised for Microsoft Azure and offers an interactive workspace for collaboration between data engineers, data scientists, and machine learning engineers, with multi-language support for creating notebooks in Python, R, Scala, SQL and others.
It also lets you run SQL queries on the data lake, create multiple visualisation types to explore query results from different perspectives, and build and share dashboards.
Azure Databricks is designed to build and handle big data pipelines, with data (raw or structured) ingested into Azure through several different services, such as:
- Azure Data Factory, for batch loads
- Apache Kafka, for data streamed in near real time
- Event Hub, or
- IoT Hub.
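As a rough illustration of the streaming ingestion path, here is a minimal Python sketch that prepares the options for Spark's Kafka streaming source. The broker address and topic name are hypothetical placeholders, and the helper function names are made up for this example. The `spark` argument is assumed to be the SparkSession that a Databricks notebook provides, so the reader function is only defined here, not called.

```python
def kafka_stream_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Build the option map for Spark's Kafka streaming source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def read_kafka_stream(spark, options):
    """Return a streaming DataFrame.

    `spark` is assumed to be the SparkSession available in a Databricks notebook.
    """
    return spark.readStream.format("kafka").options(**options).load()


# Hypothetical broker address and topic name, for illustration only.
options = kafka_stream_options("broker-1.example.com:9092", "sensor-events")
print(options)
```

In a notebook, `read_kafka_stream(spark, options)` would return a streaming DataFrame that you could then transform and write to the data lake.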
It also supports connectivity to several persistent storage options for creating a data lake, such as:
- Azure Blob Storage
- Azure Data Lake Storage
- SQL-type databases
- Queue / File-tables
Your analytics workflow will use Spark to read data from multiple different sources and create state-of-the-art analytics in Azure Databricks.
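To make the idea of reading from storage a little more concrete, the following Python sketch builds the URIs that Spark typically uses to address Azure Data Lake Storage Gen2 (`abfss://`) and Azure Blob Storage (`wasbs://`). The container names, account names, paths and helper function names are hypothetical, and authentication setup is left out entirely. As before, `spark` is assumed to be the session a Databricks notebook provides.

```python
def abfss_uri(container, account, path=""):
    """URI for a path in Azure Data Lake Storage Gen2."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def wasbs_uri(container, account, path=""):
    """URI for a path in Azure Blob Storage."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def read_parquet(spark, uri):
    """Read Parquet files at `uri`; `spark` is the notebook's SparkSession."""
    return spark.read.parquet(uri)


# Hypothetical container, account and path names, for illustration only.
lake_uri = abfss_uri("raw", "mydatalake", "/sales/2020/")
blob_uri = wasbs_uri("landing", "myblobaccount", "events/")
print(lake_uri)
print(blob_uri)
```

Once access to the storage account is configured, a call such as `read_parquet(spark, lake_uri)` would load the files into a DataFrame for further analysis.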
The Azure Databricks welcome page gives you an easy, fast and collaborative interface.
The complete set of code and notebooks will be available at the GitHub repository.
Happy Coding and Stay Healthy!