Advent of 2020, Day 1 – What is Azure Databricks

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Azure Databricks is a data analytics platform (PaaS) optimised for the Microsoft Azure cloud. It is an enterprise-grade, unified platform service for data lake architectures and large-scale analytical operations.

Azure Databricks: End-to-end web-based analytics platform

Azure Databricks:

  • combines large-scale data processing for batch loads and streaming data
  • simplifies and accelerates collaborative work among data scientists, data engineers and machine learning engineers
  • offers a complete set of analytics and machine learning algorithms and languages
  • features a complete ML DevOps model life cycle, from experimentation to production
  • is built on Apache Spark and embraces Delta Lake and MLflow

Azure Databricks is optimised for Microsoft Azure and offers an interactive workspace for collaboration between data engineers, data scientists and machine learning engineers. Notebooks can be written in Python, R, Scala, SQL and other languages, all running on Spark.

It also lets you run SQL queries on the data lake, create multiple visualisation types to explore query results from different perspectives, and build and share dashboards.
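
As a minimal sketch of running SQL against files in a data lake, the snippet below builds a Spark SQL statement that reads Parquet files directly by path. The path `/mnt/datalake/sales`, the column name `category` and the helper name `count_by_column_sql` are hypothetical and used only for illustration. In a Databricks notebook the `spark` session and the `display()` function are already defined, so the last step is shown as a comment.

```python
def count_by_column_sql(path: str, column: str, limit: int = 10) -> str:
    """Build a Spark SQL query that counts rows per value of `column`.

    Spark SQL can query files directly with the format.`path` syntax,
    so no table has to be registered first.
    """
    return (
        f"SELECT {column}, COUNT(*) AS n "
        f"FROM parquet.`{path}` "
        f"GROUP BY {column} "
        f"ORDER BY n DESC "
        f"LIMIT {limit}"
    )


# Hypothetical mount point and column name, for illustration only.
query = count_by_column_sql("/mnt/datalake/sales", "category", limit=5)
print(query)

# In a Databricks notebook (where `spark` and `display` are predefined):
# display(spark.sql(query))
```

The result shown by `display()` can then be switched between table and chart views and pinned to a dashboard.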

Azure Databricks is designed to build and handle big data pipelines. Data (raw or structured) is ingested into Azure through several different services:

  • Azure Data Factory, in batches,
  • or streamed in near real time using Apache Kafka,
  • Event Hubs, or
  • IoT Hub.
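
To make the streaming path more concrete, here is a rough sketch of reading a Kafka topic with Spark Structured Streaming. The broker address `broker-1:9092`, the topic name `telemetry` and both helper names are assumptions made for this example. The `spark` argument is expected to be an active SparkSession, which Databricks notebooks provide automatically.

```python
from typing import Any, Dict


def kafka_source_options(bootstrap_servers: str, topic: str,
                         starting_offsets: str = "latest") -> Dict[str, str]:
    """Options understood by Spark's built-in "kafka" streaming source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def read_kafka_stream(spark: Any, options: Dict[str, str]) -> Any:
    """Return a streaming DataFrame; `spark` is an active SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()


# Hypothetical broker address and topic name.
opts = kafka_source_options("broker-1:9092", "telemetry")
print(opts)

# In a Databricks notebook, where `spark` is predefined:
# stream_df = read_kafka_stream(spark, opts)
```

Keeping the options in a plain dictionary makes it easy to swap brokers or topics between environments without touching the reading code.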

It also supports connectivity to several persistent storage options for creating a data lake, such as:

  • Azure Blob Storage
  • Azure Data Lake Storage
  • SQL-type databases
  • Queue / File-tables
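
Spark addresses the first two storage types through URIs, so a small sketch of how such paths are put together may help. The account name `mystorageacct`, the container `raw` and the helper names are hypothetical. Authentication (account keys, service principals or mounts) is left out and has to be configured separately.

```python
def adls_gen2_uri(account: str, container: str, path: str = "") -> str:
    """URI for Azure Data Lake Storage Gen2 (abfss scheme, dfs endpoint)."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def blob_uri(account: str, container: str, path: str = "") -> str:
    """URI for Azure Blob Storage (wasbs scheme, blob endpoint)."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical storage account, container and folder.
lake_path = adls_gen2_uri("mystorageacct", "raw", "/sales/2020/")
blob_path = blob_uri("mystorageacct", "raw", "logs/app.csv")
print(lake_path)
print(blob_path)

# In a Databricks notebook, once access to the account is configured:
# df = spark.read.format("parquet").load(lake_path)
```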

Your analytics workflow will use Spark to read data from multiple sources and build state-of-the-art analytics in Azure Databricks.

The Azure Databricks welcome page gives you an easy, fast and collaborative interface.

The complete set of code and notebooks will be available at the GitHub repository.

Happy Coding and Stay Healthy!
