An Overview of Azure Databricks
Databricks is an open and unified data analytics platform for data engineering, data science, and machine learning.
It is simple (unify your data warehousing and AI use cases on a single platform), open (built on open source and open standards), and multicloud.
Databricks Components:
The Databricks Runtime is the set of software artifacts that run on the clusters of machines managed by Databricks. A workspace is an environment for accessing all of your Azure Databricks assets.
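To make the relationship between a runtime and a cluster more concrete, here is a minimal sketch of a cluster specification of the kind accepted by the Databricks Clusters REST API. The field names (`cluster_name`, `spark_version`, `node_type_id`, `num_workers`) follow that API. The helper name `build_cluster_spec` and the specific values are illustrative assumptions; the runtime versions and VM sizes actually available depend on your workspace.

```python
import json


def build_cluster_spec(name, runtime_version, node_type, workers):
    """Return a dict describing a cluster.

    `spark_version` is the field that selects the Databricks Runtime
    installed on every machine in the cluster.
    """
    if workers < 0:
        raise ValueError("workers must be non-negative")
    return {
        "cluster_name": name,
        "spark_version": runtime_version,  # Databricks Runtime version string
        "node_type_id": node_type,         # Azure VM size for the nodes
        "num_workers": workers,
    }


# Example values are assumptions chosen for illustration only.
spec = build_cluster_spec("demo-cluster", "13.3.x-scala2.12", "Standard_DS3_v2", 2)
print(json.dumps(spec, indent=2))
```

The printed JSON is only the request body; sending it to a workspace additionally requires the workspace URL and an access token, which are deliberately left out here.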
Azure Databricks:
Azure Databricks is built to integrate seamlessly with Azure data stores and services.
The core of the Azure Databricks architecture is the Databricks Runtime engine, which combines an optimized Spark offering, Delta Lake, and DBIO (Databricks I/O), an optimized data access layer. This core engine offers massive processing power for data science workloads. It also integrates natively with Azure data services such as Azure Data Factory and Azure Synapse Analytics, and it offers ML runtime environments with frameworks such as TensorFlow and PyTorch. Notebooks can be integrated with MLflow and the Azure Machine Learning service.
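As a rough illustration of how Spark, Delta Lake, and Azure storage fit together, the sketch below writes a small DataFrame as a Delta table to Azure Data Lake Storage Gen2 and reads it back. The `abfss://` URI layout and the `format("delta")` reader and writer are standard; the helper names `abfss_uri` and `delta_round_trip`, the container `raw`, and the storage account `examplestorageacct` are hypothetical, and the cluster is assumed to already have access to the storage account.

```python
def abfss_uri(container, storage_account, path):
    """Build an ADLS Gen2 URI of the form abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


def delta_round_trip(spark, target):
    """Write five rows as a Delta table at `target`, then read the table back.

    `spark` is an active SparkSession on a cluster with Delta Lake available,
    such as the one predefined in a Databricks notebook.
    """
    df = spark.range(5).withColumnRenamed("id", "event_id")
    df.write.format("delta").mode("overwrite").save(target)
    return spark.read.format("delta").load(target)


# Container and account names are placeholders, not real resources.
target = abfss_uri("raw", "examplestorageacct", "/events/delta")
print(target)

# In a Databricks notebook, where `spark` already exists:
# delta_round_trip(spark, target).show()
```

The Spark call is left commented out so the snippet can be read and run outside a cluster; only the URI construction executes locally.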
Azure Data Lakehouse with Databricks:
Typical Enterprise Data Lake (image source: hitconsultant.net):
Modern Enterprise Data Lakehouse with Azure Databricks:
Modern analytics architecture with Azure Databricks:
References:
[1] https://docs.microsoft.com/en-us/azure/databricks/getting-started/concepts
[2] https://industry40.co.in/azure-hdinsight-and-azure-databricks/
[3] https://www.mssqltips.com/sqlservertip/7037/azure-data-lakehouse-ingestion-processing-options/
[4] https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/azure-databricks-modern-analytics-architecture