Medallion Architecture Layers in Microsoft Fabric Lakehouse
Abiola A. David, MSc, MVP
Microsoft Fabric & Excel MVP [5X] | Senior Fabric, Databricks and Azure Solutions Architect | Power BI, SQL, Excel, GCP | MSc, Big Data & BI | DP700 & DP600 Certified | C# Corner MVP [2X]
In Microsoft Fabric, the medallion architecture is a design pattern used to logically organize data in a lakehouse. The architecture comprises three distinct layers (Bronze, Silver and Gold), each indicating the quality of the data stored in it, with higher layers representing higher-quality data. This multi-layered approach helps Fabric engineers build a single source of truth for enterprise data products.
Bronze Layer
All data for the lakehouse begins in the bronze layer of the medallion architecture. This layer stores data in its raw format, whether it is structured, semi-structured, or unstructured. No modifications are made to the data in this layer.
To get raw data into the bronze layer, engineers can use Data Factory data pipelines, Fabric notebooks, Databricks, and Azure Data Lake Storage Gen2.
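As a rough illustration of the "no modifications" rule, the following pure-Python sketch treats the bronze layer as an in-memory list and stores each incoming record exactly as received, even one that is malformed. The function name ingest_to_bronze, the source and loaded_at fields, and the sample records are hypothetical; recording ingestion metadata next to the untouched payload is an assumption, not something the article prescribes. In Fabric, the landing itself would be done by a pipeline or notebook writing to lakehouse files or tables, not by this code.

```python
from datetime import datetime, timezone

def ingest_to_bronze(raw_lines, bronze, source):
    """Append raw records to a (hypothetical) in-memory bronze store without parsing them."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    for line in raw_lines:
        # The payload is stored exactly as received; no cleaning or validation here.
        # Only ingestion metadata (an assumption for this sketch) is kept alongside it.
        bronze.append({"source": source, "loaded_at": loaded_at, "payload": line})
    return bronze

bronze = []
raw = ['{"id": 1, "amount": 10.5}', '{"id": 2, "amount": null}', 'not valid json']
ingest_to_bronze(raw, bronze, source="sales_api")
print(len(bronze))  # 3: the malformed record is kept too
```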
Silver Layer
The silver layer of the medallion architecture is where Fabric data engineers and users process and refine their data. This includes operations such as appending and merging data, and applying data validation rules such as removing nulls and deduplication (removing redundant records). The silver layer serves as a central repository where Fabric-powered organizations store their data in a consistent format and share it with multiple teams. Data cleaning is performed in the silver layer so that everything is in one place and ready to be modelled and analyzed in the gold layer.
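The "merging" operation mentioned above is often an upsert: incoming rows that match an existing key update that row, and unmatched rows are appended. The sketch below shows that logic in plain Python with hypothetical names (merge_into_silver, customer_id) and made-up sample rows. In a Fabric lakehouse, the same idea is typically expressed as a Delta Lake MERGE run from a notebook, which this code does not call.

```python
def merge_into_silver(silver_rows, incoming_rows, key):
    """Upsert incoming rows into a silver table (list of dicts), matching on `key`."""
    # Index existing rows by key; dicts preserve insertion order.
    merged = {row[key]: row for row in silver_rows}
    inserted = updated = 0
    for row in incoming_rows:
        if row[key] in merged:
            # Matched: overwrite the existing row's columns with the incoming values.
            merged[row[key]] = {**merged[row[key]], **row}
            updated += 1
        else:
            # Not matched: append as a new row.
            merged[row[key]] = dict(row)
            inserted += 1
    return list(merged.values()), inserted, updated

silver = [{"customer_id": 1, "city": "Lagos"}, {"customer_id": 2, "city": "Abuja"}]
incoming = [{"customer_id": 2, "city": "Ibadan"}, {"customer_id": 3, "city": "Kano"}]
silver, ins, upd = merge_into_silver(silver, incoming, key="customer_id")
print(ins, upd)  # 1 1
```

Keying the rows by customer_id makes each match a dictionary lookup, which mirrors the "when matched update, when not matched insert" shape of a MERGE statement.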
In the silver layer of the medallion architecture, data cleaning typically involves any of the following steps:
To perform data cleaning in the silver layer of the medallion architecture, experienced users or Fabric engineers can use Fabric notebooks to write PySpark, Scala, R, or SQL code that cleans the data in the bronze layer and saves the output to the silver layer. In addition, Data Factory data pipelines can be used to automate the cleaning tasks on the bronze data and load the results into the silver layer.
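A notebook cleaning step of this kind can be sketched in plain Python. The hypothetical function below parses raw JSON payloads taken from the bronze layer, rejects malformed rows and rows with nulls in required columns, and removes duplicates by id, keeping the first occurrence. The column names and sample data are assumptions for illustration; a real Fabric notebook would more likely do this with PySpark DataFrame methods such as dropna and dropDuplicates before writing a Delta table to the silver layer.

```python
import json

def clean_bronze_records(payloads, required=("id", "amount")):
    """Turn raw JSON strings into validated, de-duplicated silver rows."""
    seen_ids = set()
    silver = []
    for payload in payloads:
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # validation rule: reject malformed records
        if not isinstance(record, dict):
            continue  # validation rule: expect one JSON object per record
        if any(record.get(col) is None for col in required):
            continue  # remove rows with nulls in required columns
        if record["id"] in seen_ids:
            continue  # deduplication: keep only the first row per id
        seen_ids.add(record["id"])
        silver.append(record)
    return silver

raw = ['{"id": 1, "amount": 10.5}', '{"id": 2, "amount": null}',
       'not valid json', '{"id": 1, "amount": 10.5}', '{"id": 3, "amount": 7}']
print(clean_bronze_records(raw))
# [{'id': 1, 'amount': 10.5}, {'id': 3, 'amount': 7}]
```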
Gold Layer
The gold layer of the medallion architecture is where users and Fabric engineers enrich their data with additional information and analysis. This layer allows them to aggregate data to a specific level of detail, such as daily or hourly, or to add external data sources to their data. Once the data reaches the gold layer, it is ready for use by downstream teams, including analytics, data science, and machine learning operations.
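The daily aggregation described above can be sketched as follows: silver-level transactions are rolled up into one total per calendar day. The function name aggregate_daily, the ts and amount fields, and the sample rows are hypothetical, and enrichment with external sources is left out to keep the example short.

```python
from collections import defaultdict
from datetime import datetime

def aggregate_daily(rows):
    """Roll silver-level transactions up to one total amount per calendar day."""
    totals = defaultdict(float)
    for row in rows:
        # Truncate each ISO-8601 timestamp to its date to get the daily grain.
        day = datetime.fromisoformat(row["ts"]).date().isoformat()
        totals[day] += row["amount"]
    return dict(sorted(totals.items()))

silver = [
    {"ts": "2024-05-01T09:15:00", "amount": 10.5},
    {"ts": "2024-05-01T17:40:00", "amount": 4.5},
    {"ts": "2024-05-02T08:05:00", "amount": 7.0},
]
print(aggregate_daily(silver))  # {'2024-05-01': 15.0, '2024-05-02': 7.0}
```

For an hourly grain, the grouping key could instead be the timestamp truncated to the hour, for example datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0).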