Hey, can you pull this data for me?
Nicolai Ernst
I am a Data Engineer by profession & Lufthanseat by passion, who can‘t live without his camera.
Sure, let me just ...
Gold data assets refer to high-quality, reliable, and well-structured datasets that have been thoroughly validated and curated. Having access to such data assets is essential, as they build the foundation for data-driven decisions, predictive analytics, and efficient processes.
However, building gold data assets takes quite some time and involves a systematic process of
A common data design pattern is the medallion architecture to logically organize data in a lakehouse. As the data passes through each layer of the architecture, its structure, quality and maturity continuously improves.
The bronze layer serves as the initial landing point for data from external source systems, preserving the source system table structures "as-is" while incorporating metadata columns.
Within the silver layer of the lakehouse, data from the bronze layer undergoes matching, merging, conformation, and "just-enough" cleansing to enable an enterprise view of master data, such as customers and transactions.
Data in the gold layer of the lakehouse is typically modeled to be consumption-ready, e.g. by introducing
领英推荐
Introducing another layer ...
In my past projects, I decided to introduce another layer to the medallion architecture, to separate the implementation of hard and soft rules. Basically, I renamed the bronze layer to landing zone, and applied hard rules on the bronze layer and soft rules on the silver layer.
Here's the reason:
Hard rules do not alter the contents or the granularity of the data, thus maintaining auditability. Further, applying those rules rarely involves business departments, so that Data engineers can focus on core data engineering, such as
Soft rules change the data, e.g. by introducing business logic. As applying those rules require input from business analysts or subject-matter experts, this can be a time-consuming task until all requirements have been collected, also involving to ask the right questions. For example, you might want to
Here, I try to model business processes at the lowest granularity possible, often requiring collaboration across departments - leaving aggregation, filtering etc. up to the gold layer, based on the requirements of the business department or visualization technology, e.g. optimizing the table for process mining.
Over time, these assets will become valuable resources for informed decision-making and business growth. ??