Data Vault Modeling
Murali Krishna Vysyaraju (TOGAF Certified)
Assistant Vice President - Genpact
The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. (The formal definition as written by the inventor Dan Linstedt)
It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses.
The main point here is that Data Vault (DV) was developed specifically to address agility, flexibility, and scalability issues found in the other main stream data modeling approaches used in the data warehousing space. It was built to be a granular, non-volatile, auditable, historical repository of enterprise data.
At its core is a repeatable modeling technique that consists of just three main types of tables:
- Hubs = Unique list of Business Keys
- Links = Unique List of Associations / Transactions
- Satellites = Descriptive Data for Hubs and Links (Type 2 with history)
Hubs make it business driven and allow for semantic integration across systems.
Links give you the flexibility to absorb structural and business rule changes without re-engineering (and therefore without reloading any data).
Satellites give you the adaptability to record history at any interval you want plus unquestionable auditability and traceability to your source systems.
Here is a simple example of what at Data Vault 2.0 model looks like:
Reference Article https://www.snowflake.net/blog/ability-to-connect-to-snowflake-with-jdbc/
Data Valut Reference materials
Product Engineering Leader- Data & Information Management | Data Engineering | BI Analytics| Data Science - AI/ML/NLP | Big Data | Data & Cloud Architecture Solutions | Leading SAFe? | Data Governance
9 年Great Article Murali. In fact, we experimented with Data Vaults, it is really useful when you want to advance / extend data warehouse for future BI Analytics. It would be really great if you can throw some insights on 'What next after Data Vault 2.0" and how we can extend it further for future implementation ...