Data Vault Modeling

Dan Linstedt, the inventor of Data Vault, formally defines it as follows: "The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business."

It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent, and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of today’s enterprise data warehouses.

The main point is that Data Vault (DV) was developed specifically to address the agility, flexibility, and scalability issues found in the other mainstream data modeling approaches used in data warehousing. It was built to be a granular, non-volatile, auditable, historical repository of enterprise data.

At its core is a repeatable modeling technique that consists of just three main types of tables:

  • Hubs = Unique list of business keys
  • Links = Unique list of associations / transactions
  • Satellites = Descriptive data for hubs and links (kept with full history, in the style of a Type 2 slowly changing dimension)

Hubs make the model business-driven and allow for semantic integration across systems.
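As a rough illustration of that integration, the Python sketch below shows two source systems delivering the same business key in slightly different forms and landing on a single hub row. The helper names (hub_hash_key, load_hub), the trim-and-uppercase normalization, and the choice of SHA-256 are assumptions made for this example, not rules stated in the article.

```python
import hashlib


def hub_hash_key(business_key: str) -> str:
    """Derive a deterministic key from a business key.

    Trimming, upper-casing, and SHA-256 are illustrative assumptions.
    """
    normalized = business_key.strip().upper()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def load_hub(hub: dict, business_key: str, record_source: str) -> bool:
    """Add the business key to the hub if it is new; return True when a row was added."""
    key = hub_hash_key(business_key)
    if key in hub:
        return False  # already known: a hub holds each business key once
    hub[key] = {
        "business_key": business_key.strip().upper(),
        "record_source": record_source,  # first system that supplied the key
    }
    return True


hub_customer = {}
load_hub(hub_customer, "cust-1001", "crm")
load_hub(hub_customer, " CUST-1001 ", "billing")  # same customer seen by another system
load_hub(hub_customer, "CUST-2002", "billing")
```

After these three loads the hub holds two rows, because both spellings of the first customer resolve to the same key.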

Links give you the flexibility to absorb structural and business rule changes without re-engineering (and therefore without reloading any data).

Satellites give you the adaptability to record history at any interval you want plus unquestionable auditability and traceability to your source systems.
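One common way to keep that history is to make satellites insert-only and to add a row only when the descriptive attributes have changed. The sketch below assumes that approach; the function names (hash_diff, load_satellite), the column names, and the sample data are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def hash_diff(attributes: dict) -> str:
    """Fingerprint the descriptive attributes so changes can be detected cheaply."""
    payload = "|".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite: list, hub_key: str, attributes: dict,
                   record_source: str, load_date: datetime) -> bool:
    """Append a row only when the attributes differ from the latest row for hub_key."""
    rows_for_key = [row for row in satellite if row["hub_key"] == hub_key]
    latest = max(rows_for_key, key=lambda row: row["load_date"], default=None)
    new_diff = hash_diff(attributes)
    if latest is not None and latest["hash_diff"] == new_diff:
        return False  # nothing changed, so no new history row
    satellite.append({
        "hub_key": hub_key,
        "load_date": load_date,          # when the warehouse received the row
        "record_source": record_source,  # traceability back to the source system
        "hash_diff": new_diff,
        **attributes,
    })
    return True


sat_customer = []
jan = datetime(2024, 1, 1, tzinfo=timezone.utc)
feb = datetime(2024, 2, 1, tzinfo=timezone.utc)
mar = datetime(2024, 3, 1, tzinfo=timezone.utc)

load_satellite(sat_customer, "hk_c1", {"name": "Ada", "city": "Austin"}, "crm", jan)
load_satellite(sat_customer, "hk_c1", {"name": "Ada", "city": "Austin"}, "crm", feb)  # unchanged
load_satellite(sat_customer, "hk_c1", {"name": "Ada", "city": "Denver"}, "crm", mar)  # changed
```

Existing rows are never updated or deleted, so the earlier state of the customer stays available for audit alongside the new one.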

Here is a simple example of what a Data Vault 2.0 model looks like:
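As a textual stand-in, the following sketch builds a very small model with two hubs, one link, and one satellite in an in-memory SQLite database. It is not the author's original model: every table name, column name, and data value here is a hypothetical choice for illustration.

```python
import sqlite3

# Hypothetical schema: two hubs, one link between them, one satellite on a hub.
DDL = """
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,      -- hash key derived from the business key
    customer_id   TEXT NOT NULL UNIQUE,  -- business key
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_order (
    order_hk      TEXT PRIMARY KEY,
    order_id      TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE link_customer_order (
    customer_order_hk TEXT PRIMARY KEY,
    customer_hk       TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    order_hk          TEXT NOT NULL REFERENCES hub_order(order_hk),
    load_date         TEXT NOT NULL,
    record_source     TEXT NOT NULL,
    UNIQUE (customer_hk, order_hk)
);
CREATE TABLE sat_customer_details (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    load_date     TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_hk, load_date)  -- one row per key per load: history is kept
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

conn.execute("INSERT INTO hub_customer VALUES (?, ?, ?, ?)",
             ("hk_c1", "CUST-1001", "2024-01-01", "crm"))
conn.execute("INSERT INTO hub_order VALUES (?, ?, ?, ?)",
             ("hk_o1", "ORD-9001", "2024-01-02", "shop"))
conn.execute("INSERT INTO link_customer_order VALUES (?, ?, ?, ?, ?)",
             ("hk_l1", "hk_c1", "hk_o1", "2024-01-02", "shop"))
conn.executemany("INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?)", [
    ("hk_c1", "2024-01-01", "Ada", "Austin", "crm"),
    ("hk_c1", "2024-06-01", "Ada", "Denver", "crm"),  # later change, added as a new row
])

# Read the full history of the customer's descriptive data.
history = conn.execute("""
    SELECT h.customer_id, s.load_date, s.city
    FROM hub_customer AS h
    JOIN sat_customer_details AS s ON s.customer_hk = h.customer_hk
    ORDER BY s.load_date
""").fetchall()
```

In this layout a new relationship (for example, orders to stores) would be added as another hub and link table, leaving the existing tables untouched.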

Reference article: https://www.snowflake.net/blog/ability-to-connect-to-snowflake-with-jdbc/

Data Vault reference materials

 

Anand R Rao

9 years ago

Great article, Murali. We experimented with Data Vaults, and the approach is really useful when you want to extend a data warehouse for future BI analytics. It would be great if you could share some insights on "What next after Data Vault 2.0" and how we can extend it further for future implementations ...
