HOOK vs Data Vault: Willibald Part 5

HOOK vs Data Vault: Willibald Part 5

As a reminder, the diagram below represents the Willibald source database. It consists of eleven tables.

This time we will be looking at how Willibald tracks deliveries of Web Orders. ?

Problems with the Source Model

Before we get into the details of modelling this data, we should spend some time looking at the source data. There are issues, and I suspect that the DDVUG folks who pulled the Williband case study together deliberately designed the database this way.

I’ve highlighted the part of the model we’re interested in below.

Web Orders consist of a header recording customer information and then any number of product details which list each product included in the order. You might think that Delivery would follow a similar pattern, but rather than a header/detail structure, the Willibald database represents deliveries as a single table with a primary key consisting of four fields.

If you look closely at the primary key, you’ll see that the database will allow multiple records for each order line. In other words, we can assign the same order line to multiple delivery addresses and couriers simultaneously, which seems logically incorrect.

However, we could send products for each order line in batches. For example, if the customer orders 100 packets of tomato seeds but there are only 30 packets in stock, they could be sent out immediately, with a second batch of 70 following later once the new stock arrives. Unfortunately, the database doesn’t allow for the quantity of product in each batch, so although we know what was delivered, we can’t say how much was delivered.

Ultimately, we need to talk to the business to determine the correct business process, which will help us figure out the appropriate business key. Let’s assume that Delivery is a Core Business Concept (CBC) and see where that leads us.

ELM

For this article, I will break the convention of presenting a Data Vault model before the ELM version. This is because, in this case, the ELM model is easy to determine. When I say easy, I mean that working out the correct model without ambiguity is easy. You will see (in the following section) that the same is not true for the Data Vault model.

As we have already stated, a Delivery is a CBC, so ELM requires us to define a Delivery Hub.

The model for the Delivery data is straightforward. We know we have references to Order, Courier and Delivery, so we need a Link that associates the three CBCs. We hang the Delivery satellite from the Delivery Hub, and we are done.

Data Vault

You would think that the Data Vault model would be simpler, but we run into a problem straight away if you follow the advice from the “Data Vault Corporation” that I discussed in an earlier article (The Attempted Assassination of ELM).

If we agree that Delivery is a Core Business Concept (CBC), what is its business key? The database model suggests that it should be: {OrderID, OrderLineNo, DeliveryAddressID, CourierID}. A delivery does not have its own identifying business key; it is derived from Courier and Order business keys. I think that Moderator1 (as I refer to them in the aforementioned article) referred to this as a “fake” business key. Does that mean that Delivery is NOT a CBC?

If we accept that Delivery, Courier?and Order are all recognised CBCs, the Data Vault model looks like this:

However, if Delivery is not a CBC, the model looks like this:

Which version is correct? I don’t know. Delivery feels like it should be a CBC, but as it doesn’t seem to have a true business key, but rather a composed key derived from other CBCs, perhaps it isn’t a CBC. There is ambiguity as to which solution is correct. Either solution is valid, depending upon your modelling preference. I would be curious to know what advice the “Corporation” would give in this scenario.

HOOK

You may recall that to create the Data Vault and ELM models, we decided to treat Delivery as a CBC. If, for whatever reason, we had decided otherwise, the models would have looked different. Furthermore, if we change our minds, we would have to re-engineer the model.

With HOOK, we don’t need to be concerned about this change. To illustrate, I will walk through how easily we can absorb such changes. I will focus on the Delivery data only.

Let’s assume that our starting point states that Delivery is NOT a CBC. We’ll also implement our HOOK bags as views

CREATE VIEW Bag.Delivery
AS
SELECT HK_ORDER?? = CONCAT(‘ORD.WEB|’, OrderID)
???? , HK_COURIER = CONCAT(‘COUR|’, CourierID)
???? , *
? FROM lake.Delivery        

It’s pretty straightforward and easy to implement. Let’s now say that after a period of time, we probably should have considered Delivery to be a CBC. We can implement the change as easily as this;

CREATE OR REPLACE VIEW Bag.Delivery
AS
SELECT HK_ORDER?? ?= CONCAT( ‘ORD.WEB|’, OrderID)
???? , HK_COURIER ?= CONCAT( ‘COUR|’, CourierID)
???? , HK_DELIVERY = CONCAT( ‘DLVRY|’ , OrderID, ‘|’ , OrderLineNo, ‘|’
?????????????????????????? , DeliveryAddressID, ‘|’, CourierID)
???? , *
? FROM lake.Delivery        

The only issue here is what values we include in the definition of the Delivery business key. As discussed earlier, the primary key for the source Delivery table is in question. This uncertainty further highlights the benefits of the HOOK approach. If we change our minds, then we can change the definition with minimal impact. The structure of the model remains unchanged. We don’t need to add or remove any tables. We don’t need to restructure or reload any data. The HOOK approach gives a great deal of flexibility and agility.

Summary

Again, I hope that I have been able to demonstrate how HOOK tackles ambiguity by providing a simple yet flexible approach. Agility is baked into the methodology, and the simple one-to-one mapping of source to warehouse table means that we don’t need to think about how to structure the data, only which business keys (concepts) we care about and want to expose.

There really isn’t any more I can say about the Willibald model, but in the last part (Part 6), I will show the complete HOOK model and how we can easily summarise the model in a simple BUS matrix.

Michael Mueller

Spezialist Data Vault und Datenarchitekturen im Unternehmen

1 年

Delivery has indeed no own key on purpose. There are many systems out there, who just use the combined foreign keys as a primary key for m:n Tables. SAP is the most famous of them. This is not bad modelling or stupid by design. It follows the principles of structured entity relationship modelling by Prof. Sinz. He just thought if we assign a surrogate key to a table, we will loose the info that this a dependent entity to several other entities. This is a bit technical. But he gets the independent Entities quite clearly. It delivery a CBC? I usually go with 'if it has attributes it has some meaning to business'. The question whether there is only 1 delivery per order line or if there are multiple deliveries possible, is answered in the description of willibald (https://dwa-compare.info/en/willibalds-data/). However, if you have deliveries as a business concept you will have a much easier live, if they add the functionality of having more than 1 delivery per order line in the future. My data vault model would always look the same here, ELM or not. As the Delivery has an identifying relationship, there is no need for a link satelite. But I will always need a satelite for the delivery, just in case of deletions in the delivery.

Remco Broekmans

VP International Programs at Genesee Academy, LLC

1 年

Of course I feel the need to react . You are absolutely right that Delivery should be identified as it’s own CBC - it is the event, and your ELM model is perfectly fine. It will also translate directly towards a DataVault model. As you state the issue is that there is no clear business key readily available which is an issue. But instead of adjusting your model it is better and recommended to create your own business key here. Worst case scenario it will consist of all the 4 parts (only use each business key instead of technical key). I know that sounds crazy but it will make you source independent and true to the business- imagine if you go with your final DV solution and a source will come in time providing a proper Delivery business key…… If you alredy modeled according to the business there is no issue, no remodeling, no re-engineering. Don’t be afraid and create your own business keys if it’s needed to stay true to the business, don’t adjust your model because your sources suck! I like the elegance of Hook in these cases btw.

要查看或添加评论,请登录

Andrew Foad的更多文章

  • Is Data Modelling Dead?

    Is Data Modelling Dead?

    I’ve been using data modelling techniques for most of my 35-year career, and I consider it an essential skill for…

    37 条评论
  • Data Vault is Garbage (change my mind)

    Data Vault is Garbage (change my mind)

    If you are unfamiliar with the Hans Christian Andersen story of “The Emperor’s New Clothes”, ChatGPT summarises it…

    7 条评论
  • Introducing HOOK. Part 1 - Logical Architecture

    Introducing HOOK. Part 1 - Logical Architecture

    This article is the first in a short series on the Hook approach to Data Warehousing. As I said in my introduction…

  • Is “Agile Data” an Oxymoron?

    Is “Agile Data” an Oxymoron?

    In last week’s article, I discussed how you can “boil the ocean” when building a HOOK data warehouse. The article…

    11 条评论
  • Always try to boil the ocean!

    Always try to boil the ocean!

    Recently, I have felt very left out by all these Snowflake folk with their superhero avatars. As I don’t work for…

    16 条评论
  • HOOK and the Unified Star Schema

    HOOK and the Unified Star Schema

    First, we should probably understand what a Unified Start Schema is before we start talking about how we can use it in…

    6 条评论
  • HOOK vs Data Vault: Willibald Part 6

    HOOK vs Data Vault: Willibald Part 6

    As a reminder, the diagram below represents the Willibald source database. It consists of eleven tables.

  • The Attempted Assassination of ELM

    The Attempted Assassination of ELM

    Around the middle of the 19th Century, the students of a public boy’s school in the English Midlands invented a game. A…

    13 条评论
  • HOOK vs Data Vault: Willibald Part 4

    HOOK vs Data Vault: Willibald Part 4

    As a reminder, the diagram below represents the Willibald source database. It consists of eleven tables.

    5 条评论
  • HOOK vs Data Vault: Willibald Part 3

    HOOK vs Data Vault: Willibald Part 3

    As a reminder, the diagram below represents the Willibald source database. It consists of eleven tables.

    5 条评论

社区洞察

其他会员也浏览了