登录查看更多内容

HOOK vs Data Vault: Willibald Part 4

Andrew Foad

发布日期: 2023年8月7日

+ 关注

As a reminder, the diagram below represents the Willibald source database. It consists of eleven tables.

This time we will be looking at how Willibald deals with Orders and the sale of their products.

In the first article in this series, you might remember that I talked about tweaking the original database design and making the OrderID column on both the Order and RoadshowOrder tables an auto-incrementing identity key. The idea is that there will be overlapping business keys across the two tables that refer to different orders. We must be able to distinguish between them.

Orders

Orders are the mechanism by which Willibald sells its products to its customers. There are two channels through which Willibald sell products: through an online web store; or direct to the public on their live Roadshow events. Each channel has its own tables in the Willibald system, highlighted in the model snippet below.

The RoadshowOrder table references Customer, Club and Product tables. The Order table only refers to Customer, and the OrderLine table hangs from it whilst referring to the Product table.

Which Core Business Concepts (CBCs) do we need to worry about? We already know about Club, Customer and Product, and now we have a new one called Order. We might ask whether we model Order as a single CBC or need two types: one for Web Store Order; and one for Roadshow orders. Well, if you even need to ask that question, I would suggest you go and read John Giles’ excellent The Elephant in the Fridge, where he describes some real-life examples that defy belief. In case you were wondering, it’s one CBC!

But what about OrderLine? The ELM approach would most definitely identify order lines as a distinct CBC; Data Vault, probably not; HOOK, almost certainly not.

Let’s start with Data Vault…

Data Vault

We know we have four CBCs and, therefore, four hubs. We have three source tables that we will need to map to some satellites. As there are multiple relationships, we also know we must implement some links.

The most straightforward modelling approach (at least, I think it is) is creating a satellite for each source table. If the satellite is associated with a single CBC, we hang it from the relevant hub; if the satellite is associated with more than one CBC, then we hang it from a suitably defined link — something like this.

This diagram represents just one solution, but there could be many others.

With four hubs, there are (a minimum of) eleven possible link structures. For example, should we model each binary relationship from the source as distinct links? We could debate the merits of each potential solution, but that would get boring really quickly.

And for the two Order tables, should we combine them into a single satellite? That might not be a great idea, but I’m sure somebody could make a good argument to do so.

The point is that there are numerous solutions and, therefore, many opportunities to get things wrong.

Ensemble Modelling (ELM)

As we’ve seen, the Ensemble modelling approach will yield a different structure as link satellites are not permitted. For simplicity, I’ve decided not to model any historical tracking of relationships:

Capita 2 年前

Developing The Community Data System – From Basic To…

Richard Millington 2 年前

Big Data is for everyone – Part 2 of 2

Naveen Joshi 8 年前

As with the Data Vault solution, we might have fashioned many alternatives, but I have tried to show that both Web and Roadshow Orders are contained within the same Order Ensemble.

There are four other ensembles. We’ve already covered three of them, Customer, Club and Product, in previous articles; the fourth is Order Line.

HOOK

Let’s see what the equivalent HOOK model looks like.

Here we follow the same pattern as before, with a Bag defined for each source table. As always, it is a one-to-one. As with previous models, we can define the Bags as views over a data lake table (we can build them as physical tables if we choose to). They might look like this.

CREATE VIEW bag.Order
AS
SELECT HK_ORDER??? = CONCAT(‘ORD.WEB|’, OrderID)?
??? ?, HK_CUSTOMER = CONCAT(‘CUST|’, CustomerID)
?? ??, *
?FROM lake.Order

And for the related order line information.

CREATE VIEW bag.OrderLine
AS
SELECT HK_ORDER?  = CONCAT(‘ORD.WEB|’, OrderID)?
? ?  , HK_PRODUCT = CONCAT(‘PROD|’, ProductID)
? ?  , *
?FROM lake.Order

The Roadshow orders would look like this.

CREATE VIEW bag.RoadshowOrder
AS
SELECT HK_ORDER???= CONCAT(‘ORD.RS|’, OrderID)?
????, HK_CUSTOMER = CONCAT(‘CUST|’, CustomerID)
????, HK_CLUB????= CONCAT(‘CLUB|’, ClubID)
????, HK_PRODUCT?= CONCAT(‘PROD|’, ProductID)
????, *
?FROM lake.RoadshowOrder

You may notice that the Hook Keys for Orders have different KeySets. For the Web Store Orders the business keys are prefixed with “ORD.WEB”; we prefix the Roadshow Orders with “ORD.RS”. Doing this ensures we don’t confuse the different types of orders. If, for example, we tried to perform a join between the Order and RoadshowOrder tables using the HK_ORDER hook key, the query would come up empty, which is precisely the behaviour we want. Whereas, if we tried to join using the OrderID field, there could be overlapping records, and the resulting query would return incorrect results.

Of course, we can do something similar with Data Vault by using the Business Key Collision Code or BKCC (I’ve said it before, it is terrible name!). In fact, I based the idea of HOOK KeySets on the BKCC. However, in Data Vault, we use the BKCC when convenient. In HOOK, KeySets are a mandatory component of the architecture. Every business key MUST be qualified with a KeySet.

Summary

As in the last article, I haven’t explored many options, preferring to stick to patterns I have used before. Does the HOOK approach deliver a simpler model?

I’ll let you decide. Here are the three versions of the Willibald Order model side by side:

Next Up

In Part 5, we will look at the Willibald Deliveries, which is an interesting part of the model, particularly from a HOOK modelling perspective, because we do have a decision to make on this occasion. Is a Delivery a CBC or not? The resulting HOOK model will be slightly different depending on the answer to that question. I won’t spend too much time looking at DV and ELM alternatives, as I’ve probably flogged that point to death. I will show that the modelling in HOOK is simple regardless of whether we consider Delivery a CBC, or not.

In Part 6, I will complete the picture and show versions of completed models for all three approaches (Data Vault, Ensemble and HOOK).

After that, I will present a simple physical implementation of a HOOK solution for the Willibald case study.?

Andreas Henning

CTO Bulbul AB

1 年

One thing that I think is odd. The propsed dv solution is probably fine if you ask the dv community. But the question regarding wich links to be created, I only see one solution if you dont want to lose any data regarding the relations between the hubs. Can you guess how many links and what links I would have created?

Rodrigo Bocanegra Cruz

Enterprise & Data Architect

1 年

Hi Andrew, just playing around with GPT and the Mermaid plugin to create a model from the scripts you provided. Very simple and straightforward. There are, of course, simpler ways to do this, like creating the scripts from metadata and getting the model from there, but you might get the idea. This means we can have a proper conceptual model maintained with the business and then complement it with this physical model. I think it is more suitable to compare the models with DataVault this way since the relationships and entities are set even if they are not persisted out of the views, but we are modelling anyway, aren't we? Cheers! https://chat.openai.com/share/c5e2b180-707a-4891-b3e6-f878dfa54953

查看更多评论

要查看或添加评论，请登录

查看全部

HOOK vs Data Vault: Willibald Part 4

Andrew Foad

Orders

Data Vault

Ensemble Modelling (ELM)

领英推荐

HOOK

Summary

Next Up

更多精彩文章

社区洞察

其他会员也浏览了

Spring Data Series 2

Top-down Data Quality Approach with Conformed Dimensions

Advanced Data Flow in SwiftUI: Beyond @State and @Binding

Why Accessing and Combining Data is a Challenge for Councillors and Data Teams – And How Active Maps Makes It Simple

Deleting Sensitive Data in the Data Lake (and beyond)

The Working of Knowledge Graphs, How Nodes & Edges Weave the Web of Data!

How to Grow a Data Domain from Scratch ??

The Data Insights People's News - March 2024

Disable Foreign Key Checks in ClickHouse for Faster Data Loading

How Big Data Impacts Decisions and Predictions

Orders

Data Vault

Ensemble Modelling (ELM)

领英推荐

HOOK

Summary

Next Up

Introducing HOOK. Part 1 - Logical Architecture

2024年1月21日

Is “Agile Data” an Oxymoron?

2023年10月15日

Always try to boil the ocean!

2023年10月8日

HOOK and the Unified Star Schema

2023年10月3日

HOOK vs Data Vault: Willibald Part 6

2023年9月17日

HOOK vs Data Vault: Willibald Part 5

2023年9月8日

The Attempted Assassination of ELM

2023年8月28日

HOOK vs Data Vault: Willibald Part 3

2023年7月24日

HOOK vs Data Vault: Willibald Part 2

2023年7月17日

HOOK vs Data Vault: Willibald Part 1

2023年7月10日

社区洞察

其他会员也浏览了

Spring Data Series 2

Top-down Data Quality Approach with Conformed Dimensions

Advanced Data Flow in SwiftUI: Beyond @State and @Binding

Why Accessing and Combining Data is a Challenge for Councillors and Data Teams – And How Active Maps Makes It Simple

Deleting Sensitive Data in the Data Lake (and beyond)

The Working of Knowledge Graphs, How Nodes & Edges Weave the Web of Data!

How to Grow a Data Domain from Scratch ??

The Data Insights People's News - March 2024

Disable Foreign Key Checks in ClickHouse for Faster Data Loading

How Big Data Impacts Decisions and Predictions