Dimension Tables And Change

Nigel Shaw

Creating A Shared Language Of Data

发布日期: 2023年5月16日

+ 关注

Designing dimension tables for data warehouses is one of the most interesting challenges for a data modeler.

Assigning the attributes to the dimension is relatively simple - you know the grain of the data and you know what attributes are available at that grain.

Next is the decision about SCD (Slowly Changing Dimensions). How do we want to handle change for the dimension. Is the data static or do we need to manage change?

There are 6 SCD types.

0 - static data (ignore any changes)

1 - overwrite data (no history is retained)

2 - create new record (history is retained)

3 - history is a new column

4 - a new dimension is added

6 - combination of type 2 and type 3

The most common types used are type 1 and type 2. I have used type 4 a few times, but it was only when it was not performance intensive - I used a trigger to populate the history table. In 40 years I only used type 3 once.

Now that we have the fields it is important to look at the rate of change at the attribute level. If some fields change often and some fields rarely change we may consider switching them into a fast and a slow table. We don't want to be carrying 20 fields where one field changes often and 19 fields rarely change. Then each record will have identical values for 19 fields, swelling our database.

Most people use a simple CDC (change data capture) strategy to detect change. We can then follow the logic required to update the data. A common technique is to concatenate the fields of interest with a separator (like a pipe column) and then Hash8 that value. What you get is the knowledge that at least one field changed value, but you get no visibility to which field.

Some modelers will lean towards 6NF data structures. This provides the ability to track change at the attribute level. But, it also produces significant complexity for loading the data and it is painful for any SQL Developer. As much as I love 6NF, I would not recommend it as a first choice. The complexity outweighs most of the benefits.

For SCD type 2, you will use two dates; start date and end date. You could use NULL for the end date for the most recent record, but a lot of people will use a date far into the future (12/31/2099). It makes SQL queries easier.

要查看或添加评论，请登录

Nigel Shaw的更多文章

Integer Bitmaps - Data Modeling

2024年8月18日

Integer Bitmaps - Data Modeling

One of the greatest qualities in data modeling is elegance - finding a solution that is beautiful and simple. Integer…

2 条评论
Adventures In Data

2024年7月11日

Adventures In Data

I actually think it is harder now to be a data analyst - there are so many tools and possibilities. It must hard to…
Being A Great IT Manager

2024年7月8日

Being A Great IT Manager

You were selected to lead. Just being a manager barely meets any job requirement.
Data Quality - The Elephant In The Room

2024年6月21日

Data Quality - The Elephant In The Room

I think there is a general reticence to speak the truth; data quality issues are ultimately defects, and what we should…
No Room For Heroes

2024年6月19日

No Room For Heroes

When you work with data you need a real team with both depth and width. There is often a tendency to make your team…
Data Undressed

2024年6月14日

Data Undressed

Data at the source is wrapped in layers of metadata; definition, lineage, value, security and on and on. Once we start…
Noor Inayat Khan

2024年6月11日

Noor Inayat Khan

If you ever visit Dachau, you will see a simple memorial plaque in the Memorial Hall for Noor Inayat Khan. She is one…
The Eternal Battle - Data vs App

2024年6月11日

The Eternal Battle - Data vs App

If you look at the root of most data quality issues it is normally embedded within apps - either poor data design (how…

1 条评论
Dashboard Trust Transparency

2024年6月8日

Dashboard Trust Transparency

When you publish a dashboard to your stakeholders you are making a commitment to them to provide timely and accurate…
Domains, Lists And Data

2024年6月3日

Domains, Lists And Data

Mastering data is all about mastering the basics, and there is no better place to start than domains. A domain is…

See all articles

Nigel Shaw的更多文章

Integer Bitmaps - Data Modeling

Adventures In Data

Being A Great IT Manager

Data Quality - The Elephant In The Room

No Room For Heroes

Data Undressed

Noor Inayat Khan

The Eternal Battle - Data vs App

Dashboard Trust Transparency

Domains, Lists And Data

社区洞察