
Snowflakes and why not to use them (4/4): Conclusion

Petr Podrouzek

Global Tech Leader | SVP at Emplifi | Strategy & Engineering Excellence

Published: March 5, 2017

In the previous articles, I introduced a simple snowflake schema using the SCD2 model. I also built a simple star schema model and compared the two. Here is the comparison:

Snowflake will save you space compared to star schema

A snowflake schema is more normalized than a star schema, which leads to lower data redundancy. This means you will need less storage space if you use a snowflake. The difference can become significant if you use SCD2 and track version history.
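To get a feel for the size of the effect, here is a rough back-of-the-envelope sketch. All the numbers (product count, versions per product, attribute counts) are hypothetical and chosen only for illustration, and the sketch assumes the category table itself is not versioned. It simply counts stored cells in both models.

```python
# Hypothetical sizes, chosen only for illustration.
PRODUCTS = 100_000          # distinct products
VERSIONS_PER_PRODUCT = 5    # SCD2 versions kept per product
CATEGORIES = 200            # distinct categories (assumed not versioned)
PRODUCT_ATTRS = 4           # attributes stored on the product itself
CATEGORY_ATTRS = 6          # descriptive attributes of a category


def star_cells() -> int:
    """Star: every product version repeats all category attributes."""
    rows = PRODUCTS * VERSIONS_PER_PRODUCT
    return rows * (PRODUCT_ATTRS + CATEGORY_ATTRS)


def snowflake_cells() -> int:
    """Snowflake: each product version keeps only a category key (1 cell)."""
    product_rows = PRODUCTS * VERSIONS_PER_PRODUCT
    product_cells = product_rows * (PRODUCT_ATTRS + 1)
    category_cells = CATEGORIES * (CATEGORY_ATTRS + 1)  # +1 for its own key
    return product_cells + category_cells


if __name__ == "__main__":
    star, snow = star_cells(), snowflake_cells()
    print(f"star: {star:,} cells, snowflake: {snow:,} cells")
    print(f"snowflake stores {snow / star:.0%} of the star volume")
```

The more versions SCD2 keeps and the wider the repeated attributes are, the bigger the gap becomes under these assumptions.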

Star schema is easier to load than snowflake schema

A snowflake schema is more complex (by definition it has more tables), which means the ETL loads will be more complex as well.

Firstly, you will need to load the tables in the right order
Secondly, you will need to do some key lookups and calculate the hashes on the fly
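The two points above can be sketched as follows. This is only a minimal illustration, not the loading code from the previous articles: the table layout, the function names and the choice of SHA-256 are all assumptions. It also assumes the row hash covers the looked-up key, which is why the hash cannot be computed before the lookup.

```python
import hashlib


def row_hash(*values: object) -> str:
    """Hash of the tracked attributes, used to detect changed rows (SCD2)."""
    joined = "|".join(str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def load_categories(source_rows, category_keys):
    """Step 1: load the parent table and hand out surrogate keys."""
    for row in source_rows:
        name = row["category"]
        if name not in category_keys:
            category_keys[name] = len(category_keys) + 1
    return category_keys


def load_products(source_rows, category_keys):
    """Step 2: load the child table; it needs the keys produced in step 1."""
    loaded = []
    for row in source_rows:
        category_key = category_keys[row["category"]]  # key lookup
        loaded.append({
            "product": row["product"],
            "category_key": category_key,
            # Hash computed on the fly, after the key is known.
            "row_hash": row_hash(row["product"], category_key, row["price"]),
        })
    return loaded


# Hypothetical source rows.
source = [
    {"product": "bike", "category": "sport", "price": 500},
    {"product": "ball", "category": "sport", "price": 10},
    {"product": "lamp", "category": "home", "price": 35},
]

category_keys = load_categories(source, {})
products = load_products(source, category_keys)
```

Calling load_products before load_categories fails with a KeyError in this sketch, which is the ordering problem in miniature. In a star schema the category would simply be a column on the product row, so neither the ordering nor the lookup would be needed.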

So is it worth using a snowflake? In my opinion, it is not. In my experience, BI developers should strive to keep the ETL loads as simple as possible, even if it means that the data they load will take more space. The space issue can also be mitigated by compression: if a table contains a lot of repetitive values, I believe a compression algorithm can achieve very good results. There are also two other important reasons not to use a snowflake (I have not discussed them in detail in these articles):

I suspect that snowflakes can confuse some of the reporting tools that will display your data. I don't really have any proof of this, but since a star schema is simpler, I would expect it to be easier for reporting suites to digest.

Fig 8: "Hello Master, please use star schema otherwise I get confused when preparing your reports."

What I am sure about is that a star schema will be easier for your information workers. If you are building a reporting solution for proper data analysis, the users will have to understand the underlying data schema (unless the reporting tool offers some level of abstraction). And yet again, a star schema is simpler, and people from the business don't want to be bothered by complex data structures.
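To illustrate what this means for someone writing a report query, here is a small sketch using an in-memory SQLite database. The tables, columns and data are hypothetical. Both queries return the same totals per category, but the snowflake version needs one more join, and the user has to know that the extra table exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflake: category lives in its own table.
    CREATE TABLE sf_category (category_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE sf_product  (product_key INTEGER PRIMARY KEY, product TEXT,
                              category_key INTEGER REFERENCES sf_category);
    -- Star: category is repeated on every product row.
    CREATE TABLE st_product  (product_key INTEGER PRIMARY KEY, product TEXT,
                              category TEXT);
    CREATE TABLE fact_sales  (product_key INTEGER, amount REAL);

    INSERT INTO sf_category VALUES (1, 'sport'), (2, 'home');
    INSERT INTO sf_product  VALUES (1, 'bike', 1), (2, 'ball', 1), (3, 'lamp', 2);
    INSERT INTO st_product  VALUES (1, 'bike', 'sport'), (2, 'ball', 'sport'),
                                   (3, 'lamp', 'home');
    INSERT INTO fact_sales  VALUES (1, 500), (2, 10), (3, 35), (2, 10);
""")

# Star schema: one join from the fact table to the dimension.
STAR_QUERY = """
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN st_product p ON p.product_key = f.product_key
    GROUP BY p.category ORDER BY p.category
"""

# Snowflake schema: the same question needs a second join.
SNOWFLAKE_QUERY = """
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN sf_product p  ON p.product_key = f.product_key
    JOIN sf_category c ON c.category_key = p.category_key
    GROUP BY c.category ORDER BY c.category
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
```

With only one normalized level the difference is a single join. In a deeper snowflake, each additional level adds another join the user has to get right.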

In conclusion, I would go for a star schema: it is not very storage-efficient, but it has many other advantages. If you really need to save space, just compress the tables. How do you see the problem?
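As a quick sanity check of the compression argument, the sketch below compresses a column of highly repetitive values. It uses Python's general-purpose zlib as a stand-in for whatever table or page compression your database engine offers, so the ratio is only indicative. The category values and the row count are hypothetical.

```python
import random
import zlib

random.seed(42)  # fixed seed so the run is repeatable

CATEGORIES = ["sport", "home", "garden", "toys"]  # hypothetical values

# A denormalized column: the same few category names repeated many times.
repetitive = "\n".join(random.choice(CATEGORIES) for _ in range(50_000))


def compression_ratio(text: str) -> float:
    """Compressed size divided by original size (smaller is better)."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)


if __name__ == "__main__":
    print(f"compressed/original: {compression_ratio(repetitive):.2f}")
```

A real engine compresses whole rows or pages rather than a single column of text, so treat this as a hint that repeated dimension attributes compress well, not as a measurement.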

More sources about snowflakes...

Mathematical view of snowflakes
Quick comparison of snowflakes and star schemas

My previous technical articles:

Lee Bennett

7 years ago

No-brainer. Non-snowflaked star schema all the way for me. Although in reality it is rare to conform a data model exactly to this. I have found there is usually a trade off or two to be made. A typical recurring example is dates on a dimension where you might want to join to a snowflaked date dimension for time analysis. Repetition doesn't matter. Storage doesn't matter. A simple model that is easy to digest (by cubes, self-serve tools, users) and produces simpler, more efficient queries is the over-riding factor in any design.


