Snowflakes and why not to use them (4/4): Conclusion

Snowflakes and why not to use them (4/4): Conclusion

In the previous articles, I have introduced simple snowflake utilizing SCD2 model. I have also build a simple star schema model and tried to compare those both. Here is the comparison:

Snowflake will save you space compared to star schema 

Snowflake is more normalized than star schema and will lead to lower data redundancy. This means you will need less storage space in case you use snowflake. This can get significant if you want to use SCD2 and track version history.

Star schema is easier to load than snowflake schema 

Snowflake is more complex (by its definition there will be more tables) and this means that the ETL loads will be more complex as well.

  • Firstly you will need to load in the right order
  • Secondly, you will need to do some key lookups and calculate the hashes on the fly

So is it worth using snowflake? In my opinion, it is not. From my experience, BI developers should strive to keep the ETL loads as simple as possible even if it means that the data they load will take more space. Also, the space issue can be mitigated by compression. If the table contains a lot of repetitive values I believe that a compression algorithm can achieve some very good results. There are also two other important reasons why not to use snowflake (I have not discussed those in detail in these texts):

  • I suspect that snowflakes can confuse some of the reporting tools that will display your data. I don't really have any proof of this, but since star schema is simpler I would suggest it would be easier to digest by the reporting suites.

Fig 8: "Hello Master, please use star schema otherwise I get confused when preparing your reports."

  • What I am sure about, is that star schema will be easier for your information workers. If you are building a reporting solution for proper data analysis then the users will have to understand the underlying data schema (unless the reporting tool offers some level of abstraction). And yet again, a star schema is simpler and people from the business don't want to be bothered by some complex data structures.

In conclusion, I would go for star schema even though it is not very storage-efficient, but there are many other advantages. If you really need to save space just compress the tables. How do you see the problem?

More sources about snowflakes...

Previous articles:

My previous technical articles:


No-brainer. Non-snowflaked star schema all the way for me. Although in reality it is rare to conform a data model exactly to this. I have found there is usually a trade off or two to be made. A typical recurring example is dates on a dimension where you might want to join to a snowflaked date dimension for time analysis. Repetition doesn't matter. Storage doesn't matter. A simple model that is easy to digest (by cubes, self-serve tools, users) and produces simpler, more efficient queries is the over-riding factor in any design.

要查看或添加评论,请登录

Petr Podrouzek的更多文章

社区洞察

其他会员也浏览了