Data Modeling for Mere Mortals – Part 2: Dimensional Modeling Fundamentals
Nikola Ilic
I make music from the data??Data Mozart ??| MVP Data Platform | Pluralsight Author | O'Reilly Instructor | MCT
In the previous article , you learned the general principles of data modeling. Now, it’s time to narrow down our learning journey and master data modeling concepts and techniques that are specifically relevant to business intelligence scenarios. And, whenever we’re talking about data modeling for business intelligence, dimensional modeling is definitely the number one concept.
History Lessons – Ralph Kimball
Before we come up to explain why dimensional modeling is named like that – dimensional, let’s first take a brief tour through some history lessons. In 1996, a man called Ralph Kimball published a book “The Data Warehouse Toolkit”, which is still considered a dimensional modeling “Bible”. In his book, Kimball introduced a completely new approach to modeling data for analytical workloads, the so-called “bottom-up” approach. The focus is on identifying key business processes within the organization and modeling these first, before introducing additional business processes.
4-Step Dimensional Design Process
Kimball’s approach is fairly simple and consists of 4 steps, also known as the “4-Step Dimensional Design process”. Each step is based on a decision.
Why Dimensional Modeling?
Before we move on to explain different physical implementations of dimensional modeling, let’s quickly reiterate the key benefits of building a dimensional model for business intelligence scenarios:
Data Modeler or Astrologist? Understanding Star vs Snowflake schema
If you find yourself surrounded by experienced data modelers, you’ll probably hear them talking about stars and snowflakes. Don’t worry, they haven’t become astrologists or nature researchers all of a sudden. Star and snowflake schema are probably the most influential concepts in the world of dimensional modeling.
Let’s first try to explain the star schema. Don’t forget, according to Kimball, each piece of data should be classified either as “what, when, where, who, why”, or as “how much”, or “how many”. So, in a well-designed dimensional model, you should have a central table containing all the measurements and events – this is a fact table – surrounded by lookup tables, which we refer to as dimensions. The fact table and dimension tables are connected via relationships established between the primary key from the dimension table and the foreign key from the fact table. If you take a look at the following illustration, you’ll understand why this data modeling design pattern is called star schema.
Star schema is a de-facto standard for modeling data in enterprise data warehousing systems, because it enables efficient querying of huge amounts of data, fast aggregations and intuitive data filtering.
Although there are many ongoing debates questioning the star schema relevance for modern data platform solutions because of its “age”, it is fair to say that this concept is still absolutely relevant and definitely most widely adopted when it comes to designing efficient and scalable business intelligence systems.
Snowflake schema is very similar to star schema! Conceptually, there is no difference between the star and snowflake – in both cases, you’ll place your “who, what, when, where and why” into dimension tables, while keeping your “how much” and “how many” in the fact table. The only difference is that in the snowflake schema, dimensions are normalized and broken down into sub-dimensions, as you may see in the following illustration, which is why it resembles a snowflake.
The main motivation for normalizing dimensions is to remove the data redundancy from the dimension table. Although this might sound like a desirable approach, normalizing dimensions comes with some serious considerations:
Of course, there are specific use cases where dimension normalization may be a more viable choice, especially in regard to reducing the data model size. However, keep in mind that snowflake schema should be an exception rather than a rule when modeling your data for business intelligence workloads.
领英推荐
Star schema should always be a default choice!
To wrap up this topic on the Star schema relevance, I like to use an analogy to a marketing message created by Carlsberg Brewery, which says:
Translated to data modeling world:
The Star schema is PROBABLY the best data model in the business intelligence world…
3 Things to Know About Fact Tables
There are dozen of wonderful books (like the aforementioned “The Data Warehouse Toolkit”, then “The Star schema – Complete Reference” by Christopher Adamson ), which I strongly encourage you to read and learn all the details about dimensional modeling and the star schema in particular.
Therefore, thinking that reading this, or any other single article, will make you a “master of dimensional modeling” would be absolutely wrong and pretentious. However, I would like to emphasize 3 key concepts relevant to fact tables in the dimensional model.
Disclaimer: these are 3 concepts that I personally find most important and that doesn’t necessarily mean that you should not consider some other concepts more relevant.
3 Things to Know About Dimension Tables
As I mentioned, these are just a few concepts relevant to dimension tables, and I strongly advise reading Kimball’s or Adamson’s book to understand other important concepts and techniques.
Conclusion
By learning about various data model types (conceptual, logical, and physical), we’ve built a solid foundation for understanding the importance of the data modeling process in general. Dimensional modeling, though only a subset of data modeling, is one of the concepts that are of key importance in implementing efficient and scalable real-life business intelligence solutions.
To conclude, the biggest advantage of dimensional models is their flexibility and adaptability. Here is the list of the changes that can be applied without violating any existing business intelligence query or application:
Thanks for reading!
Consultant database, data warehouse, BI, data mart, cube, ETL, SQL, analysis, design, development, documentation, test, management, SQL Server, Access, ADP+, Kimball practitioner. JOIN people ON data.
1 年Inferred members and their naming is often confused, I seen Unknown used for Null instead of Missing or Nothing.
Cultivating Analytics and Business Intelligence to evolve the organization.
1 年Excellent article - always useful to emphasize fundamental principles, like Kimball Dimensional Modeling.
Power BI Expert
1 年Great read, thanks for posting.
Economista| Finanzas | Business Intelligence | Data Analyst | Data Engineer | Python | SQL | Azure | PowerBI
1 年Thanks for sharing Nikola Ilic
Next Trend Realty LLC./wwwHar.com/Chester-Swanson/agent_cbswan
1 年Thanks for Sharing.