Dimensional Modeling - Part 1 of 2: An Overview

Dimensional Modeling - Part 1 of 2: An Overview

I believe that the data model is the most important part of any data driven solution. Because I work in the Business Intelligence and Data Warehousing field, the dimensional data model is the standard for how to relate and store data. In this two part series, I will explain the concept of dimensional modeling, compare dimensional modeling with transactional modeling, and provide tips for how best to create a dimensional model. For part one, I will begin with the overview and comparison:

Introduction to Dimensional Modeling (OLAP)

The premise of data warehousing is that data is stored in a dimensional model, which allows for complex, online analytical processing (OLAP) queries to be generated and returned in a reasonable amount of time. To achieve this, data is stored in a denormalized manner which stores redundant data, and thus reducing the number of joins a relational database management system (RDBMS) would need to perform. Denormalized, dimensional models are made up of dimensions and facts and the resulting model is often referred to as a star model. This is because the fact is central to the dimensions and resembles a star (figure 1).

No alt text provided for this image

Figure 1: Sample Sales Dimensional (Star) Model

Dimensions typically contain attributes or textual information that describes entities. Examples could be customer name, address, employee name, or product name and description. Facts contain metrics or items that need to be calculated or summed. Common examples are sales order quantity, sales order amount, or discount amount.

Comparison with Transactional Modeling (OLTP)

Dimensional models differ from online transactional processing (OLTP) systems in that transactional models are highly normalized. Normalized data models try to remove redundant data stored within tables, and thus necessitates more joins. The purpose of transactional databases is to quickly store and retrieve a small number of records (figure 2).  

No alt text provided for this image

Figure 2: Sample Sales Transactional Model

A sample transactional request could be: 

Return a listing of products for open orders placed by John Smith

This query is fairly simple for the database to return because you are looking for a small number of records. Only the products pertaining to John Smith will be retrieved. Even if you have millions of records, the database can quickly locate, and return those specific records.

A sample analytical request could be: 

Retrieve East Region revenue for products sold to customers living in Washington DC

In a transactional environment, this is a much more difficult question to answer. Looking at the sample transactional model (figure 2), you would need to join at least 10 tables together, and there may be thousands, if not millions of orders for a particular location. With the sample dimensional model (figure 1), you only need four tables to answer the same question. You can easily add or remove dimensions to your request and the results would still come back in a short amount of time; seconds not minutes. 

Benefits of Dimensional Modeling

The reason this works so well is that many tables from the transactional system are combined into a singular dimension. Doing this creates redundant data, but allows for much faster retrieval of large volumes of data. All metrics are stored in the fact and can easily be summed or aggregated to the desired level by bringing in or removing attributes from the associated dimensions. Another reason it works well is because executing a complex, analytical query in the data warehouse reduces the burden on the transactional system and allows it to handle only transactional requests.

Think of your favorite online retailer. When you are placing an order, viewing order history, or updating your customer information, you are storing and retrieving data from a transactional data model. Business intelligence, on the other hand, allows business users to use their data to create trends, find correlations, and make informed decisions. To do this, data must be able to be summarized, retrieved quickly, and have the ability to drill into the details. This is where a robust dimensional model comes into play.  

In the next post, I will discuss some best practice tips to consider when creating a dimensional model. Stay tuned…

The views expressed in this post are entirely my own, and don't represent that of my employer or any of its employees.

Danielle C.

Technology Delivery Leader at Capital One

9 年

Great post!

回复
Paul Needleman

The Power of Connections | Data | People | Snowflake

9 年

Thomas Nourse Thank you. If you have any suggestions for best practices for dimensional modeling, let me know. Lauri Putt Needleman Dimensional models scale well with new products, which would just be additional data within the same model. Even new attributes or metrics are easily added. New lines of business may require a new data model to be created, but that depends on what level of data is needed to be reported on. Dimensional models have a concept of "shared" or conformed dimensions that allow different business areas to be related through common attributes. For example, a product dimension could be used by sales, finance, and marketing. Each may require different dimensional data models but the data can be related through the product.

回复

Been preaching this stuff for years too! Good first part Paul!

回复
Lauri Putt Needleman

Chief People Officer | Empowering People in service of Small Business

9 年

Can you change a data model as your organization rolls out new products or lines of business?

回复
Jonathan Pan, Ph.D, MBA

Vice President | Strategy | Business Development | General Manager

9 年

Great post,Paul!

回复

要查看或添加评论,请登录

Paul Needleman的更多文章

  • The (Modern) Big Data Platform

    The (Modern) Big Data Platform

    Disclaimer: The views expressed here are mine alone and do not necessarily reflect the view of my current, former, or…

  • Empowering the Self-Service Analytics Workforce

    Empowering the Self-Service Analytics Workforce

    Disclaimer: The views expressed here are mine alone and do not necessarily reflect the view of my current, former, or…

    3 条评论
  • Data Management in the Era of Big Data

    Data Management in the Era of Big Data

    Disclaimer: The views expressed here are mine alone and do not necessarily reflect the view of my current, former, or…

    4 条评论
  • 3 Tips for Achieving Success at Work

    3 Tips for Achieving Success at Work

    Disclaimer: The views expressed here are mine alone and do not necessarily reflect the view of my current, former, or…

    1 条评论
  • Data Warehousing in the era of Big Data

    Data Warehousing in the era of Big Data

    Disclaimer: The views expressed here are mine alone and do not necessarily reflect the view of my current, former, or…

    26 条评论
  • Dimensional Modeling - Part 2 of 2: Creating a Successful Model

    Dimensional Modeling - Part 2 of 2: Creating a Successful Model

    In my previous post, I introduced dimensional modeling, compared with transactional modeling, and provided some overall…

    5 条评论

社区洞察

其他会员也浏览了