登录查看更多内容

Star Schema in Power BI: The Key to Simplicity and Speed

Boris Kushnarev

Business Analyst at Optus | Data Analytics Consultant at The Data School | Neo4j Certified Professional | Tableau Data Analyst Certification | Alteryx Advanced Certified | [email protected]

发布日期: 2024年11月3日

In the world of data analytics, effective data modeling is the foundation for insightful and efficient reporting, and Power BI is a powerful tool that thrives on well-structured data. One of the essential aspects of developing robust data models is understanding and selecting the right schema. Whether it's the straightforward star schema, the intricate snowflake schema, or more complex structures like the galaxy schema, each has its strengths and trade-offs that impact performance and usability. Star schemas are preferred in Power BI for their simplicity and performance benefits, while snowflake schemas may be used when normalization is necessary or when dimension tables are too large to be denormalized efficiently. In this article, we will focus on the star schema and explore why it is often the go-to choice for Power BI data modeling.

In a star schema, two key components are essential: fact tables and dimension tables. The fact table serves as the central repository for quantitative data, containing metrics and measurements that analysts want to analyze, such as sales amounts, quantities sold, and profit margins. Each record in the fact table is typically associated with foreign keys that link to corresponding dimension tables, which provide descriptive attributes related to the facts. Dimension tables, on the other hand, contain the contextual information that helps explain the facts. They include data like product names, customer details, time periods, and geographical locations. To transform our non-normalized table into a star schema, we need to normalize the data by creating a dedicated fact table that aggregates the quantitative metrics, along with separate dimension tables that encapsulate the related attributes. This restructuring will not only improve data organization but also enhance the efficiency of our queries in Power BI. Additionally, the relationships between the fact table and the dimension tables will exhibit a one-to-many cardinality, meaning each record in the dimension table can relate to multiple records in the fact table, thereby facilitating more efficient data analysis.

For simplicity, we will consider a well-known superstore dataset (link) that is widely used in the data analyst community. This dataset is initially presented in a table that is in a non-normalized form. Throughout this article, we will guide you through the normalization process step-by-step, ultimately transforming the dataset into a star schema. This structured approach will not only enhance data integrity but also optimize performance in Power BI, enabling more efficient analysis and reporting. Join us as we delve into the details of creating a robust star schema from this familiar dataset.

In the model view, we can see that the original table has 21 columns:

To transform this non-normalized data into a normalized format, we first need to identify the dimension tables. Our first dimension table will be the Product table, which we will name dimProduct. This table will include the following columns: Product ID, Category, Sub-Category, and Product Name. The Product ID column contains unique values for each product and will serve as the primary key for this table. To create dimProduct, we will duplicate the initial table and rename it. Then, we will retain only the columns required for the Product table to ensure it only contains the relevant product-related attributes. Finally, we need to remove duplicates from dimProduct to maintain unique records for each product.

Similarly, we create a dimCustomer table with Customer ID as the primary key, along with the columns Customer Name and Segment. Following the same approach, we create a dimRegion table with Postal Code as the primary key and include City, State, Region, and Country.

We also have two date columns: Order Date and Ship Date. For these, we will create two separate dimension tables—dimOrderDate and dimShipDate—with Order Date and Ship Date as their respective primary keys. Initially, these tables will have only one column each. However, it is often useful to extract additional information from these date columns, such as the day of the week, quarter, or other relevant details for use in visualizations. If you do not need this additional date information, you can choose to keep the Order Date and Ship Date columns in the fact table as they are. Additionally, it is important to remove duplicates from dimOrderDate and dimShipDate to ensure each date appears only once. Below is an example of the dimOrderDate table:

领英推荐

A Comprehensive Guide to Self-Service BI in 2025

Xavor Corporation 1 个月前

What is the best way to optimize your data model in…

Enterprise DNA 1 年前

Advanced Features in Data Analytics Tools You Might Be…

Quantum Analytics NG 6 个月前

Our final dimension table will be dimShipMode, which contains data in a single column named Ship Mode. As with our previous dimension tables, we want to ensure there are no duplicates, and we will have only four unique records in this table. However, since we do not have a natural primary key here, we can create one by clicking on the "Add Column" option and selecting "Index Column" from the Ribbon. This will allow us to generate a unique identifier for each record. As a result, we will have:

Finally, we will create our fact table and name it factTransaction. You can simply use your original table for this purpose; just rename it to factTransaction. Next, remove all columns that are associated with the dimension tables, retaining only the foreign keys. The corresponding primary keys are defined in the dimension tables. Note that the Order Date and Ship Date columns will become foreign keys and must be kept in the fact table. To obtain the Ship Mode Key from the dimShipMode table, we need to merge our fact table with the dimShipMode table using the Ship Mode column, and then drop the Ship Mode column from the fact table. As a result, our fact table will consist solely of foreign keys and the quantitative transactional data:

Finally, we can observe the resulting data structure in the model view, which resembles a star schema:

In conclusion, transforming our non-normalized data into a star schema significantly enhances both data organization and query performance in Power BI. By clearly defining our fact and dimension tables, we create a structure that allows for efficient data analysis and reporting. The normalization process ensures that each dimension table provides unique contextual information while the fact table aggregates quantitative metrics, establishing a clear one-to-many relationship between them. This structured approach not only simplifies data retrieval but also empowers analysts to derive valuable insights, ultimately leading to more informed decision-making.

要查看或添加评论，请登录

Boris Kushnarev的更多文章

Implementing Row-Level Security in Power BI: Ensuring Data Privacy and Access Control

2024年10月27日

Implementing Row-Level Security in Power BI: Ensuring Data Privacy and Access Control

In my recent Power BI project, I tackled a common but crucial challenge—establishing secure access to specific data…

3 条评论
Algorithmic Trading and Strategies.

2024年6月24日

Algorithmic Trading and Strategies.

Algorithmic trading, or algotrading, has gained popularity for several reasons. First, the high-level programming…
Algorithmic implementation of the VWAP Trading Strategy

2023年8月28日

Algorithmic implementation of the VWAP Trading Strategy

I recently had the pleasure of meeting Talgat Baibussinov, a seasoned professional in the fields of trading and…
Recommendation systems

2020年2月10日

Recommendation systems

Recommendation systems play an important role in e-commerce, banking, and other businesses nowadays. Based on the…
PCA results interpretation

2019年3月1日

PCA results interpretation

Principal component analysis or PCA for short is the useful method for reducing the dimensionality of the considered…
How to choose hyperparameters for a model with GridSearchCV.

2019年2月4日

How to choose hyperparameters for a model with GridSearchCV.

It has been always a challenge to find hyperparameters which are just right for your model. Using different method, for…

3 条评论
Flower classifier with Convolutional Neural Networks using PyTorch

2019年1月24日

Flower classifier with Convolutional Neural Networks using PyTorch

Nowadays, Convolutional Neural Networks play an importent role in solving different questions in computer vision…

See all articles

Star Schema in Power BI: The Key to Simplicity and Speed

Boris Kushnarev

Business Analyst at Optus | Data Analytics Consultant at The Data School | Neo4j Certified Professional | Tableau Data Analyst Certification | Alteryx Advanced Certified | [email protected]

领英推荐

Boris Kushnarev的更多文章

社区洞察

其他会员也浏览了

How to Automate Data Transformation Using Power Query in Excel and Power BI

The Future of Data Analytics: How Power BI is Shaping Business Intelligence

Extracting Tableau Metadata using GraphiQL

From Data to Decisions: Power BI for Powerful Analytics

Transforming Raw Data into actionable visualized reports Using Power BI

7 Reasons Aqua Data Studio is the One Tool for Self-Service Analytics

5 Key Features of Tableau for Powerful Data Analytics ????

DAX (Data Analysis Expressions) Basics

Unlock the Power of Data: Best Practices for Power BI Data Modeling in 2024

How to Use DAX Queries in Power BI? Data Analytics Guide 101

领英推荐

Boris Kushnarev的更多文章

Implementing Row-Level Security in Power BI: Ensuring Data Privacy and Access Control

Algorithmic Trading and Strategies.

Algorithmic implementation of the VWAP Trading Strategy

Recommendation systems

PCA results interpretation

How to choose hyperparameters for a model with GridSearchCV.

Flower classifier with Convolutional Neural Networks using PyTorch

社区洞察

其他会员也浏览了

How to Automate Data Transformation Using Power Query in Excel and Power BI

The Future of Data Analytics: How Power BI is Shaping Business Intelligence

Extracting Tableau Metadata using GraphiQL

From Data to Decisions: Power BI for Powerful Analytics

Transforming Raw Data into actionable visualized reports Using Power BI

7 Reasons Aqua Data Studio is the One Tool for Self-Service Analytics

5 Key Features of Tableau for Powerful Data Analytics ????

DAX (Data Analysis Expressions) Basics

Unlock the Power of Data: Best Practices for Power BI Data Modeling in 2024

How to Use DAX Queries in Power BI? Data Analytics Guide 101