From Normalization to Denormalization in Power BI: The Star Schema
Just check the Power BI report here
One of the most important steps in my development has been the transition from working with normalized data schemas to denormalized ones, especially in the context of tools like Power BI. During this master's module on data exploitation processes and Power BI, I had the opportunity to dive deeper into this transition and better understand the value of each approach depending on the business and analytical needs.
The Normalized Schema
When I started working with the data, I had a typical relational model: tables organized in a highly detailed and normalized way. Each entity, such as customers, products, employees, sales, and categories, had its own table to avoid redundancy and ensure data integrity. This approach is ideal when the primary goal is to maintain strict control over the data and ensure accuracy. However, as I delved into the report's requirements, I realized that for faster and more efficient analysis in Power BI, I needed to go further.
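To make the starting point concrete, here is a minimal sketch of what such a normalized model looks like, using an in-memory SQLite database. The table and column names are illustrative, not taken from the original project; the point is that even a simple question like "sales per category" already requires chaining joins across the normalized tables.

```python
import sqlite3

# Illustrative normalized model: each entity in its own table,
# linked by foreign keys to avoid redundancy.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE Customers  (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Categories (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES Categories(category_id)
);
CREATE TABLE Sales (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES Customers(customer_id),
    product_id  INTEGER REFERENCES Products(product_id),
    amount      REAL
);
""")

con.execute("INSERT INTO Customers VALUES (1, 'Acme Corp')")
con.execute("INSERT INTO Categories VALUES (1, 'Bikes')")
con.execute("INSERT INTO Products VALUES (1, 'Road Bike', 1)")
con.execute("INSERT INTO Sales VALUES (1, 1, 1, 1200.0)")

# "Sales per category" already needs two joins in this layout:
row = con.execute("""
    SELECT c.name, SUM(s.amount)
    FROM Sales s
    JOIN Products p   ON s.product_id = p.product_id
    JOIN Categories c ON p.category_id = c.category_id
    GROUP BY c.name
""").fetchone()
print(row)  # ('Bikes', 1200.0)
```

The foreign keys give exactly the integrity guarantees described above, at the cost of extra joins for every analytical question.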
The Shift Towards Denormalization
As the project progressed, I realized that maintaining all tables in a normalized format was not practical for creating a Power BI report that required agile analysis and quick queries. This is where denormalization comes into play.
Instead of working with a highly normalized schema, I evolved the model towards a star schema, a type of denormalized schema widely used in data analysis. This type of schema has a central fact table (which contains transactional data, such as sales) surrounded by dimension tables that provide additional context (such as customers, products, or employees).
The Star Schema
In the project, the central fact table (`Sales`) became the heart of the model, while the dimension tables (`Products`, `Customers`, `Geography`, etc.) were directly connected to this table. The star schema allows for more efficient queries and aggregations because the relationships between the tables are simple and direct. This model also enhances query speed and performance, as most aggregation operations can be performed on the fact table without needing to run multiple complex joins between tables.
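The same idea can be sketched as a star schema in SQLite. Again the names (`FactSales`, `DimProduct`, `DimCustomer`) are illustrative rather than the project's actual tables; what matters is that every attribute needed for grouping is exactly one join away from the fact table.

```python
import sqlite3

# Illustrative star schema: one central fact table,
# each dimension reachable with a single join.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimProduct  (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);
CREATE TABLE DimCustomer (customer_key INTEGER PRIMARY KEY, customer TEXT, country TEXT);
CREATE TABLE FactSales (
    product_key  INTEGER,
    customer_key INTEGER,
    amount       REAL
);
""")
con.executemany("INSERT INTO DimProduct VALUES (?,?,?)",
                [(1, 'Road Bike', 'Bikes'), (2, 'Helmet', 'Accessories')])
con.executemany("INSERT INTO DimCustomer VALUES (?,?,?)",
                [(1, 'Acme Corp', 'Spain')])
con.executemany("INSERT INTO FactSales VALUES (?,?,?)",
                [(1, 1, 1200.0), (2, 1, 80.0)])

# Grouping by category now needs only one hop from the fact table:
rows = con.execute("""
    SELECT p.category, SUM(f.amount)
    FROM FactSales f
    JOIN DimProduct p ON f.product_key = p.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)  # [('Accessories', 80.0), ('Bikes', 1200.0)]
```

Note how the category attribute has been folded into the product dimension: the chain Products → Categories from the normalized model collapses into a single, wider dimension table.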
Advantages of the Star Schema for Data Analysis
Using the star schema allowed me to optimize the Power BI report in several ways:
1. Simplified Queries: With a central fact table and several dimensions, I was able to query large volumes of data more efficiently, as the relationships were simple and clear.
2. Improved Performance: Denormalization reduced the need for complex table joins, significantly improving response times in the Power BI reports.
3. Ease of Use for Multidimensional Analysis: This schema is ideal for multidimensional analysis, which allowed me to apply techniques like clustering and drillthrough, facilitating the extraction of deeper insights in the analysis of sales and customer behavior.
Disclaimer:
While a Data Engineer is expected to master a variety of tools, dashboard color schemes and visualization design are not my strongest areas. However, the focus of this project wasn't on Power BI itself, but rather on the processes of data exploitation and denormalization, using a user-friendly "transformation" to present analytics effectively.
The Importance of Denormalization
The main benefit I discovered with denormalization and the use of the star schema is its ability to simplify analysis. By combining multiple tables and reducing the need to constantly join data in real-time, Power BI can work with large volumes of data more efficiently. This is especially important when working with OLAP cubes and multidimensional models.
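Taken to its extreme, denormalization means materializing a single wide table in which the dimension attributes are copied onto every fact row. A toy sketch in plain Python (with invented column names) shows why query-time work shrinks: aggregation becomes a single pass over one table, with no joins at all.

```python
from collections import defaultdict

# A fully denormalized "wide" table: dimension attributes are
# pre-joined onto each row. Data and columns are illustrative.
sales = [
    {"category": "Bikes",       "country": "Spain",  "amount": 1200.0},
    {"category": "Accessories", "country": "Spain",  "amount": 80.0},
    {"category": "Bikes",       "country": "France", "amount": 950.0},
]

# "Sales by category" is now one linear scan, no joins:
totals = defaultdict(float)
for row in sales:
    totals[row["category"]] += row["amount"]

print(dict(totals))  # {'Bikes': 2150.0, 'Accessories': 80.0}
```

The trade-off is storage and redundancy: "Spain" is repeated on every Spanish sale, which is exactly the redundancy the normalized model was designed to avoid.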
OLAP Cubes
In this context, OLAP cubes play a fundamental role in optimizing performance and delivering efficient analysis. OLAP cubes allow for the analysis of large volumes of data in a multidimensional manner, which makes them ideal for projects like mine, where it was crucial to analyze key metrics such as sales and customer satisfaction across different dimensions (time, geography, product categories, etc.).
As a Data Engineer, understanding how to properly build an OLAP cube is essential. Not only does it facilitate real-time data analysis, but it also allows for the segmentation and aggregation of information more efficiently, reducing load times and improving the user experience.
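The core trick behind a cube can be sketched in a few lines of Python: pre-aggregate the measure over every subset of the dimensions, in the spirit of SQL's CUBE operator. The dimensions and figures below are invented for illustration; the payoff is that any rollup the user asks for becomes a direct lookup instead of a scan.

```python
from itertools import combinations
from collections import defaultdict

# Toy cube: pre-aggregate "amount" over every subset of the
# dimensions. Dimension names and facts are illustrative.
dims = ("year", "country", "category")
facts = [
    {"year": 2023, "country": "Spain",  "category": "Bikes",       "amount": 1200.0},
    {"year": 2023, "country": "Spain",  "category": "Accessories", "amount": 80.0},
    {"year": 2024, "country": "France", "category": "Bikes",       "amount": 950.0},
]

cube = defaultdict(float)
for fact in facts:
    # Every subset of dims, from () (grand total) to all three.
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            key = tuple((d, fact[d]) for d in subset)
            cube[key] += fact["amount"]

# Pre-computed answers are direct lookups, not scans:
print(cube[()])                                        # grand total: 2230.0
print(cube[(("country", "Spain"),)])                   # 1280.0
print(cube[(("year", 2024), ("category", "Bikes"))])   # 950.0
```

A real OLAP engine adds smarter storage and incremental refresh on top, but this is the essential pre-aggregation idea that cuts load times for the end user.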
My Evolution in This Project
This project was a practical reminder of how important it is to be flexible and adapt the data schema according to the analysis needs. While starting with a normalized model ensures integrity and accuracy, denormalization and the use of the star schema, when done correctly, open the door to faster and more computationally efficient analysis.
Throughout the development of the project, I applied several advanced techniques in Power BI, such as drillthrough, custom tooltips, clustering, and interactive navigation between pages. However, the most important step was the shift in mindset towards a denormalized model, which allowed the integration of all these functionalities without compromising performance.
Just check the Power BI report here
A special thank you to my master's professor, Daniel Burrueco, for his mentorship and clear explanations throughout the course. His guidance made the learning process much smoother and more insightful.