From Normalization to Denormalization in Power BI: The Star Schema
Just check the Power BI report here
One of the most important steps in my development has been the transition from working with normalized data schemas to denormalized ones, especially in the context of tools like Power BI. During this master's module on data exploitation processes and Power BI, I had the opportunity to dive deeper into this transition and better understand the value of each approach depending on the business and analytical needs.
The Normalized Schema
When I started working with the data, I had a typical relational model: tables organized in a highly detailed and normalized way. Each entity, such as customers, products, employees, sales, and categories, had its own table to avoid redundancy and ensure data integrity. This approach is ideal when the primary goal is to maintain strict control over the data and ensure accuracy. However, as I delved into the report's requirements, I realized that for faster and more efficient analysis in Power BI, I needed to go further.
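To make the starting point concrete, here is a minimal sketch of what such a normalized model looks like, using an in-memory SQLite database. The table and column names are illustrative, not taken from the original project; the point is that even a simple question like "sales per category" already requires chaining joins across the normalized tables.

```python
import sqlite3

# Illustrative normalized model: each entity in its own table,
# linked by foreign keys to avoid redundancy.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE Customers  (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Categories (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Products (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES Categories(category_id)
);
CREATE TABLE Sales (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES Customers(customer_id),
    product_id  INTEGER REFERENCES Products(product_id),
    amount      REAL
);
""")

con.execute("INSERT INTO Customers VALUES (1, 'Acme Corp')")
con.execute("INSERT INTO Categories VALUES (1, 'Bikes')")
con.execute("INSERT INTO Products VALUES (1, 'Road Bike', 1)")
con.execute("INSERT INTO Sales VALUES (1, 1, 1, 1200.0)")

# "Sales per category" already needs two joins in this layout:
row = con.execute("""
    SELECT c.name, SUM(s.amount)
    FROM Sales s
    JOIN Products p   ON s.product_id = p.product_id
    JOIN Categories c ON p.category_id = c.category_id
    GROUP BY c.name
""").fetchone()
print(row)  # ('Bikes', 1200.0)
```

The foreign keys give exactly the integrity guarantees described above, at the cost of extra joins for every analytical question.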
The Shift Towards Denormalization
As the project progressed, I realized that maintaining all tables in a normalized format was not practical for creating a Power BI report that required agile analysis and quick queries. This is where denormalization comes into play.
Instead of working with a highly normalized schema, I evolved the model towards a star schema, a type of denormalized schema widely used in data analysis. This type of schema has a central fact table (which contains transactional data, such as sales) surrounded by dimension tables that provide additional context (such as customers, products, or employees).
The Star Schema
In the project, the central fact table (`Sales`) became the heart of the model, while the dimension tables (`Products`, `Customers`, `Geography`, etc.) were directly connected to this table. The star schema allows for more efficient queries and aggregations because the relationships between the tables are simple and direct. This model also enhances query speed and performance, as most aggregation operations can be performed on the fact table without needing to run multiple complex joins between tables.
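The same idea can be sketched as a star schema in SQLite. Again the names (`FactSales`, `DimProduct`, `DimCustomer`) are illustrative rather than the project's actual tables; what matters is that every attribute needed for grouping is exactly one join away from the fact table.

```python
import sqlite3

# Illustrative star schema: one central fact table,
# each dimension reachable with a single join.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimProduct  (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);
CREATE TABLE DimCustomer (customer_key INTEGER PRIMARY KEY, customer TEXT, country TEXT);
CREATE TABLE FactSales (
    product_key  INTEGER,
    customer_key INTEGER,
    amount       REAL
);
""")
con.executemany("INSERT INTO DimProduct VALUES (?,?,?)",
                [(1, 'Road Bike', 'Bikes'), (2, 'Helmet', 'Accessories')])
con.executemany("INSERT INTO DimCustomer VALUES (?,?,?)",
                [(1, 'Acme Corp', 'Spain')])
con.executemany("INSERT INTO FactSales VALUES (?,?,?)",
                [(1, 1, 1200.0), (2, 1, 80.0)])

# Grouping by category now needs only one hop from the fact table:
rows = con.execute("""
    SELECT p.category, SUM(f.amount)
    FROM FactSales f
    JOIN DimProduct p ON f.product_key = p.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)  # [('Accessories', 80.0), ('Bikes', 1200.0)]
```

Note how the category attribute has been folded into the product dimension: the chain Products → Categories from the normalized model collapses into a single, wider dimension table.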
Advantages of the Star Schema for Data Analysis
Using the star schema allowed me to optimize the Power BI report in several ways:
1. Simplified Queries: With a central fact table and several dimensions, I was able to query large volumes of data more efficiently, as the relationships were simple and clear.
2. Improved Performance: Denormalization reduced the need for complex table joins, significantly improving response times in the Power BI reports.
3. Ease of Use for Multidimensional Analysis: This schema is ideal for multidimensional analysis, which allowed me to apply techniques like clustering and drillthrough, facilitating the extraction of deeper insights in the analysis of sales and customer behavior.
Disclaimer:
While a Data Engineer is expected to master a variety of tools, dashboard color schemes and visualization design are not my strongest areas. However, the focus of this project wasn't on Power BI itself, but rather on the processes of data exploitation and denormalization, using a user-friendly "transformation" to present analytics effectively.
The Importance of Denormalization
The main benefit I discovered with denormalization and the use of the star schema is its ability to simplify analysis. By combining multiple tables and reducing the need to constantly join data in real-time, Power BI can work with large volumes of data more efficiently. This is especially important when working with OLAP cubes and multidimensional models.
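Taken to its extreme, denormalization means materializing a single wide table in which the dimension attributes are copied onto every fact row. A toy sketch in plain Python (with invented column names) shows why query-time work shrinks: aggregation becomes a single pass over one table, with no joins at all.

```python
from collections import defaultdict

# A fully denormalized "wide" table: dimension attributes are
# pre-joined onto each row. Data and columns are illustrative.
sales = [
    {"category": "Bikes",       "country": "Spain",  "amount": 1200.0},
    {"category": "Accessories", "country": "Spain",  "amount": 80.0},
    {"category": "Bikes",       "country": "France", "amount": 950.0},
]

# "Sales by category" is now one linear scan, no joins:
totals = defaultdict(float)
for row in sales:
    totals[row["category"]] += row["amount"]

print(dict(totals))  # {'Bikes': 2150.0, 'Accessories': 80.0}
```

The trade-off is storage and redundancy: "Spain" is repeated on every Spanish sale, which is exactly the redundancy the normalized model was designed to avoid.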
OLAP Cubes
In this context, OLAP cubes play a fundamental role in optimizing performance and delivering efficient analysis. OLAP cubes allow for the analysis of large volumes of data in a multidimensional manner, which makes them ideal for projects like mine, where it was crucial to analyze key metrics such as sales and customer satisfaction across different dimensions (time, geography, product categories, etc.).
As a Data Engineer, understanding how to properly build an OLAP cube is essential. Not only does it facilitate real-time data analysis, but it also allows for the segmentation and aggregation of information more efficiently, reducing load times and improving the user experience.
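The core trick behind a cube can be sketched in a few lines of Python: pre-aggregate the measure over every subset of the dimensions, in the spirit of SQL's CUBE operator. The dimensions and figures below are invented for illustration; the payoff is that any rollup the user asks for becomes a direct lookup instead of a scan.

```python
from itertools import combinations
from collections import defaultdict

# Toy cube: pre-aggregate "amount" over every subset of the
# dimensions. Dimension names and facts are illustrative.
dims = ("year", "country", "category")
facts = [
    {"year": 2023, "country": "Spain",  "category": "Bikes",       "amount": 1200.0},
    {"year": 2023, "country": "Spain",  "category": "Accessories", "amount": 80.0},
    {"year": 2024, "country": "France", "category": "Bikes",       "amount": 950.0},
]

cube = defaultdict(float)
for fact in facts:
    # Every subset of dims, from () (grand total) to all three.
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            key = tuple((d, fact[d]) for d in subset)
            cube[key] += fact["amount"]

# Pre-computed answers are direct lookups, not scans:
print(cube[()])                                        # grand total: 2230.0
print(cube[(("country", "Spain"),)])                   # 1280.0
print(cube[(("year", 2024), ("category", "Bikes"))])   # 950.0
```

A real OLAP engine adds smarter storage and incremental refresh on top, but this is the essential pre-aggregation idea that cuts load times for the end user.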
My Evolution in This Project
This project was a practical reminder of how important it is to be flexible and adapt the data schema according to the analysis needs. While starting with a normalized model ensures integrity and accuracy, denormalization and the use of the star schema, when done correctly, open the door to faster and more computationally efficient analysis.
Throughout the development of the project, I applied several advanced techniques in Power BI, such as drillthrough, custom tooltips, clustering, and interactive navigation between pages. However, the most important step was the shift in mindset towards a denormalized model, which allowed the integration of all these functionalities without compromising performance.
Just check the Power BI report here
A special thank you to my master's professor, Daniel Burrueco, for his mentorship and clear explanations throughout the course. His guidance made the learning process much smoother and more insightful.