How a BI data model differs from normal SQL, and how I redesigned the former developer's data model

In June I received a user requirement: essentially, replicate the brand sales manager's PPT that was presented to the board, inside the Group Commercial App, which already holds most of the data. The foundation of the Group Commercial App is weekly data for actual sales, budget, and forecast, covering both historical and future periods.

The Group Commercial App was created by someone else and is now solely under my maintenance. When I opened the Table Viewer to inspect the data model, I was surprised to find multiple star schemas in it. Interestingly, the former developer was clearly good at SQL design, but they didn't fully understand how a Qlik data model works.

Fundamentally, most modern BI tools are designed around in-memory calculation and display of self-service interactive dashboards. In-memory calculation neatly handles requirements where users want data aggregated at different granularity levels, e.g. actual sales by week, or future forecast by quarter. A non-technical user can do all of this by clicking a few buttons, instead of someone re-coding SELECT ... GROUP BY ... WHERE clauses.
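
To make the contrast concrete, here is a minimal Qlik load-script sketch; the table name, field names, and sample values are hypothetical, not from the actual app. In SQL, each granularity would need its own GROUP BY query; here, one measure serves every granularity.

    SalesData:
    LOAD
        Date#(date, 'YYYY-MM-DD')                         AS date,
        Week(Date#(date, 'YYYY-MM-DD'))                   AS Week,
        'Q' & Ceil(Month(Date#(date, 'YYYY-MM-DD')) / 3)  AS Quarter,
        sales_amount
    INLINE [
    date, sales_amount
    2024-01-01, 100
    2024-01-08, 150
    2024-04-01, 200
    ];
    // In SQL you would re-code one query per granularity, e.g.
    //   SELECT week,    SUM(sales) ... GROUP BY week;
    //   SELECT quarter, SUM(sales) ... GROUP BY quarter;
    // In Qlik, a single chart measure, Sum(sales_amount), answers both:
    // the user simply picks Week or Quarter as the chart dimension.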

I can see that the initial design (the SNAP table and its surrounding linked tables) was a clean, normal star schema. Then, as more user requirements came in, the subsequent developer just added more and more star schemas (a SNAP4 table, a SNAP5_FINAL table, a DATA4 table, each carrying its own week and quarter fields), i.e. more and more GROUP BY and WHERE clauses.
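
Reconstructed purely as an illustration (this is my guess at the pattern, not the former developer's exact script), each new requirement added another pre-aggregated resident load like this:

    SNAP4:                          // weekly pre-aggregation
    LOAD
        year_week,
        Sum(sales_amount) AS weekly_sales
    RESIDENT DATA
    GROUP BY year_week;

    SNAP5_FINAL:                    // quarterly pre-aggregation
    LOAD
        quarter,
        Sum(sales_amount) AS quarterly_sales
    RESIDENT DATA
    GROUP BY quarter;

Every such table duplicates facts that the in-memory engine could have grouped on the fly, so the model keeps accumulating near-copies of the same data.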



If the current user requirement were just a tiny tweak to the app, I would follow the existing design to save time and effort, i.e. to avoid regression testing. This time, however, the requirement is not tiny, and it is meant to be scalable and expandable, so I decided to redesign the data model.

In BI, unless you are dealing with a large volume of data, there is no need to pre-aggregate transactional data by week or quarter. By connecting the fact (transactional) table to a dimensional calendar table, the calendar table does the grouping by week, month, quarter, or year for you.
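
A minimal sketch of that idea in Qlik script, assuming a fact table with a date field (all names and values are illustrative): generate a master calendar spanning the fact table's date range, and the shared date field lets charts group by any calendar attribute.

    DATA:
    LOAD
        Date#(date, 'YYYY-MM-DD') AS date,
        item_code,
        sales_amount
    INLINE [
    date, item_code, sales_amount
    2024-01-01, SKU001, 500
    2024-03-15, SKU002, 300
    ];

    Calendar:
    LOAD
        Date(MinDate + IterNo() - 1)                   AS date,
        Week(MinDate + IterNo() - 1)                   AS Week,
        Month(MinDate + IterNo() - 1)                  AS Month,
        'Q' & Ceil(Month(MinDate + IterNo() - 1) / 3)  AS Quarter,
        Year(MinDate + IterNo() - 1)                   AS Year
    WHILE MinDate + IterNo() - 1 <= MaxDate;
    LOAD
        Min(date) AS MinDate,
        Max(date) AS MaxDate
    RESIDENT DATA;

No GROUP BY appears anywhere: a chart with measure Sum(sales_amount) and dimension Week, Quarter, or Year regroups instantly in memory.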

For a genuinely large volume of data, it is indeed worth pre-aggregating the fact table to the target dimension (e.g. week) in advance. This saves plenty of reload time and app-opening time. Of course, you lose some granularity by doing so: once transactions are grouped by week, the per-day detail is gone. If we want to show very fine granularity, such as daily data, for a large volume, we may have to shorten the time period of daily data we store and display, i.e. segment the data.
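
When pre-aggregation is justified, the trade-off looks like this in Qlik script (continuing the assumed daily fact table from the previous sketch): the preceding load derives the week, the outer load collapses to it, and the daily table is then dropped.

    WEEKLY_SALES:
    LOAD
        week_start,
        item_code,
        Sum(sales_amount) AS weekly_sales
    GROUP BY week_start, item_code;
    LOAD                                 // preceding load: derive the week
        WeekStart(date) AS week_start,
        item_code,
        sales_amount
    RESIDENT DATA;

    DROP TABLE DATA;                     // daily granularity is gone from here on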

So it depends on the specific requirement. Here is my redesigned data model. The source data is keyed on week and year, item_code (SKU), and group_customer_code (which identifies the internal site and the external customer). The DATA table is the fact (transactional) table; the rest are dimension tables. The RJ_BRAND table is the new table I added: it records the combinations of SKU, internal site, and external customer that belong to Brand Manager RJ, so we can easily filter out RJ's transactional data. Because RJ's table carries both item_code (SKU) and group_customer_code, it cannot simply be appended to the PRODUCT table or the CUSTOMER table; it spans both of those dimensions. In summary, the new design is scalable: if we have more Brand Managers' data to add, we just follow the same principle and create another table like RJ_BRAND, which won't significantly increase the burden on, or complexity of, the existing app.
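
Here is one way the linkage could be wired up in Qlik script; the field names come from the article, but the composite-key approach and the sample values are my assumptions (the real app may key this differently). The composite key keeps the association on a single field, so Qlik does not create a synthetic key across the two shared fields:

    DATA:
    LOAD
        item_code & '|' & group_customer_code AS %sku_customer_key,
        item_code,
        group_customer_code,
        year_week,
        sales_amount
    INLINE [
    item_code, group_customer_code, year_week, sales_amount
    SKU001, C100, 202401, 500
    SKU002, C200, 202401, 300
    ];

    RJ_BRAND:
    LOAD
        item_code & '|' & group_customer_code AS %sku_customer_key,
        'RJ' AS brand_manager
    INLINE [
    item_code, group_customer_code
    SKU001, C100
    ];

Selecting brand_manager = 'RJ' now filters the fact table to RJ's SKU-customer combinations, and onboarding another Brand Manager is just one more small table loaded the same way (Qlik auto-concatenates tables with identical field sets, so they effectively become one mapping table).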



The new design also improved the reload substantially: even though the app size increased from 13.5 MB to 19.2 MB, the reload time dropped from nearly 5 minutes to just 20 seconds.


Before redesigning the data model:

After redesigning the data model:


