How a BI data model differs from normal SQL, and how I redesigned the former developer's data model

In June I received a user requirement: essentially, replicate the brand sales manager's PPT that was presented to the board, inside the Group Commercial App, which already holds most of the data. The foundation of the Group Commercial App is weekly data for actual sales, budget, and forecast, covering both historical and future periods.

The Group Commercial App was created by someone else and is now solely under my maintenance. When I opened the Table Viewer to inspect the data model, I was surprised to find multiple star schemas in it. Interestingly, the former developer was clearly good at SQL design, but they didn't fully understand how a Qlik data model works.

Fundamentally, most modern BI tools are designed around in-memory calculation and display of self-service interactive dashboards. In-memory calculation neatly handles requirements where users want data aggregated at different granularity levels, e.g. actual sales by week, or future forecast by quarter. A non-technical user can do all of this by clicking a few buttons, instead of someone re-coding SELECT ... GROUP BY ... WHERE clauses.
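
To make the contrast concrete, here is a minimal Qlik load-script sketch; the table name, field names, and sample values are hypothetical, not from the actual app. In SQL, each granularity would need its own GROUP BY query; here, one measure serves every granularity.

    SalesData:
    LOAD
        Date#(date, 'YYYY-MM-DD')                         AS date,
        Week(Date#(date, 'YYYY-MM-DD'))                   AS Week,
        'Q' & Ceil(Month(Date#(date, 'YYYY-MM-DD')) / 3)  AS Quarter,
        sales_amount
    INLINE [
    date, sales_amount
    2024-01-01, 100
    2024-01-08, 150
    2024-04-01, 200
    ];
    // In SQL you would re-code one query per granularity, e.g.
    //   SELECT week,    SUM(sales) ... GROUP BY week;
    //   SELECT quarter, SUM(sales) ... GROUP BY quarter;
    // In Qlik, a single chart measure, Sum(sales_amount), answers both:
    // the user simply picks Week or Quarter as the chart dimension.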

I can see that the initial design (the SNAP table and its surrounding linked tables) was a clean, normal star schema. Then, as more user requirements came in, the subsequent developer just added more and more star schemas (a SNAP4 table, a SNAP5_FINAL table, a DATA4 table, each carrying its own week and quarter fields), i.e. more and more GROUP BY and WHERE clauses.
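
Reconstructed purely as an illustration (this is my guess at the pattern, not the former developer's exact script), each new requirement added another pre-aggregated resident load like this:

    SNAP4:                          // weekly pre-aggregation
    LOAD
        year_week,
        Sum(sales_amount) AS weekly_sales
    RESIDENT DATA
    GROUP BY year_week;

    SNAP5_FINAL:                    // quarterly pre-aggregation
    LOAD
        quarter,
        Sum(sales_amount) AS quarterly_sales
    RESIDENT DATA
    GROUP BY quarter;

Every such table duplicates facts that the in-memory engine could have grouped on the fly, so the model keeps accumulating near-copies of the same data.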



If the current user requirement were just a tiny tweak to the app, I would follow the existing design to save time and effort, i.e. to avoid regression testing. This time, however, the requirement is not tiny, and it is meant to be scalable and expandable, so I decided to redesign the data model.

In BI, unless you are dealing with a large volume of data, there is no need to pre-aggregate transactional data by week or quarter. By connecting the fact (transactional) table to a dimensional calendar table, the calendar table does the grouping by week, month, quarter, or year for you.
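
A minimal sketch of that idea in Qlik script, assuming a fact table with a date field (all names and values are illustrative): generate a master calendar spanning the fact table's date range, and the shared date field lets charts group by any calendar attribute.

    DATA:
    LOAD
        Date#(date, 'YYYY-MM-DD') AS date,
        item_code,
        sales_amount
    INLINE [
    date, item_code, sales_amount
    2024-01-01, SKU001, 500
    2024-03-15, SKU002, 300
    ];

    Calendar:
    LOAD
        Date(MinDate + IterNo() - 1)                   AS date,
        Week(MinDate + IterNo() - 1)                   AS Week,
        Month(MinDate + IterNo() - 1)                  AS Month,
        'Q' & Ceil(Month(MinDate + IterNo() - 1) / 3)  AS Quarter,
        Year(MinDate + IterNo() - 1)                   AS Year
    WHILE MinDate + IterNo() - 1 <= MaxDate;
    LOAD
        Min(date) AS MinDate,
        Max(date) AS MaxDate
    RESIDENT DATA;

No GROUP BY appears anywhere: a chart with measure Sum(sales_amount) and dimension Week, Quarter, or Year regroups instantly in memory.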

For a genuinely large volume of data, it is indeed worth pre-aggregating the fact table to the target dimension (e.g. week) in advance. This saves plenty of reload time and app-opening time. Of course, you lose some granularity by doing so: once transactions are grouped by week, the per-day detail is gone. If we want to show very fine granularity, such as daily data, for a large volume, we may have to shorten the time period of daily data we store and display, i.e. segment the data.
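
When pre-aggregation is justified, the trade-off looks like this in Qlik script (continuing the assumed daily fact table from the previous sketch): the preceding load derives the week, the outer load collapses to it, and the daily table is then dropped.

    WEEKLY_SALES:
    LOAD
        week_start,
        item_code,
        Sum(sales_amount) AS weekly_sales
    GROUP BY week_start, item_code;
    LOAD                                 // preceding load: derive the week
        WeekStart(date) AS week_start,
        item_code,
        sales_amount
    RESIDENT DATA;

    DROP TABLE DATA;                     // daily granularity is gone from here on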

So it depends on the specific requirement. Here is my redesigned data model. The source data is keyed on week and year, item_code (SKU), and group_customer_code (which identifies the internal site and the external customer). The DATA table is the fact (transactional) table; the rest are dimension tables. The RJ_BRAND table is the new table I added: it records the combinations of SKU, internal site, and external customer that belong to Brand Manager RJ, so we can easily filter out RJ's transactional data. Because RJ's table carries both item_code (SKU) and group_customer_code, it cannot simply be appended to the PRODUCT table or the CUSTOMER table; it spans both of those dimensions. In summary, the new design is scalable: if we have more Brand Managers' data to add, we just follow the same principle and create another table like RJ_BRAND, which won't significantly increase the burden on, or complexity of, the existing app.
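
Here is one way the linkage could be wired up in Qlik script; the field names come from the article, but the composite-key approach and the sample values are my assumptions (the real app may key this differently). The composite key keeps the association on a single field, so Qlik does not create a synthetic key across the two shared fields:

    DATA:
    LOAD
        item_code & '|' & group_customer_code AS %sku_customer_key,
        item_code,
        group_customer_code,
        year_week,
        sales_amount
    INLINE [
    item_code, group_customer_code, year_week, sales_amount
    SKU001, C100, 202401, 500
    SKU002, C200, 202401, 300
    ];

    RJ_BRAND:
    LOAD
        item_code & '|' & group_customer_code AS %sku_customer_key,
        'RJ' AS brand_manager
    INLINE [
    item_code, group_customer_code
    SKU001, C100
    ];

Selecting brand_manager = 'RJ' now filters the fact table to RJ's SKU-customer combinations, and onboarding another Brand Manager is just one more small table loaded the same way (Qlik auto-concatenates tables with identical field sets, so they effectively become one mapping table).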



The new design also improved the reload substantially: even though the app size increased from 13.5 MB to 19.2 MB, the reload time dropped from nearly 5 minutes to just 20 seconds.


Before redesigning the data model:

After redesigning the data model:


