Increasing the efficiency of architecture. How to optimize work when the reporting model stops functioning efficiently - a case study.

Good morning, Michał. First of all, thank you for accepting our invitation and deciding to participate in our podcast.

Michał Włodkowski

Good morning, Olga. Thank you very much for the invitation. This is a new experience for me, stepping out of my comfort zone, but let’s see how I manage.

During today's discussion, we want to address a rather important topic at this stage of the development of data management, namely increasing the efficiency of Big Data architecture in organizations. So let’s start with a key question. Is it possible to increase the efficiency of architecture while lowering its costs?

At first glance, it seems contradictory. It’s hard to increase efficiency and lower costs at the same time. However, it’s often possible because corporate architectures aren’t in their optimal state.

If we’re thinking about optimizing something, it implies that this area is currently not in the best condition. Michał, please tell us why data architectures aren’t in an optimal state today.

There can be many reasons. But most often, it boils down to the fact that everything is constantly changing. These architectures were designed for use cases that differ from the ones they serve today.

The business world is changing very dynamically. Different companies are being merged. And often, we try to adapt existing tools to use cases for which they weren’t originally designed. Very often, the skills of a given team define which tool portfolio will be used.

Michał, is it possible to design a good architecture from the start in such a way that it doesn’t need optimization later?

In the long term, it seems that it isn’t. You can’t create one architecture and believe it won’t undergo changes. As I mentioned, needs and available tools change. The only constant is change.

Does that mean it’s impossible to design a good long-term architecture?

I didn’t want it to sound pessimistic or fatalistic. Of course, you can design architecture for many years. However, it should be designed at a more general level and include a process for its change. That means we design knowing it will change and under what conditions. It’s important not to rely on specific vendors and to design flows and assumptions from a big-picture perspective. Here, the role of a data architect is critical, and they should be involved from the very beginning. It’s also very important to properly document these changes and decisions in an ADR (Architecture Decision Record).

Michał, maybe you could explain to our listeners what an ADR is.

ADR stands for Architecture Decision Record. It’s a record of all significant architectural decisions. It should include a description of the problem, analyzed options, the final decision, and the justification for that decision. It helps document the process, streamline communication, and prevent repeating the same discussions within the team.
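
To make this concrete, a minimal ADR entry might look something like the sketch below. The structure follows the elements Michał lists (problem, options, decision, justification); the content itself is purely illustrative.

ADR-007: Storage design for the sales reporting model
Status: Accepted
Context: The refresh of the single shared model exceeds 30 minutes and regularly hits capacity limits.
Options considered: (1) scale up the existing capacity tier; (2) split the shared model into dedicated per-report models; (3) migrate to a different engine.
Decision: Split into dedicated models, one per reporting domain.
Justification: Dedicated models allow backend pre-aggregation and separate refresh schedules; the trade-off is more models to maintain.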

It also facilitates onboarding new hires. If business conditions change, for example, a new tool appears or budget constraints arise, it’s easier to revisit these decisions based on the previous analysis. Here, it’s also worth mentioning the risks associated with vendor lock-in.

Interesting observation. Why do you think vendor lock-in is dangerous?

As one of the directors always said, you shouldn’t enter into any solution without a strategy for exiting it. Vendor lock-in limits the organization’s freedom. The company can’t make the best decision for itself because it’s either tied to a long-term contract with a vendor, or it can’t easily switch to other solutions, environments, or clouds.

Returning to the main topic. You mentioned that in most cases, companies are in situations where they can increase the efficiency of architecture while lowering costs. How, then, should one start such a change, in your opinion?

I’d reframe the question a bit because it assumes the organization knows its architecture is suboptimal. It’s better to ask how to recognize that we are dealing with a suboptimal architecture.

I agree. And what then? How do you begin such a change once you know it's needed?

The simplest symptom, though already indicating a suboptimal design, is a long report refresh time, for example, 30 minutes or even longer. A very common temptation here, especially with cloud solutions, is to scale up the cloud resources. By pushing a metaphorical button, we add computing power. It’s a partial solution, I would say a temporary one, and it comes with many risks. Very quickly, you encounter one of two limitations. The first is a system limitation, and the second is budgetary. In the first case, we can’t do much except redesign the entire model.

As for the second, we might have some options, such as renegotiating terms. But even if we manage to secure additional budget resources, we are mostly just buying ourselves time. And it’s often worth using that time for a rebuild, which brings us back to point one; it’s almost a systemic problem. That’s not to say you can’t use the maximum parameters offered by the vendor, since they were designed for such use. However, if we choose to go that route, we should do so consciously. You can also easily recognize suboptimal solutions by the lack of documentation for the architecture or the report in question. It shouldn’t look like we just dump data into one large repository and then connect a visualization or report to it, hoping it will all calculate itself.

Alright, we’ve managed to identify the situation. Now what? How do we change it?

The most important thing is not to do it in isolation, where the technical team tries to solve the issue on its own. It’s essential to involve the users, i.e., the business, and gain their understanding of the changes being made.

That sounds a bit theoretical. Could you give a real-life example or illustrate a specific scenario of how this looks in practice?

Sure. Let’s imagine a data model where we have one large fact table and several dimensions. Let’s say these facts are individual orders, invoices, or receipts. When the entities—whether companies or warehouses—from which the records are gathered are few, no solution seems problematic. Good design becomes crucial when the number of records grows, and we’re no longer talking about thousands or millions of records, but billions or more. In business, it’s essential to see value quickly. So, for a POC (Proof of Concept), we might go for a simple solution, especially to confirm that this type of solution will generate value.

However, even at that stage, it’s worth starting to work on a scalable solution—one that will stand the test of time. In a suboptimal solution, it’s very common to plug the fact table or a single dataset into an engine like Analysis Services, and then build Power BI reports on top of that 1-to-1 cube. Initially, these reports are simple, but they become increasingly complex. New fact tables are added, like promotions or deliveries, and the relationships between these tables are stitched together at the lowest level, such as invoice lines or material lines. The refresh of the model and the calculations take longer and longer. Some reports can take up to 40 minutes to refresh. Resources are exceeded during the refresh process, and not all results are returned, which leads to errors.

To counter this, parameters are increased. While at first this brings results, and moving from the lowest tier to the highest is justifiable, we eventually reach budgetary or system limits without making any real changes. At the same time, user dissatisfaction grows because reports take too long to generate. Moreover, trying to fit all these elements into one model often leads to reducing the time horizon, which isn’t what users expect. Initially, we might start with data for two years, but then this gets reduced to shorter periods. All this is done in the name of fitting data into one model. Meanwhile, the business expects a magic wand to solve all the issues.

At this point, it’s crucial to return to basics, meaning mapping the core needs of the users. Doing this exercise can be quite enlightening. Unfortunately, it can also lead to encountering many ill-considered responses. That’s why it’s so important to skillfully gather these requirements and prepare for it properly.

What do you mean by preparation? Is it not as simple as calling a workshop and just seeing how it goes?

Unfortunately, it’s not like planning a spur-of-the-moment gathering in Soplicowo or any other Mickiewicz-inspired work, where the gentry hop on their horses and everything works out. Each such workshop should be preceded by a short communication, a briefing. In today's age of virtual meetings, this is usually a teleconference where we confirm the purpose of the meeting and what expectations we have. The team responsible for maintaining reports should also gather the available data beforehand.

Understood. That’s a good approach. What kind of information should the engineers have at their disposal?

Basic information, such as a list of all reports and usage statistics from the last three or six months. If such a list doesn’t exist, it’s essential to conduct an inventory as soon as possible. For reports with high usage, performance analysis should be conducted, especially for those taking 40 minutes to run, to understand which parts of the report are problematic.

It’s surprising that an inventory is needed at this stage. Shouldn’t such a list already exist in organizations?

Unfortunately, in many organizations, not all created reports are inventoried. This is especially the case when business units are allowed to edit and create additional versions or reports, leading to the creation of over 200 or 300 reports in organizations, with many of them being duplicates.

Much of this duplication happens because tasks are repeated, and instead of applying the appropriate filters, people create copies of these reports. They make one, two, three, even ten copies, which complicates their use and later updates.

Some reports are created based on others, where assumptions are changed, but if those changes aren’t documented, you end up with reports presenting different values for the same variable. This is often due to different assumptions or varying data refresh statuses. For example, one data source may have already been refreshed, while another hasn’t been, because it depended on other elements and was more enriched. It’s also important to add information about usage to understand which reports are "dead" and which are still in use, and how often they’re used. Often it turns out that only 10% to 30% of the 200-300 reports are actually being used. This is a common issue in various organizations.
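
As a sketch of what the usage part of such an inventory can look like in practice: if the BI platform’s activity or audit log can be queried (or exported to a table), a query along the lines below surfaces dead and rarely used reports. The schema, table, and column names here are illustrative assumptions, not any specific product’s log format, and date arithmetic syntax varies by SQL engine.

-- Hypothetical usage summary over an exported BI activity/audit log.
SELECT
    report_name,
    COUNT(*)                AS views_last_6_months,
    COUNT(DISTINCT user_id) AS distinct_users,
    MAX(view_timestamp)     AS last_viewed
FROM bi_audit.report_views
WHERE view_timestamp >= CURRENT_DATE - INTERVAL '6' MONTH
GROUP BY report_name
ORDER BY views_last_6_months;  -- least-used ("dead") reports first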

We’ve prepared for the meeting, gathered all the data. What should our next step be?

You need to assess the needs. What basic questions should the report answer? How often should it be refreshed? Is it enough to have a visualization that aggregates the data and shows trends, or are options like exporting data necessary? This needs assessment is crucial because it allows us to select the right tools for the use cases.

You should not be satisfied with answers like, “we’ve always done it this way,” as this can lead to inefficiencies, such as using overly powerful tools for simple tasks or using unsuitable tools for the job. A trivial comparison might be trying to unscrew or tighten a star-shaped screw using a kitchen knife—it takes much longer than using the appropriate Phillips screwdriver. Unfortunately, in business, we sometimes encounter similar situations where we use tools that are not suited to the task. From experience, many reports in certain organizations were used solely for exporting data. And in such cases, a tool like Power BI, which is meant for data visualization, was actually being used for exporting detailed data.

Why is data export from Power BI so common, and why is it possible?

In general, Power BI should allow you to look at data and observe trends—it’s a tool for data visualization. However, if we imagine a table with calculations where we see trends and on the slide we have about 20 values, it’s much easier to export such a table and use it than to manually copy all the data from the screen. I assume that was the primary reason for allowing exports in Power BI. But as we know, necessity is the mother of invention. If we have a model hooked to raw data, what’s to stop us from exporting such data using Power BI, through a table where we prepare a dataset we’d like to export?

Initially, there was no thought given as to whether this would be a one-time thing or something used repeatedly. It was probably information needed urgently, and since it was possible to do it, it was done. Later on, additional use cases emerged, especially since the data was available in one model.

So, am I right in understanding that you’re not a fan of the so-called “one model” approach?

Yes, I’m not a supporter of solutions that are not well-suited to the need they’re supposed to address. Usually, these solutions are both less effective and more expensive. The planning phase in such a scenario is cheap and quick, but usage is inefficient or costly. A well-designed architecture is more expensive during the planning stage, but it ensures more efficient use of the report or reports and data models, and it requires fewer model rebuilds. Most often, during design, we assume the addition of new modules in a way that has been pre-planned. However, if some new data cannot be easily integrated, the best solution is to create a new model. Building dedicated data models for specific use cases allows for offloading part of the calculations to the backend, enables differentiation in data refresh frequencies, and results in faster report generation.

What kind of speed improvements are we talking about for generating report results?

In a general model, a complex report can take up to 40 minutes to run—in a model where we have all the data in one bucket. After optimization, the same report can return results in just a few seconds. It’s important to note that this report has the same metrics as before.

Oh, that’s impressive! What caused such a dramatic speed increase?

First, we used a dedicated model for the report, so we were using the right tool for the job—like using the correct screwdriver instead of a knife. Second, we moved the calculations to the backend, leaving simple aggregations on the front end. Third, we limited the time horizon to three months.
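
To illustrate what moving the calculations to the backend can mean in SQL terms, a pre-aggregated view like the sketch below lets the report model load daily totals instead of raw invoice lines and already applies the three-month cut-off. All object names are invented for the example, and date arithmetic syntax varies by engine.

-- Sketch of a backend aggregation feeding a dedicated report model.
CREATE VIEW reporting.v_sales_daily AS
SELECT
    f.sale_date,
    f.store_id,
    f.material_group,
    SUM(f.quantity)   AS total_quantity,
    SUM(f.net_amount) AS total_net_amount
FROM dwh.fact_invoice_lines f
WHERE f.sale_date >= CURRENT_DATE - INTERVAL '3' MONTH  -- last three months only
GROUP BY f.sale_date, f.store_id, f.material_group;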

The previous report contained data for over a year. We held discussions with the business to understand which events were most frequently analyzed, and we got the response that it was the last three months. There was always the temptation to include more than three months, sometimes up to a year. We made the decision that for exceptional cases, like promotions or one-off sales events beyond three months, we didn’t need to regularly refresh the entire data model.

We cut the model down to the last three months for the most common use cases. However, for specific cases, since we already had the prepared tables on the backend, we could easily create an ad-hoc report for a dedicated period beyond that three-month window. And as an interesting fact, over the course of a year, requests for data beyond three months occurred only three times. Fourth, as part of this speed improvement, we also increased the data model refresh frequency.

That last point seems surprising.

Yes, after separating out the individual tables, we were able to refresh different models at different frequencies to address the needs of specific reports. This way, we didn’t need to unify refresh frequencies into a single standard.

You mentioned gathering requirements. Let’s return to the meeting with the business. What comes next? What should be discussed?

You need to point out the inevitability of change and place it on a timeline. As Peter Drucker once said, a task without a plan—that is, without assigning who, what, and when—is just a good intention. We must set deadlines for identifying the scope of what should move to the new structure and when the migration can begin. It’s crucial to minimize or eliminate the creation of new reports during this period. You also need to define the migration scenarios.

There are basically three main scenarios, though in reality, only one is typically chosen, but it has a broad spectrum of sub-scenarios. Scenario A, also known as the "atomic option," means saying, “Stop the machine; we’re building new.” All efforts are focused on building the new model, and the old one is shut down. While this option is tempting for IT, it’s totally unrealistic from a business perspective and will never be chosen. Scenario B, which I call "the lavish option," involves building a new system alongside the old one. When the new architecture is complete, we shut down the old one.

The alternative title for this scenario is “The end is never in sight.” The end is hard to define because we’re developing two products simultaneously. The temptation to keep adding new developments to the old environment while also working on the new one is too great. This results in maintaining two systems that deliver different results because of varying assumptions in the models, and it’s a very expensive approach. It’s difficult to pinpoint when it will actually end. Finally, we have Scenario C, which is the most commonly chosen one and, in fact, the only truly viable option. In this scenario, we identify critical reports, determine how many of them are crucial, and after migrating those, we shut down the old system and move everything to the new one.

The key factor is how many critical reports we choose. Choosing the right cut-off point is crucial. If the number of reports, let’s call it N, approaches the maximum—all reports—then we essentially have Scenario B, the lavish option. If N approaches zero—almost nothing—we are closer to the atomic option. It’s difficult to say whether 10% or 20% of the reports should be considered critical. This is where we need to meet with the business, understand what is critical, and identify the basic questions that need answering. Generally, the more reports you select, the longer the migration will take.

What other benefits does Scenario C offer?

First of all, it allows us to slim down the architecture by eliminating unused reports or those that don’t provide much business value. This way, we don’t waste resources rewriting unused reports, as we would in Scenario B, the lavish option, and it also gives us the chance to replace inefficient solutions with better ones.

For example, if some reports were used solely for exporting data, we might not need to recreate those reports. Instead, we could propose a different solution, like creating a SQL view to generate queries and directly retrieve the necessary data from the system.
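
A minimal sketch of such a view, assuming a simple orders schema (all object names are invented for the example):

-- Hypothetical view replacing a report that existed only to export detailed data.
CREATE VIEW reporting.v_order_export AS
SELECT
    o.order_id,
    o.order_date,
    c.customer_name,
    w.warehouse_code,
    o.net_amount
FROM dwh.orders o
JOIN dwh.customers  c ON c.customer_id  = o.customer_id
JOIN dwh.warehouses w ON w.warehouse_id = o.warehouse_id;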

Would you like to add anything else?

Yes, we’ve discussed architectural changes and how they impact OPEX, meaning the costs of running the architecture and the cloud, but these changes can also lead to improved team performance. Employees can focus on generating real value rather than waiting for data or combining partial solutions—taking data from one report, something else from another report, and hoping it all works out. They can either run a query to combine the data or have reports that answer their questions directly. And let’s not forget that sometimes, when Power BI limits data exports to 150,000 rows, necessity being the mother of invention, people split the data either geographically or by time periods and end up spending an entire day exporting data. Instead, they could have written a short SQL query and focused on analyzing the data rather than waiting for it. I believe this is a crucial element in raising technical skills within business and analytics teams.
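
For comparison, the short SQL query mentioned here could be as simple as the sketch below, run once against a backend view such as the hypothetical v_order_export above, instead of exporting the data in geographic or time slices from the report. Dates and names are illustrative.

-- One query instead of a day of sliced exports.
SELECT
    warehouse_code,
    order_date,
    SUM(net_amount) AS net_amount
FROM reporting.v_order_export
WHERE order_date BETWEEN DATE '2024-01-01' AND DATE '2024-03-31'
GROUP BY warehouse_code, order_date
ORDER BY warehouse_code, order_date;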

Thank you, Michał, for your valuable insights and in-depth analysis of the issues related to the efficiency of Big Data architecture. You’ve highlighted that optimizing data architecture is not a one-time effort but rather a continuous, dynamic adaptation to changing business and technological processes. Your practical examples and tips on working with the business are extremely valuable and show how complex the approach to data management is in modern organizations.

Thank you very much to everyone for listening to this podcast. If you have any questions or feedback, feel free to contact me on LinkedIn or through other platforms.

Thank you for an engaging discussion on the broad topic of data.



