TOP 3 challenges that make organizations fail with data utilization - and how to fix them
Having worked for over 15 years with data analytics, and having had multiple sparring discussions with peers in similar roles, I have come to the conclusion that there are 3 challenges that repeat in every organization when it comes to failing with data utilization.
For some reason, no significant innovation or development has happened in this field at a bigger scale so far, even though these challenges seem to occur on a regular basis. The problem is recognized by data analytics professionals, but the discussion remains within that "bubble". The main reason, in my view, is that the importance of properly developing organization cultures and introducing more roles that take ownership of the data, rather than seeing it as an "IT thing", is still lacking at C-level. Luckily, some forward-running organizations are starting to have roles like Chief Data & Analytics Officer (next to the CIO, CDO and CFO, not under them), data literacy champion, data product owner etc. (it has been my hobby recently to spot these new roles in recruitment ads), so there is some progress happening. However, I see a high urgency for fast action to be able to truly drive value from data utilization in the digital world.
1 - Bad quality data - fighting the fires instead of fixing the fundamentals
Data quality issues are usually the most urgent ones to fix and the most talked about. But instead of looking at data quality holistically, the issues are fixed case by case. (If the land is totally dry, you cannot expect blossoming data to grow from it.)
There are reports or analyses that use certain data sets, with the expectation that they fit certain business definitions. The data streaming into the data set can have wrong or missing values, and both of those can mess up the results. Suddenly the sales figures drop or increase dramatically. It could be that there are duplicate records for some reason, or records missing.
The quality of the data is set in the business process – where the data is created. To build the data set that fuels the report or analysis, logic is implemented that takes into account, for example, specific types of records only. There can be logic to combine certain product codes, or to calculate sales for specific products only. This logic will break as soon as a new product code is created or the user (or system) enters a false code. So if the business process does not take care of the needed quality (adding new codes without ensuring they are also taken into account everywhere the data is used, or allowing the user to enter false codes), there will be data quality issues. One of the most common core problems behind data quality issues is a change made in the source system without taking into account the impact on other users of that data. Another is that the user simply does not know the purpose of a certain value and uses the same field in the user interface to input data for something else, resulting in wrong data being fed forward.
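To make this concrete, below is a minimal sketch in Python (pandas), with invented column names and product codes, of the kind of hard-coded filtering logic described above, plus a guard that surfaces an unknown code at load time instead of letting the sales figures quietly change:

```python
import pandas as pd

# Hypothetical reporting filter: only these product codes are counted.
KNOWN_PRODUCT_CODES = {"A100", "A200", "B300"}

def build_sales_dataset(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep only records with known product codes (the reporting logic).

    Without the guard below, a new code (say "C400") streaming in from the
    source system would be silently dropped and the sales figures would
    quietly change in the report.
    """
    unknown = set(raw["product_code"].dropna().unique()) - KNOWN_PRODUCT_CODES
    if unknown:
        # Fail loudly at load time so the new code gets handled on purpose,
        # not discovered weeks later by a puzzled report user.
        raise ValueError(f"Unknown product codes in source data: {sorted(unknown)}")
    return raw[raw["product_code"].isin(KNOWN_PRODUCT_CODES)]
```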
In practice, what usually happens here is that the (key) user discovers something strange in the report. They contact the data or IT support team, who then start to investigate. Eventually they find out there is actually a new type of data in the latest records delivered from the source, or a false value in a field. Then they start figuring out why such data exists and what should be done with it. This process requires time from several people: IT system experts, business process experts, the data team, analysts… And by the time they finally fix the issue, test it and re-run the data loads, a lot of valuable time has been wasted. Usually the ones to blame are also the data or IT support team, the ones that had no impact on the creation of the issue but who are seen as "the sole providers of data". Some might feel like heroes coming to the rescue, but most feel just frustrated, as they understand the magnitude of the challenge.
Common reasons for false values to be entered by users:
Simple ways to mitigate data quality issues
In reality, you will never be able to reach 100% data quality, as the definition of quality data differs between use cases. Keep your eye on the ball: the core purpose, after all, is not really to provide high quality data; the core purpose is to ensure that the business processes create the kind of data that enables intelligent, efficient decision making and automation without bottlenecks. There are bound to be issues as use cases evolve over time, but when properly controlled, the quality can always be kept at a sufficient level and issues detected early enough. The more automated data utilization there is, the more critical data quality management at the front line becomes!
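As one example of detecting issues early, here is a small sketch (Python/pandas, with hypothetical column names and example data) of basic checks that could run on every data delivery before it reaches reports or automation:

```python
import pandas as pd

def basic_quality_checks(df: pd.DataFrame, key_cols: list[str]) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    if df.empty:
        findings.append("Delivery contains no records.")
    duplicates = int(df.duplicated(subset=key_cols).sum())
    if duplicates:
        findings.append(f"{duplicates} duplicate records on key {key_cols}.")
    for col in key_cols:
        missing = int(df[col].isna().sum())
        if missing:
            findings.append(f"{missing} missing values in column '{col}'.")
    return findings

# Example with invented order data: one duplicate and one missing key value.
orders = pd.DataFrame({"order_id": [1, 2, 2, None], "amount": [10, 20, 20, 5]})
for finding in basic_quality_checks(orders, key_cols=["order_id"]):
    print("DATA QUALITY:", finding)
```

Checks like these do not fix the root cause in the business process, but they move the discovery of an issue from the report user back to the data delivery, where it is much cheaper to handle.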
2 - Lacking access to useful data
This can have several causes: there might be restricted access to certain data without a proper reason for the restriction (maybe some historical or organizational limitations that are no longer justifiable), there might be actual technical limitations to getting the data from the source for further use outside the system, or simply no one has provided access to the data in a form that is useful for the end user.
When data is provided to users from the source system (where it is created), there is business logic embedded into it in the system's user interface. This makes the data usable. However, when extracted from the system, most of the business logic is left behind and only "raw data" is extractable. So when exporting the data to another data platform (such as a data warehouse or data lake) or a reporting system (such as PowerBI), only the "raw data" is accessible. That is, unless it was required, when choosing and developing the IT system, that the data can also be extracted from the system in a useful form. This is usually not the case, and the more legacy the organization has, the more complex and limited it is to extract data out of the systems in a way that fits today's data analytics requirements.
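To illustrate what "raw data" without the system's business logic looks like in practice, here is a simplified Python/pandas sketch with invented codes; the readable meaning that the source system's user interface would show has to be rebuilt on the data platform:

```python
import pandas as pd

# What typically comes out of the source system: technical codes only.
raw_extract = pd.DataFrame({
    "cust_id": [101, 102, 103],
    "status_cd": ["01", "02", "09"],
    "amt": [1500, 250, 0],
})

# The labels shown by the source system's user interface have to be
# re-implemented on the data platform to make the extract useful.
STATUS_LABELS = {"01": "Active", "02": "On hold", "09": "Closed"}

curated = raw_extract.assign(
    customer_status=raw_extract["status_cd"].map(STATUS_LABELS),
    is_billable=raw_extract["amt"] > 0,
)
print(curated)
```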
So even when you manage to open the gates to the "raw data", it most probably will not provide useful information as-is. Additional business logic needs to be added to make it useful. This means the same business logic is now likely to be managed in several places: in the original source system and in the data sets extracted from that system. As the business logic is likely to change from time to time, the changes also need to be managed in all of these places to keep the package together.
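One way to keep that duplicated logic from drifting apart, sketched here under the assumption that the transformations are written in Python, is to define the shared business rules once in a single module that both the data pipeline and the reporting layer import:

```python
# business_rules.py - hypothetical shared module imported by both the data
# pipeline and the reporting layer, so a rule change is made in one place only.

ACTIVE_STATUSES = {"01", "02"}   # status codes counted as an active customer
VAT_RATE = 0.24                  # assumed VAT rate used when deriving net sales

def is_active(status_code: str) -> bool:
    """Single definition of an 'active customer', reused everywhere."""
    return status_code in ACTIVE_STATUSES

def net_sales(gross_amount: float) -> float:
    """Derive net sales from a gross amount using the shared VAT rate."""
    return gross_amount / (1 + VAT_RATE)
```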
For users, lacking access to useful data can also mean that they simply do not know what the data is. It might be the right fit for their need, but due to a lack of documentation and information about the data itself (be it a report, the result of an analysis, a data set etc.), they cannot really trust it enough to use it.
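A lightweight way to close that documentation gap is to publish a short, structured description next to every shared data set. The fields below are purely illustrative assumptions, not a standard:

```python
# Hypothetical metadata record published next to the data set itself, so a
# potential user can judge whether it fits their need and can be trusted.
dataset_description = {
    "name": "monthly_net_sales",
    "owner": "Sales operations (data product owner)",
    "business_definition": "Net sales per product group, invoiced orders only",
    "source_systems": ["ERP"],
    "refresh_schedule": "daily at 06:00",
    "known_limitations": ["Returns are booked with up to three days of delay"],
    "last_quality_check": "passed",
}
```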
Simple ways to ensure the data can be used
3 - Time and money wasted in changing BI tools and technology platforms to try to solve the above two
Having one's favourite BI tool and putting a lot of effort into selling it to others as "the tool to use" is surprisingly common. Even people who tend to call themselves "non-technical" like to talk about the technical tools rather than the desired business outcomes: "We need to buy Qliksense, because these PowerBI reports are useless. I cannot find anything in here, and the report I have been using is not to be trusted."
When the two issues mentioned above persist long enough, the organization naturally tries to solve them, usually by choosing a new tool. The focus of thinking and action shifts from complaining about bad quality and/or missing access to useful data to how the data is served to the users. This is of course understandable, because for most users that is the interface through which they get their hands on the data, and where they see it.
Sometimes switching the tool might actually provide a short-term solution to the problem. Not because the new tool is better, but because when there is approval to take the time to implement a new tool, there is also approval to use time to define what the information should look like and to test the data quality. There is most probably also approval to use time to train the users and help them get on board with using the data. For a short while it might seem that the problem was solved, but if no proper effort was put into actually changing the process that creates the data and/or improving the users' ability to understand how the information should be used, the situation will sooner or later return to the starting point.
The downside of using the "implement a new tool" approach to fix the data quality and usability issues, even if it provides the critical buy-in from C-level, is that the vendors' promise is usually that "it takes only a few hours to set up the new tool". For the technical platform that is surely true (hopefully ;-)), but the secret lies in the data streaming from the source systems, created based on how the business processes function, and in the users' ability to understand what this data tells them and how they should interpret it in their decision making.
The reality is that the data an organization has fully reflects the existing business processes, how they are operated and how they function, and how well the existing IT systems support them. When the first two points are sustainably solved, there is no need to focus on changing the technical tools to fix the core problems. It is of course natural that easy-to-use, intuitive tools are needed, but it is just as critical to understand that they only show and give access to the data that is available; they do not fix the quality issues or make the raw data more understandable. Yet a huge amount of time is still spent in organizations debating which tool is the best one to use ;-)
It is finally time to stop outsourcing data management solely to data experts and to build a model of distributed accountability and control for everyone who impacts the generation of data in the organization. And if your organization feels more comfortable starting with a tool, I warmly recommend starting with a data catalog! Some of these catalog tools can help you make the existing, distributed data sets visible quickly, including how they are used, their connection to business terminology, and their characteristics (such as missing fields, duplicates etc.). This is a much better angle for getting your data under better control and into better use than changing your BI tool.
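To give a flavour of what such catalog tools surface automatically, here is a minimal profiling sketch (Python/pandas, with invented example data) that reports completeness and uniqueness per column, the kind of characteristics mentioned above:

```python
import pandas as pd

def profile_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column with completeness and uniqueness, similar in
    spirit to what data catalog tools surface automatically."""
    return pd.DataFrame({
        "column": df.columns,
        "dtype": [str(t) for t in df.dtypes],
        "missing_pct": [round(df[c].isna().mean() * 100, 1) for c in df.columns],
        "distinct_values": [df[c].nunique(dropna=True) for c in df.columns],
    })

# Invented example: a duplicate customer_id and missing email addresses.
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@example.com", None, "b@example.com", None],
})
print(profile_dataset(customers))
```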