Poor data quality? It's simple to solve...
You read it everywhere:
- Data is the new gold, the new oil; you name it. (By the way, I don't think comparing data to any raw material holds up, because data is not consumed but generated en masse these days.)
- Without useful data, companies are doomed to fail.
Good data quality is not an end in itself. We think it is the foundation for using the possibilities of the 21st century, such as advanced data-analytics tools, to gain new insights you won't get otherwise.
Having worked for over ten years with many organizations to analyze critical purchasing data, I have seen how poor data quality has a direct and real impact on business performance and cash flow.
Those who are smart about their data can understand their business better and act faster in every respect. Being smarter, better, and faster will separate the innovators from the rest, the followers.
On the other hand, here is what we continuously see and hear for almost all non-financial data:
Our data quality is so low.
And companies do a lot to try to raise their data quality:
- Invest in more tools
- Hire Data Managers
- Build up big data-lakes, seas, galaxies, etc.
- Set up AI, ML, RPA, ... projects
But still, we see and hear:
Our data quality is so low.
Something must be fundamentally wrong with these common approaches to improving data quality.
The cause of these problems, and how they can be solved, is clear. Some readers may find this insight a little provocative, as it hits at the heart of the problem. The interesting thing is that these problems are mostly generic and largely independent of the specific data or company.
Hence, I think the solution is universal too.
If you take a look around your organization, chances are high that you will recognize precisely the situation described here.
The root causes of low data quality
In my opinion, the root cause is a lack of understanding of one simple fact: the pieces of information that become data by being stored for further use don't fall from the sky; they show up at different places during routine business operations.
Companies have business processes, governance rules, functional structures, organizational structures, etc. But have you ever seen a data-/information-flow chart for your company?
A company is a data-generating process, a big data-flow network, and a data transformation engine. Yet business processes, organizational structures, and even IT systems don't really take this data-generating and data-flow aspect into account.
You can see it everywhere: you query the same information in two different IT systems and get two different answers; you have to use awkward workarounds to get your job done; Excel is used everywhere to fill the gaps.
And I don't mean that your IT department didn't do its job: even when IT systems are in place that could handle all of this, high data quality is still lacking.
I see three primary root causes.
1. Not my business
The person at whose desk a piece of information materializes (information that needs to be captured because it can't be derived from anything else later) gets no benefit from using that information.
Why should I capture data I don't need because I'm not using it!
Poor data quality results from either entering no data at all (the most likely case) or, if forced to capture the information somehow, entering something quick and dirty just to move on.
You see such effects and implications everywhere. Just try to get an answer to a pretty simple question like: show me all plastic parts that have a specific surface coating, sorted by quantity over the past 12 months.
Or dig into any purchasing commodity: some parts don't belong there, prices for parts are zero or negative, and if you take a look at the free-text entries, you will most likely be surprised by how many different ways it is possible to write down a specific steel grade.
It's a big mess.
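To see why even a simple question is hard to answer, imagine what the free-text material field typically looks like. Here is a tiny constructed example (the records and spellings are invented for illustration) of one and the same stainless steel grade written in several ways, which defeats a naive query:

```python
# Constructed example: the same steel grade written in many ways in free text.
# Records and spellings are invented for illustration.

parts = [
    {"part": "A-100", "material": "1.4301"},
    {"part": "A-101", "material": "X5CrNi18-10"},      # same grade, different designation
    {"part": "A-102", "material": "1.4301 / AISI 304"},
    {"part": "A-103", "material": "stainless stl 304"},
    {"part": "A-104", "material": "14301"},             # typo: missing dot
]

# A naive exact-match query misses most of them:
hits = [p for p in parts if p["material"] == "1.4301"]
print(len(hits), "of", len(parts))   # 1 of 5
```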
2. No Time, No Capacity = No Priority
The person at whose desk a piece of information materializes doesn't see data quality as part of the job or as something that adds value.
We have so much to do and capturing all the data is slowing us down.
Again, the poor data quality results from entering no data at all or entering something quick and dirty to move on. It's even worse here, because identifying "We don't have any data" is simple, whereas finding out "This data is not correct" can be very time-consuming.
Worse still is the case where the error is not recognized at the time and only shows up much later, if at all.
3. We run a big project that solves all data quality problems
The person at whose desk a piece of information materializes hopes that someone else will take over the responsibility and that they will no longer be part of the game.
Our data quality is so low that only my colleagues creating this silver-bullet solution will be able to solve our data problems.
Such a statement is just a nicer variant of "Not my business".
And let's face it: many people have no motivation to keep an eye on producing high-quality data and turning it into useful information. It's tedious, it's boring, and maybe the IT tools are nasty too. All of it is just a significant burden.
The more personal effort it takes (and it takes a lot until the data-flow network is running and optimized), the lower the motivation to take care of it. It's a vicious circle, and things get worse over time.
The company is building up a big data and information debt, and the interest rate on that debt is high, very high. Companies don't get a chance to stop the engine and clean up the data mess before moving on; they have to do it while keeping everything running, which makes things even more complicated.
Why do these root causes exist?
- It seems people don't understand what the data is used for down the road. People are not aware of the time-benefit ratio of capturing data vs. consuming data, which I would guess is on average at least 1:10: every piece of data captured is used ten times. Ignoring the data-capture task, or doing it sloppily to save 30-60 seconds, creates hassles for nine others down the road who want to use the data. If each of them has to spend 10 minutes to identify the problem, hunt down the correct information, and fix it to move on, the negative effect on the organization is roughly 90 to 180 times larger than the time the person in charge would have needed to capture the data correctly (see the back-of-the-envelope calculation below).
- It seems that many people are not aware that data has a lifetime and a life-cycle. Some data is short-lived (transactional data), while other data stays relevant for a very long time (product properties or features). Long-lived data is worth getting right the first time; it's the core information a company runs on.
- Management only talks about data quality improvement but neither cares about it nor focuses on it. This behavior translates to: data quality has no priority; it's a nuisance. No wonder the organization sits and waits, hoping that someone else will take care of it.
- The KPIs, personal goals, and objective measures in use are counterproductive. There is just no incentive for good data quality.
All this isn't a very sexy setup for producing good data quality.
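To make the time-benefit argument above concrete, here is a small back-of-the-envelope calculation. The numbers (30-60 seconds "saved", nine downstream consumers, 10 minutes of rework each) are the illustrative assumptions from the list above, not measured values:

```python
# Back-of-the-envelope: cost of skipping data capture vs. doing it correctly.
# All numbers are illustrative assumptions, not measurements.

seconds_saved_by_skipping = (30, 60)   # time "saved" by not capturing the data
downstream_consumers = 9               # people who need the data later
rework_minutes_per_consumer = 10       # time each spends finding and fixing it

rework_seconds = downstream_consumers * rework_minutes_per_consumer * 60  # 5400 s

for saved in seconds_saved_by_skipping:
    print(f"Saved {saved}s upfront -> caused {rework_seconds}s of rework "
          f"({rework_seconds / saved:.0f}x the 'saving')")

# Output:
# Saved 30s upfront -> caused 5400s of rework (180x the 'saving')
# Saved 60s upfront -> caused 5400s of rework (90x the 'saving')
```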
The path to good data quality
First, there is no silver bullet. Second, it's a lot of work. And third, it's a long way; I even think of it as a continuous improvement process.
This means that an idea like "Let's solve the data quality problem once and for all with a one-time project" won't work.
Understanding the Data Generation Process
Data and the information derived from it are not an ad-hoc business. There is a process, a flow of how, when, and where data materializes and is used. Having a clear picture of this network, which will most likely look very different from your organizational structure, is mandatory for making things better step by step.
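One lightweight way to start building that picture is to model the flow as a small directed graph: which process step hands which piece of data to which other step. A minimal sketch, where the process steps and data items are made up for illustration:

```python
# Minimal sketch of a data-flow map: process steps as nodes,
# "step A hands data item X to step B" as edges.
# Steps and data items below are illustrative, not a real company's flow.

data_flow = {
    ("request_quotation", "purchasing"): ["part_number", "target_price"],
    ("purchasing", "order_entry"):       ["supplier_id", "agreed_price"],
    ("order_entry", "goods_receipt"):    ["order_id", "quantity"],
    ("goods_receipt", "spend_analysis"): ["order_id", "quantity", "agreed_price"],
}

def consumers_of(data_item):
    """Which process steps depend on a given piece of data?"""
    return sorted({target for (_, target), items in data_flow.items()
                   if data_item in items})

print(consumers_of("agreed_price"))   # ['order_entry', 'spend_analysis']
```

Even a rough map like this makes visible who downstream suffers when a field is captured sloppily upstream.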
Putting it on everyone's agenda and incentive plan
I think the quality of data-generating and data-capturing tasks must be measured and become a personal goal and incentive. Besides showing actual use cases that need good data quality, and explaining why it is crucial for their success, all people who are part of a data-generating process must be personally committed.
It is pretty easy to derive personal incentives from data quality improvement actions. Aggregating these up the hierarchy would give managers an incentive as well. The more data quality problems are fixed, the higher the personal incentive will be.
Those who trigger and push a heuristic implementation could benefit in that all automatically corrected data is logged towards their personal incentive. You could even drive such an incentive system with a blockchain and a smart contract, where everyone can see the incentives accumulated so far. When you make it public, everyone can see that personal engagement pays off.
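Stripped of the blockchain idea, the core mechanism is simple bookkeeping: log every data-quality fix (manual or from an automatic correction rule) against the person who made it happen, and aggregate the counts up the hierarchy. A minimal sketch, with made-up names, teams, and a simple one-point-per-fix rule:

```python
# Minimal sketch of incentive bookkeeping for data-quality fixes.
# Names, teams, and the one-point-per-fix rule are illustrative assumptions.
from collections import Counter

fix_log = [
    {"fixed_by": "alice", "team": "purchasing",  "rule": "negative_price"},
    {"fixed_by": "alice", "team": "purchasing",  "rule": "missing_supplier_id"},
    {"fixed_by": "bob",   "team": "order_entry", "rule": "duplicate_order"},
]

person_scores = Counter(entry["fixed_by"] for entry in fix_log)  # per person
team_scores   = Counter(entry["team"] for entry in fix_log)      # rolled up for managers

print(person_scores)  # Counter({'alice': 2, 'bob': 1})
print(team_scores)    # Counter({'purchasing': 2, 'order_entry': 1})
```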
Being extremely consistent and never surrendering
Keeping data quality high is a never-ending marathon. You are never done; it's never over. This means everyone involved needs to be reminded of why it makes sense, and those who are not involved need to understand that data quality takes time and resources, which might make things more complicated for them right now. Shortcuts pay off only once; the negative effect comes later.
The effort of keeping data quality high might only pay off much later. People in charge will have to justify the effort, because they might not have much to show when asked: "OK, we have been doing this for six months now, what did we save, what did we gain?" Hence, it is always good to have a set of counter-examples of the form: "The data X was consumed Y times in the last Z months. If we stopped now and let 10% inaccurate data in, this would mean..."
Build up a Data Quality Guarantee Team
The job of such a team is to drive the continuous improvement process: not necessarily doing the work itself (because then everyone else will lean back), but pushing the focus onto the next thing to improve and monitoring that it gets done.
This group sees every data problem as a fault of the data-generation process, as a fault in a delivered product.
Implement real-time automatic data checks wherever possible
Every problem the Data Quality Guarantee Team identifies and focuses on should be solved as soon as possible and then guarded by an automatic heuristic check. It's like a unit test in software development, where you run all the tests after each change to see if anything broke.
These checks should run as close to the data source as possible. Close means two things: location-wise, the user gets immediate feedback on the screen that the input might not be correct; and time-wise, problematic data is reported quickly, not six months later.
Over time, a massive set of checks will accumulate, making sure that data stays within defined ranges and raising an alarm when things move in a direction that violates the current checks.
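To make the unit-test analogy concrete, here is a minimal sketch of what such heuristic checks could look like. The field names, rules, and thresholds are assumptions for illustration; in practice, each check would come straight out of a problem the team has already fixed once:

```python
# Minimal sketch of heuristic data-quality checks, run like unit tests
# on every new or changed record. Field names and rules are illustrative.

def check_price_positive(record):
    return record.get("price", 0) > 0, "price must be greater than zero"

def check_supplier_present(record):
    return bool(record.get("supplier_id")), "supplier_id is missing"

def check_quantity_range(record):
    return 0 < record.get("quantity", 0) <= 1_000_000, "quantity outside plausible range"

CHECKS = [check_price_positive, check_supplier_present, check_quantity_range]

def validate(record):
    """Return a list of problems; an empty list means the record passes all checks."""
    problems = []
    for check in CHECKS:
        ok, message = check(record)
        if not ok:
            problems.append(message)
    return problems

# Immediate feedback at the point of entry:
new_record = {"part_number": "4711", "price": 0.0, "quantity": 250}
print(validate(new_record))
# ['price must be greater than zero', 'supplier_id is missing']
```

Each new problem that is found and fixed adds another small check to the list, so the set grows continuously, just like a regression test suite.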
Summary
Achieving good data quality is not rocket science; it's just a lot of work. It's not one big thing you need to do, but millions of little things, done continuously.
On the other hand, if you start caring about your data quality problems in detail and solve them step-by-step, you will see tremendous data quality improvement in a short time.
Keeping it under control with a continually growing set of automatic checks, added everywhere you have identified a problem, puts you in a position to quickly spot a changing environment and adapt to it to keep data quality high.
Organizations will be genuinely able to fully grasp their data's hidden opportunities when they understand that caring about data is everybody's job every day, everywhere.
Even with little progress towards better data quality, tools like our NLPP (Non-Linear Performance Pricing) price analysis solution allow our clients to save millions.
There are many more potential savings to be revealed. You just need to take the first step.
Today you see costs. Tomorrow you see value.