What Data Quality Metrics Should Your Organisation Measure?
Image created with DALL·E 2



Introduction

Peter Drucker once said: “If you can't measure it, you can't improve it”.

Amidst the explosion of all things data across the world, we have become obsessed with collecting each and every byte that we can get our hands on. One of my favourite stats to illustrate this was reported by Bernard Marr, who stated that every two days we now create as much data as we did from the beginning of time up to 2003. Yes, every two days!

Amidst this data-hoarding obsession, enterprise organisations are drooling over key performance indicators, objectives and key results, market data feeds, customer sentiment and dynamic dashboards like children standing in front of a sweetshop.

However, often this rush to get hold of mountains of data comes at the cost of the quality of that data. Perhaps what is more important in today’s world is to measure what matters to your business.

I hold that Drucker’s statement has stood the test of time and it’s imperative, now more than ever, to ensure that your organisation is focusing its decisions on high-quality, trusted data sources as part of its wider business strategy.

In short, data quality is absolutely essential to ensure your business is making the right decisions, at the right time.

But how can your organisation ensure that the data you are using to inform your decision-making processes is of a high calibre?

For anyone who is familiar with the DevOps movement, you'll no doubt have already read either The Phoenix Project or Accelerate. Both books discuss the importance of benchmarking an organisation's ability to deliver software change before implementing strategies to accelerate time-to-market.

At a high level, Accelerate specifically refers to four key metrics that organisations must measure in order to track their progress towards becoming a high-performing software organisation. These are:

  • Lead Time

  • Deployment Frequency

  • Change Failure Rate

  • Time to Restore Service

It is equally important to consider measuring similar metrics when it comes to measuring data quality.

This is needed, firstly, to quantify the quality of your data and, secondly, to measure how quickly your organisation can turn data into a game-changing asset.

Naturally, if we can address the quality of data then, generally speaking, it should be easier to index, categorise and search that data, reducing data discovery friction for those people who need access to it most in your organisation.

But what are these metrics, and how do you measure them? Let's start by exploring the data quality metrics that enable organisations to establish a data quality baseline.


Build a Hygiene-First Approach to Increase Data Quality

Whichever pathway your organisation adopts for improving the quality of its data, it must have a way of measuring the effectiveness of its data quality efforts. Otherwise, you'll be investing time and money in a data quality strategy that may or may not return business value. The following are examples of key metrics that we advocate customers track in order to measure the quality of their data.

1. The Ratio of Data to Errors

In short, how error-free is your overall data landscape?

Tracking this metric allows organisations to quantify the number of known errors in their data sets, covering areas like missing, incomplete, or incomprehensible entries, and to express that count as a ratio against the size of the overall data set.

Overall, if you find fewer errors whilst the volume of your data increases, typically it is safe to assume that your data quality is improving.
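As a minimal sketch (assuming your data set can be loaded into a pandas DataFrame, and that the validation rules below stand in for whatever error checks your organisation actually defines), the ratio could be tracked like this:

```python
import pandas as pd

# Hypothetical sample data; in practice this would be loaded from your own source.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, None],
    "postcode": ["SW1A 1AA", "", "EC1A 1BB", None, "N1 9GU"],
})

# Illustrative error rules: a record is in error if its identifier is missing
# or its postcode is null/blank. Substitute your own validation rules here.
has_error = df["customer_id"].isna() | df["postcode"].isna() | df["postcode"].eq("")

error_ratio = has_error.sum() / len(df)
print(f"Records in error: {has_error.sum()} of {len(df)} ({error_ratio:.1%})")
```

Tracked over time, a falling ratio as data volumes grow is the signal you are looking for.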

2. Number of Empty Values

Nobody likes an empty field or value, as this indicates that information is missing or was not recorded correctly! This is typically an easy metric to track: the total number of empty fields is compared against the total number of fields in the data set. By applying this measurement, your organisation should be able to identify whether or not there is a data quality issue.

Once again, you can track the total number of empty values versus the total data volume and see whether this ratio increases or decreases over time. Indeed, it is important to point out that whilst your data may well be complete, it might not be of a high quality. As such, it is key to ensure your organisation deploys data stewards across the business to maintain data integrity and ensure that reference data is maintained and relevant for the business process/application/product it is feeding into.
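A minimal sketch of that calculation, assuming a tabular data set in pandas and treating both nulls and empty strings as "empty":

```python
import pandas as pd

# Hypothetical sample data set.
df = pd.DataFrame({
    "name":  ["Alice", "Bob", None, "Dana"],
    "email": ["a@example.com", "", None, "d@example.com"],
})

# A cell counts as empty if it is null or an empty string.
empty_cells = (df.isna() | df.eq("")).sum().sum()
total_cells = df.size  # rows x columns

print(f"{empty_cells} of {total_cells} fields are empty ({empty_cells / total_cells:.1%})")
```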

3. Data Transformation Error Rates

Transforming data can be a painful and arduous process.

Taking data from one source and converting it into another format can often result in project delays owing to data quality problems. By measuring the total number of data transformations that fail, or that take an unusually long time to complete, organisations can ascertain whether a data quality issue underpins these challenges.
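A sketch of how the failure rate might be tracked, assuming each transformation job records a simple pass/fail outcome (in practice this would come from your orchestrator's run history rather than a hard-coded list):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TransformationRun:
    job_name: str
    started_at: datetime
    succeeded: bool

# Hypothetical run history for a reporting window.
runs = [
    TransformationRun("load_customers", datetime(2023, 5, 1), True),
    TransformationRun("load_customers", datetime(2023, 5, 2), False),
    TransformationRun("load_orders", datetime(2023, 5, 2), True),
    TransformationRun("load_orders", datetime(2023, 5, 3), True),
]

failures = sum(1 for run in runs if not run.succeeded)
failure_rate = failures / len(runs)
print(f"Transformation failure rate: {failure_rate:.1%}")
```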

4. Volume of Dark Data

Dark data is data that can’t be used effectively, often because of data quality, discovery or accessibility problems. Dark data is not to be confused with the dark web!

It’s often defined as data that an organisation continues to collect, even though it cannot cope sufficiently well with the data throughput to process and analyse it effectively. The more dark data an organisation is sitting on, the more widespread its data quality problems are likely to be. Indeed, this may also point to much deeper issues around data governance, architecture, visualisation, and accessibility.
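There is no single formula for dark data, but one rough proxy, sketched here under the assumption that your catalogue or warehouse can export per-table last-read timestamps, is the share of data sets with no recorded reads over a chosen period:

```python
from datetime import datetime, timedelta

# Hypothetical catalogue export: table name -> last time it was read (None = never).
last_read = {
    "sales.orders": datetime(2023, 5, 10),
    "staging.temp_dump": datetime(2021, 1, 3),
    "legacy.web_logs": None,
}

# Treat anything unread for more than a year as "dark".
cutoff = datetime(2023, 5, 15) - timedelta(days=365)
dark_tables = [name for name, ts in last_read.items() if ts is None or ts < cutoff]

print(f"{len(dark_tables)} of {len(last_read)} tables look dark: {dark_tables}")
```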

5. Cost of Storage

Whilst storage costs are generally falling with the onset of commodity cloud-based storage services, many organisations are still not optimally managing their data retention and lifecycle processes to ensure that data is stored in the most cost-efficient manner. For example, data that is accessed only once a year for business reporting purposes does not need to sit on the same highly available, performant storage as a real-time data set for front-office trading activities. Conversely, if your organisation is storing data but isn't using it as frequently as it should be, this could suggest that there is an inherent data quality issue.
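As a simple illustration of how retention and tiering decisions can be made measurable (the per-gigabyte prices below are placeholders, not real provider rates), you could estimate monthly spend and flag rarely accessed data sitting on expensive tiers:

```python
# Placeholder monthly prices per GB -- substitute your provider's actual rates.
PRICE_PER_GB = {"hot": 0.023, "cool": 0.010, "archive": 0.002}

# Hypothetical inventory: data set -> (size in GB, current tier, days since last access).
datasets = {
    "trading_ticks": (500, "hot", 0),
    "annual_report": (200, "hot", 400),      # rarely accessed, yet on hot storage
    "old_clickstream": (800, "archive", 900),
}

monthly_cost = sum(size * PRICE_PER_GB[tier] for size, tier, _ in datasets.values())
demotion_candidates = [name for name, (_, tier, idle_days) in datasets.items()
                       if tier == "hot" and idle_days > 365]

print(f"Estimated monthly storage cost: ${monthly_cost:.2f}")
print(f"Candidates for a cheaper tier: {demotion_candidates}")
```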

6. Data Time-to-Value

Calculating how long it takes your data science team or business intelligence teams to generate results from the data set is another way to measure data quality.

There are a number of factors that can affect this metric. As an example, does your organisation have a self-service BI and analytics capability in place? Or, are the tools and platforms you have in place sufficiently powerful to support your use cases and computational demands? Either way, data quality issues can hinder your organisation's time-to-value when trying to maximise its use of data.
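One way to make time-to-value concrete, assuming you can timestamp both when a data set lands and when the first insight built on it is delivered, is to log the elapsed days per use case and watch the trend:

```python
from datetime import datetime
from statistics import median

# Hypothetical delivery log: (use case, data landed, first insight delivered).
deliveries = [
    ("customer_churn", datetime(2023, 3, 1), datetime(2023, 3, 15)),
    ("campaign_performance", datetime(2023, 4, 1), datetime(2023, 4, 4)),
    ("fraud_signals", datetime(2023, 4, 10), datetime(2023, 5, 20)),
]

days_to_value = [(delivered - landed).days for _, landed, delivered in deliveries]
print(f"Median time-to-value: {median(days_to_value)} days")
```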

Final Thoughts

Over the course of this blog, we have focused specifically on the quality measurements that we advocate organisations administer across their data landscape. Much like with software code, by addressing the hygiene aspects of their data, enterprise organisations can accelerate time-to-value and ultimately unlock unbounded possibilities with evolving use cases for machine learning and artificial intelligence in their business.

Indeed, there is a wider set of metrics organisations can measure that focus on areas like consumer satisfaction scores (e.g. frequency and volume of use, ease of access, ease of discovery, data lineage, etc.) and, perhaps most importantly, business-performance-aligned measurements (e.g. revenue increase, market penetration, cost to serve, customer churn, customer lifetime value, etc.).

Ultimately, it is imperative to ensure that your organisation measures what matters. If data quality is a major problem in your business, you must capture a data quality baseline prior to administering any interventions. Reviewing your data quality at a regular cadence will then show whether it is getting better or worse on your organisation's journey to becoming data-driven.

Piotr Czarnas

Founder @ DQOps open-source Data Quality platform | Detect any data quality issue and watch for new issues with Data Observability

10 months ago

I have the feeling that everybody avoids measuring performance because the numbers could be too low. However, we should look at measurement from another perspective. If we find a measure that is too low, we can improve it and call it our success.

Mark Hobart

I am the infoboss | Search & discovery | Data Compliance | Data Quality | Unstructured data | AI

10 months ago

I'm so pleased to see someone making these points. I have to own up, my business (infoboss) develops a data quality and compliance management solution and we have recently implemented a method of scoring data quality and compliance rules and applying a simple RAG colour and % score against each one. But importantly we've also added the ability to associate financial amounts with each rule. For example, the cost to fix, the business impact of the issue etc... adding this financial dimension certainly gets the attention of the C-suite. A good example of this is a missing postcode on a customer address. Without it, the delivery driver could take hours trying to find the address if at all, leading to cost and potential dissatisfaction of the customer. The cost to fix is relatively small in comparison so tracking these types of measures enables cost benefit analysis and prioritisation of issues. Thanks again Ben Saunders for raising this important topic.

Andrew Turner

Board Advisor | Operational Leader | Investor | Founder | Experimenter | Community | Host

10 months ago

wow on a roll sir... keep 'em coming Ben Saunders
