Data quality is a process

‘We would like to implement a project that will improve data quality’—these are words I have heard many times recently. On one hand, they raise hope because we are becoming increasingly aware of the importance of ensuring high data quality and the enormous impact this factor has on all other IT projects we carry out in our insurance company. On the other hand, there is a certain danger in this statement. A danger that could render all our efforts futile. Why?

Because data quality cannot be ensured through a project. Data quality is a process that must continue day by day, month by month, year by year. There is no other way. Data quality cannot be guaranteed once and for all. For the data quality process to fulfill its function, it must consist of four steps: profiling the data, describing the expected quality, improving the data, and monitoring.

Analysis

First, the data must be profiled, meaning we need to assess their current state. For this purpose, we typically use statistical tools that show the contents of our databases. Let's assume we want to profile the "postal code" field. The profiling tool will show us that this is a text field, six characters long, and that most values follow the pattern "2 digits, a hyphen, 3 digits." However, there are deviations from this pattern: some values lack the hyphen and are therefore effectively numeric, and so on. After such an analysis, we intuitively sense that some data in our "postal code" field do not meet the proper quality standards.
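
As a rough illustration of what such profiling does, the sketch below reduces each value to a pattern (digits become "9", letters become "A") and counts how often each pattern and length occurs. The function names and the sample values are made up for this example; a real profiling tool reports far more than this.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Map each digit to '9' and each letter to 'A'; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile(values):
    """Count how often each pattern and each length occurs in the field."""
    patterns = Counter(value_pattern(v) for v in values)
    lengths = Counter(len(v) for v in values)
    return patterns, lengths


# Hypothetical sample of the "postal code" field (not real data).
sample = ["00-950", "30-001", "80180", "02-495", "1234"]
patterns, lengths = profile(sample)

print(patterns.most_common())  # dominant pattern first, deviations after
print(lengths.most_common())
```

The dominant pattern and the deviations from it are exactly what raises the intuitive suspicion described above.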

To remove the word "intuitively" from the above statement, we need to perform the second step: describing the expected data quality. Sticking with our example, we could specify that the "postal code" field should contain only six-character entries following the pattern "2 digits, a hyphen, 3 digits," and that each entry should appear in a dictionary of existing postal codes obtained from an external source.
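
One way to make such a description precise is to write it as an executable rule. The sketch below is only an assumption about how that could look: the regular expression encodes "2 digits, a hyphen, 3 digits," and `KNOWN_POSTAL_CODES` is a tiny hypothetical stand-in for the external dictionary.

```python
import re

# "2 digits, a hyphen, 3 digits" (six characters in total).
POSTAL_CODE_PATTERN = re.compile(r"[0-9]{2}-[0-9]{3}")

# Hypothetical stand-in for the dictionary supplied by an external source.
KNOWN_POSTAL_CODES = {"00-950", "30-001", "02-495"}


def check_postal_code(value, known_codes=KNOWN_POSTAL_CODES):
    """Return which parts of the expected-quality description a value meets."""
    matches_pattern = POSTAL_CODE_PATTERN.fullmatch(value) is not None
    in_dictionary = matches_pattern and value in known_codes
    return {"matches_pattern": matches_pattern, "in_dictionary": in_dictionary}


print(check_postal_code("00-950"))
print(check_postal_code("99-999"))
print(check_postal_code("80180"))
```

Keeping the two conditions separate shows not only that a value fails, but which part of the description it fails.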

Action

Describing data quality in this way allows us to begin the third step, data improvement. Here, two aspects must be distinguished. The first is improving data at the source. In the postal code example, if one of the source systems lets users enter values in the "postal code" field that do not conform to the expected pattern, additional validation should be added there. The second aspect is improving data while they are processed in systems such as data warehouses or data lakes. This is an important and desirable process, but its effectiveness should not be overestimated. The key to this second aspect is the ability to mark automatically whether, and to what extent, a given value meets the expected quality conditions.
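
The marking step could be sketched as follows, assuming records arrive as dictionaries during a warehouse load. The mark names ("ok", "unknown_code", "bad_format", "missing"), the `postal_code_quality` field, and the sample records are all hypothetical choices for this example. The original value is left untouched; only a mark is added next to it.

```python
import re

PATTERN = re.compile(r"[0-9]{2}-[0-9]{3}")


def quality_mark(value, known_codes):
    """Grade a postal code: 'missing', 'bad_format', 'unknown_code' or 'ok'."""
    if value is None or value.strip() == "":
        return "missing"
    cleaned = value.strip()
    if PATTERN.fullmatch(cleaned) is None:
        return "bad_format"
    if cleaned not in known_codes:
        return "unknown_code"
    return "ok"


def mark_records(records, known_codes):
    """Return copies of the records with a quality mark added to each."""
    return [
        {**r, "postal_code_quality": quality_mark(r.get("postal_code"), known_codes)}
        for r in records
    ]


# Hypothetical dictionary and records (not real data).
known = {"00-950", "30-001"}
rows = [
    {"id": 1, "postal_code": "00-950"},
    {"id": 2, "postal_code": "80180"},
    {"id": 3, "postal_code": "99-999"},
    {"id": 4, "postal_code": None},
]
for row in mark_records(rows, known):
    print(row)
```

Graded marks, rather than a single pass/fail flag, capture the "to what extent" part: a well-formed but unknown code is a different problem from a malformed one.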

Such marking allows us to undertake the final stage of maintaining data quality, which is monitoring. For this purpose, we build reports that measure the quality of individual objects and fields, broken down by data area, priority, source system, or method of use. Thanks to these reports, we can both monitor data quality over time and identify the weakest areas or source systems and focus our quality improvement process on them.
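
A minimal sketch of such a report, assuming each record already carries a quality mark and the name of its source system. The field names `source_system` and `postal_code_quality`, the system names, and the sample records are assumptions made for this example only.

```python
from collections import defaultdict


def quality_by_source(marked_records):
    """Share of records marked 'ok' for each source system."""
    totals = defaultdict(int)
    ok = defaultdict(int)
    for r in marked_records:
        totals[r["source_system"]] += 1
        if r["postal_code_quality"] == "ok":
            ok[r["source_system"]] += 1
    return {system: ok[system] / totals[system] for system in totals}


def weakest_first(report):
    """Order source systems from lowest to highest quality share."""
    return sorted(report.items(), key=lambda item: item[1])


# Hypothetical marked records (not real data).
sample = [
    {"source_system": "policy_admin", "postal_code_quality": "ok"},
    {"source_system": "policy_admin", "postal_code_quality": "ok"},
    {"source_system": "claims", "postal_code_quality": "ok"},
    {"source_system": "claims", "postal_code_quality": "bad_format"},
    {"source_system": "claims", "postal_code_quality": "missing"},
    {"source_system": "claims", "postal_code_quality": "unknown_code"},
]

report = quality_by_source(sample)
for system, share in weakest_first(report):
    print(f"{system}: {share:.0%} ok")
```

Running the same report on every load and storing the results gives the trend over time; sorting weakest first points to where improvement effort should go next.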

Cyclicity

A data quality improvement process built this way delivers very good results and provides end users with the data quality they need. There is one condition: we must perform all these actions cyclically, so that maintaining data quality becomes part of the DNA of our organization.


Article published in Insurance Weekly No. 29/2023

Ela Bator

Data Manager | Data Digital Transformation | Data Governance | Data Analytics | Scrum Master & AgilePM® Certified Practitioner | Power BI | SQL | Databricks

Łukasz Nienartowicz Insightful points. We incorporate the data quality in three high-level steps: Measure, Improve & Monitor, each consisting of multiple actions. One thing I find critical is ensuring a ‘get clean, stay clean’ approach to prevent recurring errors in our data.

Marcin Banaszkiewicz

LinkedIn Trainer | H2H Summit | Creator of the H2H Community | Top Voice of Polish LinkedIn | Academic Lecturer | Producer of Unforgettable Events | CEO my logo

I recommend reading Łukasz Nienartowicz's article.

Filip Iwanski

Ask me for digital strategy | AI | digital advisory | AI | mobile apps | design processes | cybersecurity

Data quality is a process: the most important thesis, and one I agree with.
