Data Quality is not about the “Data” alone

The topic of Data Quality has always been on the radar of key stakeholders and the C-suite in organizations. Invariably, we hear from business leaders that they were unable to drive their business agenda due to poor data quality. This is a simple and legitimate statement. Indeed, realization of the business agenda depends heavily on the value of the insights brought forward to drive the business, and the value of those insights is directly proportional to the quality of the data that enables them.

Does this mean the culprit is the “Data” residing in IT systems? Does it mean that focusing on IT systems and processes would improve the quality of the data? The answer is only partially yes. The word “Data” is often associated with IT systems, databases, and data storage platforms, yet a significant component of what impacts “Data” quality is not IT but business elements and processes. In reality, every piece of “Data” that exists in IT systems should be a reflection of business elements. An organization that treats “Data Quality” as an IT challenge to solve can achieve only limited returns from its data quality remediation initiatives. More likely, in those cases, the “Data Quality” initiative loses steam midair.

Data Quality has been discussed in the industry for decades. At the beginning of the digital age, when business processes were being digitized, the “Data” in IT systems grabbed the center of attention. Benefits such as the ease of executing business processes on digitized systems have been huge for organizations, and digitization remains a significant enabler for business. Multiple advances were made in technologies for data storage and management, and in parallel organizations built out their IT landscapes all the way from operational systems to reporting systems and analytical platforms. For many, this became a maze without clear visibility into the origin of the “data”. It is natural for stakeholders to think primarily of “IT systems” when “Data Quality” is being considered. With the flood of data, it becomes very foggy for stakeholders to realize that “Data” is a business phenomenon, not an IT one, even though IT enables data storage, data management, and the data lifecycle.

Though well known, “Data Quality” is mostly taken for granted. In this article, we will look at the key aspects to consider when initiating “Data Quality” work, including a “Data Quality Framework” (DQF).

When we look deeper into the situation, “data” does not exist without associated business elements: real customers, suppliers, products, regulations, audits, orders, and many more. Data is ideally a direct reflection of business operations, and data quality challenges would largely disappear if “data” truly reflected business imperatives. The very fact that we live in a less-than-perfect world makes it unlikely that the “data” will accurately reflect business realities. This is one of the key reasons for bad data quality, with influences from both the IT world and the business world. The other prominent driver of bad data quality is impurity in, and poor alignment of, business processes. This is the true origin of data quality challenges. Data quality is less about the “Data” itself: bad data is only the symptom, not the real cause. Addressing the data quality challenge through an IT lens is equivalent to treating a symptom; it may disappear temporarily, but it is bound to reappear unless the real cause is treated. Data Quality is not about the “Data” alone in IT parlance, but more about the reflection of business realities and business processes.

Very often “Data” attracts the attention in the context of bad data quality because all the impurities and misalignments in business processes culminate and become visible in the “Data” within IT. This creates the illusion that “Data Quality” is an IT concern.

Source of the background image: Adobe stock image

Tickling the dragon's tail: Data quality is a sleeping dragon. Any trigger that wakes the dragon sets off a nuclear chain reaction, and bad data quality becomes deeply rooted, in no time, in every part of the business organization. It turns into an “Elephant's Foot” that everyone is scared to go near. Unfortunately, at every step of business operations, stakeholders step on the dragon's tail and trigger the chain reaction. For example, a retail bank executive onboards a prospect with a new “Customer ID”, not realizing that the same customer has an existing relationship with another credit-line business unit. A mobile service provider's kiosk onboards a new customer without capturing alternate contact numbers, which invariably end up as a series of 9s. There are many such instances across the industry. Although each looks trivial, it tickles the dragon's tail: the trivial business reality (in this case, a correct phone number) is not reflected correctly in the “Data”, and the error then disseminates into every part of the organization's systems.
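
To make the phone-number example concrete, here is a minimal sketch of a profiling check that flags placeholder values such as a series of 9s. It assumes a pandas DataFrame; the column names and sentinel values are illustrative, not a prescribed standard.

```python
import pandas as pd

# Hypothetical customer extract; column names are illustrative.
customers = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "phone": ["9999999999", "4155550123", "0000000000", "9999999999"],
})

# Placeholder patterns that staff often key in just to get past a mandatory field.
SENTINELS = {"9999999999", "0000000000"}

def flag_placeholder_phones(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose phone number is a known placeholder value."""
    return df[df["phone"].isin(SENTINELS)]

suspects = flag_placeholder_phones(customers)
print(suspects["customer_id"].tolist())                    # [101, 103, 104]
print(f"{len(suspects) / len(customers):.0%} placeholder")  # 75% placeholder
```

A check like this only surfaces the symptom; as argued above, the lasting fix is in the business process at the kiosk, not in the filter.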

In a telecommunications organization, sales and store staff consider a “Customer” to be anyone who is an existing customer or who may become a customer in the future, and the IT systems hold the corresponding precipitation of “Customer Data”. Support staff, on the other hand, consider the same “Customer Data” to cover existing customers only (not including future customers). This misaligned business process tickles the dragon's tail through a “contextual conflict”.

Quick Fix or Permanent Fix: It is deceiving for organizations to think they can fix the data quality challenge within IT systems and data hubs. It is easy to apply a quick fix: establish another data hub, cleanse the data, and consume it for the purpose at hand. This suppresses the symptom only until the business scenario changes or a new data consumption use case comes up, which is very likely to happen. The quick fix in the example above is to create another data hop, filter out future customers, and consume the result for the specific needs of the customer support staff.

The permanent fix is not easy: no pain, no gain. A long-term, sustainable solution involves not only IT aspects but also examining how closely business realities are reflected in the “Data” and rectifying the misalignment within business processes. A permanent fix starts with alignment of business processes, alignment of business context, and promotion of a data-driven culture, along with many other aspects that we will look at later. In the example above, the context of customer “data” needs to be defined clearly and socialized with stakeholders, with an appropriate classification of existing versus future customers embedded in the business process itself (a small sketch contrasting the two fixes follows). The various aspects that go into an appropriate Data Quality Framework solution are explained later in this article.
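
A minimal sketch of the contrast, assuming a hypothetical customer table: the quick fix adds yet another data hop that filters for one consumer's context, while the permanent fix captures an explicit status at the point of onboarding so every consumer shares one context. All column names and status values are illustrative assumptions.

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "activation_date": ["2021-04-01", None, "2021-06-15"],
})

# Quick fix: one more data hop, filtering to the support team's context.
# The ambiguity about who counts as a "Customer" survives upstream.
support_view = customers[customers["activation_date"].notna()]

# Permanent fix: classify at the point of capture so the context is explicit
# for every downstream consumer.
customers["customer_status"] = customers["activation_date"].map(
    lambda d: "existing" if d is not None else "prospect"
)
print(customers)
```

The filter answers one team's question once; the explicit status field, maintained by the onboarding process itself, resolves the contextual conflict for every consumer.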

Twelve things to consider: While initiating a Data Quality program, a few things are important to consider. Only the key aspects are listed here; there are many more that could be added to this list.

Source of the background image: Microsoft office stock image

  1. Top-down or Bottom-up approach: Data is omnipresent in the organization, and hence so is Data Quality. Organizations stumble over the question “Where to start?”. Top-down is the ideal approach, but in practice it takes forever for organizations to cover all the ground, and the Data Quality initiative loses steam along the journey. A bottom-up approach gives quick results but runs a high risk of remaining localized. The practical approach is a blend of both: drive the strategic and foundational aspects (described later in the Data Quality Framework) top-down, and use a bottom-up approach to roll out the Data Quality Framework to one data domain, one function, and one business unit at a time, incrementally.
  2. Data Quality initiative scope: How do you pick the correct homogeneous scope for incrementally rolling out a Data Quality Framework (DQF)? A DQF provides the best results when rolled out primarily by data domain, the fundamental reason being that business context is associated with a data domain as the primary entity. Further, club together sub-data domains with homogeneous business context. For example, the “size” of a refrigerator sold in a retail shop is described in liters, whereas the size of a shirt may be described as Small-Medium-Large. For the customer data domain, an “address” represents a shipping or billing address, whereas for the product data domain it is a manufacturing origin or supplier warehouse address. The business context behind “size” and “address” changes with the data domain and sub-data domain under consideration. Organizations may instead select scope based on a specific localized unit rather than slicing by data domain; however, this may not produce the desired results unless the holistic impact of the data domain is considered. For example, the holistic (rather than localized) context of the customer data domain needs to be considered even when executing a data quality initiative for the marketing unit alone. Customer data is likely to originate outside the marketing unit, with various other units contributing to its business context. A personalized (“segment of one”) marketing approach for a retailer is based on insights from POS, social media, and buying patterns, and these other contributing units and sources play a larger role in defining the context than the marketing unit itself.
  3. Consolidated or Federated approach: Traditionally, data quality solutions are deployed along the data pipeline, from the various sources (internal and external systems) all the way up to the reporting platform. It is common practice to consolidate the data in a central place, cleanse it, and then consume it, primarily for reporting. The advantage of this “Consolidation” approach is that it produces quick results. However, it has the inherent problem that data quality never improves at the origin, and bad data continues to exist in the organization. A Federated approach is the ideal way to fix data quality challenges at the origin, but it needs a systematic approach and should not be treated as a quick solution. The Federated approach rests on two principles: first, fix data quality at the origin; second, encourage an enterprise-wide business context rather than a localized one, which automatically aligns enterprise-wide data-consuming use cases to the right context. However, it carries the inherent challenge of inertia due to a lack of data-driven culture and data ownership in the organization, which pushes organizations toward the easy route of consolidation. A hybrid approach is the practical way: reap the quick returns of the Consolidation approach while systematically improving data quality at the source through the strategic Federated approach.
  4. Is an independent Data Quality program sustainable?: The strategic and foundational aspects of data quality (the Data Quality Framework) must be driven by the business objectives of the organization and initiated under executive sponsorship; they stand on their own under that support. The bottom-up aspect of rolling out data quality, however, gives the best results when clubbed with other localized initiatives that have specific business outcome expectations. This raises acceptance of data quality activities and makes it possible to prove the benefits of the initiative to organizational stakeholders. For instance, the value of a data quality rollout for a marketing unit is most appreciated when it is clubbed with another large marketing (campaign) initiative.
  5. Where to measure the quality of data?: The first ideal place is where the data makes the most impact on the business, typically the data warehouse, reporting, and analytical workbench platforms. The second is where the business makes its impact on the data, i.e., at the origin of the “Data” (and the major data-authoring places). Extra care should be taken to ensure that the business rules used to check data quality remain close to the business context. A third potential place is where considerable data transformations are executed, i.e., while “Data” is in transit, and a fourth is on the periphery of key systems. The third and fourth are risky places to measure data quality, because data transformation projects are traditionally localized, and the systems' business context is heavily influenced by local demands. For example, a customer service system in the USA will hold 9- or 10-digit phone numbers, while a similar system catering to customers in India will hold 10- or 12-digit phone numbers; manufacturers in different parts of the world use either lbs or kg as the unit of weight (a sketch of such context-aware checks follows this list).
  6. Should data be cleansed at every place where data quality is measured (DQ gates)?: The answer is no; it is not advisable. Data quality measurement exists to make stakeholders aware of how good or bad the “Data” is from the perspective of its consumption by various business processes and systems. Very often, data consumers neither know the business context behind the data nor have the authority to make decisions about it. Data cleansing should ideally happen at the origin/source (and primary authoring places) of the data. The Consolidated and Federated approaches discussed above apply here.
  7. How to move the mountain of Data Quality; it is overwhelming!: Data quality is always overwhelming for stakeholders. The actions of individual stakeholders toward improving data quality may not benefit their own unit; rather, one unit's data quality efforts mostly benefit other units in the organization. For instance, a mobile service provider's store representative correctly capturing customer demographic data significantly helps the technical support, billing, and marketing units down the line. “If there is nothing in it for me, why should I invest?” This myopic view, though a very natural phenomenon, becomes the fundamental roadblock. Why should the product-receiving manager in a retail logistics department spend extra minutes precisely measuring a product's dimensions and weight? Yet if he does not, it could later become a serious safety concern in store shelf operations. Organizations need to develop a reward mechanism that encourages and promotes data quality actions and turns people into data-driven evangelists. For the individual, a data quality initiative is more about giving and less about gaining; well-aligned data quality goals alone do not provide the necessary impact without an associated reward mechanism.
  8. I would worry about cloud data quality alone: This is a legitimate statement if the data is encapsulated completely within the cloud platform. However, the co-existence of an on-premises landscape with the cloud demands careful consideration of each data domain at the enterprise-wide level. An engineering design originating in an on-premises PLM system significantly impacts the effectiveness of the “product catalogs” derived from cloud-based omni-channel systems. The cloud delivers its inherent benefits only as long as we do not turn it into a “data island”.
  9. Are standard data quality KPIs (like correctness, completeness, and more) sufficient?: Treating them as sufficient is a symptom of a “Data Quality” program being run as an IT project. Standard data quality KPIs are important and provide insight into the progress of data quality against defined standards (the sketch after this list computes two of them). However, they do not necessarily reflect progress toward positive business outcomes. For instance, in the airline industry, maintaining accurate product data does not receive positive impetus unless inventory cost optimization is measured. Business-outcome-linked KPIs are therefore critical.
  10. We have completed the “Data Quality” program!: While the statement may be true for specific actions, “Data Quality” is never completely done. The moment data is cleansed, the contamination process starts again; “Data Quality” is a continuous phenomenon. Product pricing changes continuously with the competition; new customer acquisition and customer churn happen every day in the telecommunications industry. The quality of the data needs to be continuously measured, its impact on business KPIs identified, and course corrections made on a continuous basis. Data Quality programs are naturally driven through the Data Governance organization structure.
  11. Differentiate between application/use-case-level and enterprise-level Data Quality: All data quality efforts are worthwhile, however trivial they may be. One can perform data quality activities for a specific set of applications or a specific use case such as data migration. Such application- or use-case-specific activities should leverage the enterprise-wide DQF; if a DQF does not exist, they need to take extra steps to consider the enterprise-wide data domain context. However, the significant effort of a data quality program is justified only when it impacts the wider community across the enterprise. Careful selection is therefore necessary of the specific data domains and business processes that impact the business in a big way; only such initiatives can be considered enterprise-wide efforts and need to be brought under the purview of executive sponsorship.
  12. Once data quality rules are defined, do they remain constant?: No; on the contrary, data quality rules are volatile. They are a direct reflection of business realities, and as the business environment and priorities change, the rules need to be regularly vetted against them. Remember, data quality is relative: it has no intrinsic characteristic unless it is measured against a reference. The advent of the GDPR regulation significantly impacted businesses and hence the data quality paradigm as well. It is no longer a race to make data available, but rather to restrict access with a proper classification of privacy data. All of a sudden, data-privacy-classification rules came into operation.
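
As a concrete illustration of items 5 and 9, here is a minimal sketch of a context-aware validity rule (phone-number length varies by country) alongside the standard completeness and validity KPIs. The column names, country rules, and digit counts are illustrative assumptions, not a definitive rule set.

```python
import pandas as pd

records = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "country":     ["US", "IN", "US", "IN"],
    "phone":       ["4155550123", "919876543210", None, "98765"],
})

# Validity depends on business context: acceptable digit counts per country.
VALID_DIGITS = {"US": {9, 10}, "IN": {10, 12}}

def phone_is_valid(row) -> bool:
    """True when the phone number matches the length rule for its country."""
    phone = row["phone"]
    if phone is None:
        return False
    return len(phone) in VALID_DIGITS.get(row["country"], set())

completeness = records["phone"].notna().mean()
validity = records.apply(phone_is_valid, axis=1).mean()

print(f"completeness: {completeness:.0%}")  # 75% - one phone is missing
print(f"validity:     {validity:.0%}")      # 50% - only rows 1 and 2 pass
```

These KPIs report progress against a defined standard; as item 9 cautions, they should be paired with business-outcome KPIs rather than read in isolation.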

Data Quality Framework: a systematic and orchestrated mechanism that works in tandem with systems, people, and business processes, aligned with business objectives.

Source of the background image: Microsoft office stock image

A Data Quality Framework (DQF) should address the following key considerations:

  1. Alignment with business objectives: Business objectives are realized through business capabilities, and business capabilities deal with business elements and realities, which are reflected in “Data”. Prioritization of business objectives and the associated business capabilities enables the systematic evolution of data quality. Ensure that the DQF clearly articulates its alignment with business objectives.
  2. Data domains (reflections of business realities): Identifying the data domains that most influence the business outcome is critical; it is necessary to focus energy on the data that produces the most impact. The DQF needs to define the data domains and the priority in which they are to be considered.
  3. Context definition of data domains: Data quality is a relative phenomenon; bad data for one consumer can be absolutely perfect for another. The quality of data is determined by the business context under consideration, so the business context behind each data element needs to be clearly defined and differentiated across the various consumer use cases. The DQF needs to demonstrate an approach to keeping the business context current, typically by linking to the metadata management initiative under the Data Governance program.
  4. Business element (data) ownership: Data quality initiatives do not work without clear ownership of the “data”. Ownership is not in the context of IT; it should be in the context of the business. The DQF needs to clearly demonstrate the inclusion of data ownership responsibilities, leveraging the Data Governance organization structure.
  5. Mechanism to measure business impact: Data quality is a business initiative; it does not evolve in the right direction without a mechanism to measure progress and impact on the business. The DQF needs to define business KPIs and map them to data quality actions.
  6. Business processes precipitating the data: Business capabilities are realized through business processes, and these processes precipitate “Data” through various systems. Identify and map the business processes corresponding to the prioritized data domains, and identify the inherent misalignments among business processes in order to reduce business “context conflict” and hence data quality challenges. The DQF needs to demonstrate an approach to continuously detecting and rectifying potential misalignment (“context conflict”) within business processes, executed in close coordination with the Data Governance organization.
  7. Systems of data capture: Business processes precipitate data through systems of data capture and authoring. Identify and map the systems of capture and their associated data domains in order to rectify the causes of bad data at the source. For example, systems of capture should not pretend to capture data that they cannot reliably capture given the associated business processes; making as many attributes as possible mandatory at capture time is a wrong tendency. The DQF needs to clearly define the systems of data capture and an approach for discovering new systems that come along the way, from a future scalability perspective.
  8. Systems of data reference: Data travels from systems of capture to systems of reference, which provide a reference point for data consumers. Throughout this journey, the associated business and IT processes transform the data, which can potentially inject bad data. The DQF needs to define the systems of reference and their mapped data domains, along with an approach for discovering new systems of reference.
  9. Data quality rules discovery: Data quality rules are directly dependent on the business context; they are defined to detect deviations of “Data” from that context. As the business context changes continuously (even if slightly) in alignment with the ever-changing business environment, data quality rules need to be discovered and rediscovered on a continuous basis. Critical care should be exercised to keep the business rules as close as possible to the business context; otherwise, the pre-existing rules themselves become a source of bad data quality. The DQF needs to define an approach for discovering data quality rules and keeping them aligned with business realities.
  10. Data quality gates: Define the points in the “Data” pipeline where data quality gates need to be established. It is a natural tendency, and a bad practice, to establish such gates at all possible places in the pipeline; avoid that trap. Establish data quality gates only at the places (of data consumption) where they will have the most impact on business outcomes; otherwise they over-complicate the DQF. The DQF should clearly define these gates (a sketch of a simple gate follows this list).
  11. Mechanism to discover data quality challenges: Define an approach for discovering data quality challenges. This involves both primary and advanced data profiling to discover outlier cases and identify root causes. The DQF needs to define the process of discovering data quality challenges, classifying and bucketing them, and routing them to data owners and stewards.
  12. Mechanism to cleanse the data: Cleansing the data is as difficult as discovering the data quality challenges. Remediation may involve one or a combination of people actions, business processes, and IT technology, and it may even reside outside the organizational boundaries. The DQF should clearly define an approach for determining the remediation routes (people, business process, technology) and for orchestrating the remediation actions.
  13. Data literacy and a data-conscious culture: “The most benefit goes to those who aspire to comply with the rules.” If the organizational culture is not data-driven, then no matter how sophisticated the mechanisms you deploy for data quality, the initiative is bound to fail. Organizational stakeholders should be fully aware of the wealth of “Data” they have, with a precise business context definition behind each data element, and should know which “Data” source is reliable for reference. The DQF should define an approach to track and work together with Data Governance activities toward enhancing the data culture in the organization.
  14. Reward mechanism: Link rewards to business outcomes, to data quality business KPIs, to specific data domains, to sets of functional units, and to specific individuals. It will never be a straightforward mathematical formula. The DQF needs to define this reward mechanism and govern it.
  15. Technology: Plenty of data quality tools are available, covering data profiling and cleansing, ticketing and remediation process orchestration, data lineage, business and technical metadata, and many more associated areas. Select technology that is fit for purpose, gives optimal results, and is interoperable with the wider landscape; heavily loaded technology options can become annoying baggage to carry throughout the journey. Careful consideration should be given to on-premises/cloud/hybrid-cloud aspects and to the level of automation and intelligence that can be brought in using AI/ML techniques, with a balanced mix of effectiveness and simplicity. The DQF needs to define the overall policies and guidelines for selecting specific tools, and it can further assist in finalizing tools and defining the associated enterprise technology rollout roadmap.
  16. Data Governance body: This has a wider scope than data quality alone; however, Data Governance is paramount for a sustainable Data Quality initiative. The DQF needs to define the linkage and interdependency of the Data Quality initiative with the Data Governance framework and the associated organization structure.
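
Tying items 10 and 11 together, here is a minimal sketch of a data quality gate that scores a batch against registered rules and routes failing records to data owners and stewards. The rules, thresholds, column names, and owner labels are all illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Callable

import pandas as pd

@dataclass
class Rule:
    name: str
    check: Callable[[pd.DataFrame], pd.Series]  # True = record passes
    owner: str  # data owner / steward team to route failures to

# Illustrative rules for a customer extract at a consumption-side gate.
RULES = [
    Rule("phone_present", lambda df: df["phone"].notna(),
         owner="crm-stewards"),
    Rule("status_known",
         lambda df: df["customer_status"].isin({"existing", "prospect"}),
         owner="onboarding-stewards"),
]

def run_gate(df: pd.DataFrame, rules=RULES, threshold: float = 0.95):
    """Score each rule, route failing records to owners, pass/fail the batch."""
    routed = {}
    batch_passes = True
    for rule in rules:
        ok = rule.check(df)
        score = ok.mean()
        if score < threshold:
            batch_passes = False
        # Failing records go to the owning steward team, not to a local fix.
        routed.setdefault(rule.owner, []).append((rule.name, df[~ok]))
        print(f"{rule.name}: {score:.0%} (owner: {rule.owner})")
    return batch_passes, routed
```

Note the design choice: a batch that fails the gate is not silently cleansed there (item 6 of the earlier list); the failing records are routed to the owning stewards so remediation can happen at the source.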

Summary: Data Quality is not about “Data” alone in IT language. Very often, data quality is taken for granted and assumed to be a trivial IT activity; rather, it is a business initiative, aligned with business objectives and processes and enabled by IT. Wherever a Data Quality initiative is treated as an IT initiative alone, it is bound to fail, because it cannot sustain relevance to business outcomes. At TCS, the Analytics & Insight unit takes a holistic approach to Data Quality based on its D3 strategy. Organizations should give due consideration to the various critical aspects (top-down or bottom-up approach, Consolidated or Federated mechanism, data quality business KPIs, and more) while initiating a Data Quality program. The Data Quality Framework (DQF) needs to be established with executive sponsorship and should consist of the key foundational aspects (alignment with business objectives, Data Governance alignment, the data quality gates approach, data literacy, the reward mechanism, and more) that provide a solid foundation for enterprise-wide data quality.
