4 of 7: Deadly Cliché #4 - Data Quality is the Key to AI Success
George Trujillo Jr.
Data and Generative AI Strategist | Data Governance | Customer Success
Introduction
Overused clichés become stale, and the important context behind them gets lost, diluting their impact. Watch out for these common pitfalls in analytics, AI, and generative AI projects.
Data Trust vs. Data Quality
This article explores what data quality is and can do, what it is not and cannot do, and why understanding the difference matters. The goal is to highlight the significance of differentiating between data quality and data trust. The critical importance of data quality is mentioned in almost every AI-related article and presentation. However, in my 17 years of leading data governance and master data management initiatives, I’ve never heard an analytics or AI leader say they want “more data quality” or “better data quality.” Instead, they consistently emphasize the need for their teams and models to be able to “trust the data.” Data quality and data trust are not the same.
Data trust and data quality are closely related concepts, but understanding the distinction between them is crucial, especially when working with digital transformations, high-value analytic assets, and high-profile AI models. While data quality is necessary for data trust, high data quality alone does not guarantee that the data is trusted. Data quality initiatives can become so focused on tangible KPIs and metrics that intangible aspects, such as user confidence, get lost. The mistaken belief that data quality alone will create trust feeds the dangerous cliché that “data quality is the key to AI success.” Let’s dive deeper into the concepts of data quality and data trust, and compare the two.
Data Quality
Definition: Data quality refers to the characteristics (or dimensions) of data that make it accurate, complete, consistent, timely, and reliable for its intended use. High-quality data is free from errors, duplicates, and inconsistencies, and is appropriately formatted and available when needed.
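To make the dimensions in this definition concrete, here is a minimal sketch of how a few of them could be measured. It is only an illustration under stated assumptions: the sample records, the field names (`id`, `email`, `updated`), the 30-day freshness window, and the helper functions are all hypothetical and not taken from any particular tool or from a real scorecard.

```python
from datetime import datetime, timedelta

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "updated": datetime(2024, 1, 10)},
    {"id": 2, "email": None, "updated": datetime(2024, 1, 9)},
    {"id": 2, "email": None, "updated": datetime(2024, 1, 9)},  # duplicate row
    {"id": 3, "email": "c@example.com", "updated": datetime(2023, 6, 1)},
]

def completeness(rows, field):
    """Share of rows in which `field` is populated (not None or empty)."""
    return sum(1 for r in rows if r.get(field) not in (None, "")) / len(rows)

def uniqueness(rows, key):
    """Ratio of distinct `key` values to total rows (1.0 means no duplicates)."""
    return len({r[key] for r in rows}) / len(rows)

def timeliness(rows, field, as_of, max_age):
    """Share of rows whose `field` timestamp is within `max_age` of `as_of`."""
    return sum(1 for r in rows if as_of - r[field] <= max_age) / len(rows)

def scorecard(rows, as_of):
    """Collect a few dimension scores into one dictionary (assumed layout)."""
    return {
        "email_completeness": completeness(rows, "email"),
        "id_uniqueness": uniqueness(rows, "id"),
        "timeliness_30d": timeliness(rows, "updated", as_of, timedelta(days=30)),
    }

card = scorecard(records, as_of=datetime(2024, 1, 15))
print(card)
```

With the sample records above, email completeness comes out at 0.5 and both ID uniqueness and 30-day timeliness at 0.75. Numbers like these are tangible and easy to track, yet none of them says whether users actually believe the data.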
Importance:
Key Dimensions of Data Quality:
Common Data Quality Metrics:
Data Quality Scorecards (used to provide an aggregate and detailed view of data quality and health metrics). Examples of metrics include:
Data Trust
Definition: Data trust is the confidence users have that the data is believable, accurate, reliable, and fit for purpose. It goes beyond data quality by including aspects such as data provenance (where the data comes from), transparency (how it was collected and processed), and security (how it is protected).
Importance:
Comparison: Data Quality vs. Data Trust
Conclusion
While data quality is crucial for ensuring that data is accurate and usable, data trust ultimately determines whether users will rely on that data and act on it. High data quality is a prerequisite for data trust, but trust goes beyond quality: it involves confidence, believability, transparency, governance, and security. It’s important to maintain a clear distinction between the two. To fully leverage their data assets, organizations must grow into a unified data strategy that includes data governance and data quality, building strong data trust among users.