
Data quality management in the age of AI

Barr Moses

Co-Founder & CEO at Monte Carlo

Published: October 9, 2024

Over the last 12 months, data quality has become THE problem to solve for enterprise data teams, and, unsurprisingly, AI is leading the charge.

As more enterprise teams look to AI as their strategic differentiator, the risks associated with bad data become exponentially greater. At the speed and scale of modern data environments, data teams need advanced data quality methods that can rise to meet these challenges.

In this week’s edition, I’ll consider three of the most common tactics for managing data quality (monitoring, testing, and observability) and discuss how each can (and will) play out in the age of AI.

Defining our terms—data quality monitoring, data testing, and data observability.

Before we can understand the future of data quality, we need to understand the present. In the simplest terms, you can think of data quality as the problem; testing and monitoring as methods to detect that problem; and data observability as a comprehensive approach that combines and extends both methods to triage and resolve it at scale.

Data testing

Data testing is a detection method that uses user-defined rules to identify specific, known issues within a dataset. Manual data testing can be effective for specific use cases, but it naturally becomes less effective at scale. Moreover, testing can only detect the issues you expect to find, and its visibility is limited to the data itself, not the systems or code that power it.
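
As a rough illustration of rule-based testing, here is a sketch that applies a few hand-written checks to in-memory rows and reports which rows fail each one. It is a minimal Python sketch and does not reflect any particular tool's API. The `DataTest` and `run_tests` names, the order fields, and the allowed currency set are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical record type: each row is a plain dict.
Row = dict[str, Any]


@dataclass
class DataTest:
    name: str
    check: Callable[[Row], bool]  # returns True when the row passes


def run_tests(rows: list[Row], tests: list[DataTest]) -> dict[str, list[int]]:
    """Return, for each test, the indices of the rows that failed it."""
    failures: dict[str, list[int]] = {t.name: [] for t in tests}
    for i, row in enumerate(rows):
        for t in tests:
            if not t.check(row):
                failures[t.name].append(i)
    return failures


# User-defined rules: each one encodes an issue we already expect to see.
tests = [
    DataTest("order_id_not_null", lambda r: r.get("order_id") is not None),
    DataTest("amount_non_negative", lambda r: r.get("amount") is not None and r["amount"] >= 0),
    DataTest("currency_in_allowed_set", lambda r: r.get("currency") in {"USD", "EUR", "GBP"}),
]

# Hypothetical sample rows.
orders = [
    {"order_id": 1, "amount": 25.0, "currency": "USD"},
    {"order_id": None, "amount": 10.0, "currency": "EUR"},
    {"order_id": 3, "amount": -5.0, "currency": "usd"},
]

print(run_tests(orders, tests))
```

When run, the script flags row 1 for the null `order_id`. It flags row 2 for both the negative amount and the lowercase currency. A row with a plausible but wrong value, such as an amount inflated by a factor of 100, would pass every rule. That is the limitation described above: tests catch only the issues someone thought to write down.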

Data quality monitoring

Unlike the one-to-one nature of testing, data quality monitoring is an ongoing solution that continually monitors your data and identifies anomalies based on user-defined thresholds or machine learning. Benefits include broader coverage for unknown unknowns and the ability to track metrics and discover patterns over time. However, broad monitors can be expensive to apply effectively across a large environment, and they still need to be expressed in SQL. Like tests, monitors are also limited to the data itself and don’t support root-cause analysis.
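
In contrast to fixed rules, a monitor can learn its threshold from the metric's own history. The sketch below is an illustration built on several assumptions. It uses a rolling z-score over daily row counts as a simple statistical stand-in for the machine-learning detectors mentioned above. The `detect_anomalies` name, the 7-day window, the 3-standard-deviation threshold, and the sample counts are all hypothetical. Monitors like this are typically expressed in SQL; Python is used here only for readability.

```python
import statistics


def detect_anomalies(history: list[float], window: int = 7, z_threshold: float = 3.0) -> list[int]:
    """Flag indices whose value is more than z_threshold standard deviations
    away from the mean of the preceding `window` values."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    anomalies = []
    for i in range(window, len(history)):
        baseline = history[i - window:i]
        mean = statistics.fmean(baseline)
        stdev = statistics.stdev(baseline)
        if stdev == 0:
            # Perfectly flat baseline: any change at all counts as anomalous.
            if history[i] != mean:
                anomalies.append(i)
            continue
        if abs(history[i] - mean) / stdev > z_threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily row counts for one table; day 8 has a sudden drop.
row_counts = [10_120, 10_340, 9_980, 10_210, 10_400, 10_050, 10_300, 10_280, 4_120, 10_190]
print(detect_anomalies(row_counts))  # → [8]
```

No one had to write a rule saying "row counts must exceed 9,000"; the baseline comes from the data. The drop then stays in the baseline for the next seven days, which widens the standard deviation and makes the monitor less sensitive for a while. A real implementation might exclude flagged points or use more robust statistics such as the median.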


Data observability

Inspired by software engineering best practices, data observability is an end-to-end, AI-enabled approach to data quality management designed to answer the what, who, why, and how of data quality issues within a single platform. It compensates for the limitations of traditional data quality methods by combining detection, triage, and resolution in a single workflow across your data, systems, and code: the three places data products can break.
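
The triage step is where observability goes beyond detection: it connects a detected anomaly to changes in data, systems, and code. The sketch below is a hypothetical illustration and does not describe how any specific observability platform works. It assumes two inputs. The first is a lineage map from each asset to its direct upstream assets. The second is a single log of change events tagged as `data`, `system`, or `code`. The sketch lists the events on or upstream of the anomalous table within a 24-hour lookback window. The `ChangeEvent` class, the `triage` function, the asset names, and the events are all made up for illustration.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeEvent:
    asset: str        # table, job, or model the event belongs to
    kind: str         # "data", "system", or "code"
    at: datetime
    description: str


def upstream_assets(lineage: dict[str, list[str]], table: str) -> set[str]:
    """Breadth-first walk over a lineage map of asset -> direct upstream assets."""
    seen: set[str] = set()
    queue = deque([table])
    while queue:
        node = queue.popleft()
        for parent in lineage.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def triage(
    anomaly_table: str,
    anomaly_at: datetime,
    lineage: dict[str, list[str]],
    events: list[ChangeEvent],
    lookback: timedelta = timedelta(hours=24),
) -> list[ChangeEvent]:
    """Candidate root causes: changes to the anomalous table or anything upstream
    of it inside the lookback window, most recent first."""
    scope = upstream_assets(lineage, anomaly_table) | {anomaly_table}
    window_start = anomaly_at - lookback
    candidates = [e for e in events if e.asset in scope and window_start <= e.at <= anomaly_at]
    return sorted(candidates, key=lambda e: e.at, reverse=True)


# Hypothetical lineage and change log, for illustration only.
lineage = {
    "revenue_dashboard": ["orders_agg"],
    "orders_agg": ["orders_raw", "orders_model"],
    "orders_raw": ["orders_ingest_job"],
}
events = [
    ChangeEvent("orders_model", "code", datetime(2024, 10, 8, 14, 0), "merged change to join logic"),
    ChangeEvent("orders_ingest_job", "system", datetime(2024, 10, 8, 3, 0), "job retried 3 times"),
    ChangeEvent("orders_raw", "data", datetime(2024, 10, 8, 4, 0), "row count dropped 60%"),
    ChangeEvent("marketing_events", "data", datetime(2024, 10, 8, 12, 0), "new column added"),
    ChangeEvent("orders_raw", "data", datetime(2024, 10, 1, 4, 0), "null rate spike"),
]

for e in triage("revenue_dashboard", datetime(2024, 10, 8, 18, 0), lineage, events):
    print(e.at.isoformat(), e.kind, e.asset, "-", e.description)
```

Sorting by recency is the simplest possible ranking, and here it puts the code change first. The point is only to show how the three kinds of signals can meet in one workflow. A real product would rank root causes differently.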

The future of data quality management for AI applications and beyond

It isn’t only AI applications that need better data quality management, though. To maximize scalability, data quality management will need to incorporate AI as well.

By applying AI to monitor creation, anomaly detection, and root-cause analysis, advanced solutions like data observability can enable hyper-scalable data quality management for real-time data streaming, RAG architectures, and other AI use cases.
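
Real-time streams add a constraint: the detector has to score each event as it arrives, without rescanning history. Here is a minimal sketch that assumes a single numeric metric, such as events per minute. The class keeps an exponentially weighted moving average and variance, and it flags values that fall far from that baseline. This is a plain statistical technique standing in for the AI-driven detection described above. The `EwmaDetector` name and its `alpha`, `z_threshold`, and `warmup` defaults are assumptions.

```python
import math


class EwmaDetector:
    """Online anomaly detector for one metric stream.

    Keeps an exponentially weighted mean and variance, so each update
    costs O(1) time and memory regardless of how long the stream runs."""

    def __init__(self, alpha: float = 0.1, z_threshold: float = 4.0, warmup: int = 10):
        self.alpha = alpha              # weight given to each new observation
        self.z_threshold = z_threshold  # how many standard deviations counts as anomalous
        self.warmup = warmup            # observations to see before flagging anything
        self.count = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, x: float) -> bool:
        """Score x against the current baseline, then fold it in. Returns True if anomalous."""
        self.count += 1
        if self.count == 1:
            self.mean = x
            return False
        std = math.sqrt(self.var)
        is_anomaly = (
            self.count > self.warmup
            and std > 0
            and abs(x - self.mean) / std > self.z_threshold
        )
        # Incremental update of the exponentially weighted mean and variance.
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return is_anomaly


# Hypothetical per-minute event counts with one sudden drop at position 12.
stream = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100, 20, 100, 101]
detector = EwmaDetector()
flags = []
for i, value in enumerate(stream):
    if detector.update(value):
        flags.append(i)
print(flags)  # → [12]
```

Constant per-event cost is what makes this style of detector suitable for streaming. The anomalous value is still folded into the baseline, though, which inflates the variance for a while afterward. Skipping or down-weighting flagged points is a common refinement.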

As we move deeper into the AI future, I expect that we’ll see data teams continue to adopt solutions that unify not just tooling but teams and processes as well, leveraging automation and AI in intelligent ways to democratize data quality for the teams that own it.

What do you think? Agree? Disagree? Let me know in the comments.

Stay reliable,

Barr

The Data Downtime Newsletter



Raja Bolla

Eng Manager at Chime | EB1-A (“Einstein Visa”) | Ex-Facebook | Ex-PayPal | Ex-Deloitte

4 weeks ago

100% Agree. Data quality often gets sidelined, but it’s something we need to embed in our engineering DNA. With a mindset that prioritizes clean, reliable data, we can build stronger systems and drive better results. cc: Hari Kiran Vuyyuru

Rama Miriyappalli

Engineering Manager, Enterprise Data Architect & Strategy Leader, Application Development, Team & Thought Leader, Big Data, Cloud, Agile, Scrum, Data Governance, Process Improvements. Expert Engineer (E2)

1 month ago

Collibra is good, but if you need a scalable solution, I love Amazon Deequ. It’s open source, uses the Spark engine for processing, and you can build a wrapper around it so business users can write queries that translate into Python or Scala. You can also build home-grown solutions using Talend for orchestration, which gives much more control for writing delta processing, etc. Most important is building DQ frameworks that don’t put pressure on source systems.

Michal Nochumson

Building the Data Cloud, Strategic Advisor, Cloud Transformation

1 month ago

Agree 100%. Data quality is critical and increasingly challenging as data volumes continue to grow and demand for real time data increases. Leveraging AI to improve data quality will differentiate organizations. Would love to hear about any solutions that people have used and would recommend.
