Passing in the lab but failing outside isn't what you want for your AI models
In an article, MIT explained why AI delivers near-perfect performance in the lab yet fails in real-world settings. They cited two main reasons: data shift and underspecification. I will add a third, undertesting.
What is data shift?
AI models are trained on one set of inputs, but real-world scenarios throw a different set at them. A change in the input data distribution influences the output variable and can alter the underlying relationship between input and output, so you will not get the same results from a completely different set of inputs. Whenever the data distribution changes, data shift occurs, and model performance degrades. Data shift is a problem right from the get-go, and other issues, such as model drift, concept drift, and training-serving skew, decay the model along the way, justifying the disclaimer, "past performance is no guarantee of future results".
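To make this concrete, here is a minimal sketch of a data shift check, assuming Python with NumPy and SciPy and a single numeric feature. The two-sample Kolmogorov-Smirnov test, the 0.05 threshold, and the simulated drift are illustrative choices on my part, not the only way to do it.

```python
# A minimal sketch of detecting data shift on one feature (illustrative
# assumptions: a single numeric column, a 0.05 significance threshold).
import numpy as np
from scipy.stats import ks_2samp

def detect_data_shift(train_feature: np.ndarray,
                      live_feature: np.ndarray,
                      alpha: float = 0.05) -> bool:
    """Flag a shift when the live distribution differs from training."""
    statistic, p_value = ks_2samp(train_feature, live_feature)
    return p_value < alpha  # True means the distributions likely differ

# Example: training data centred at 0, production data drifted to 0.5
rng = np.random.default_rng(42)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)
live = rng.normal(loc=0.5, scale=1.0, size=5_000)
print(detect_data_shift(train, live))  # True -> investigate before trusting the model
```

In practice a check like this would run per feature, on a schedule, against whatever "live" data your model is actually scoring.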
What is Underspecification?
Underspecification is the failure to specify enough detail when we build an AI/ML model, so we end up with an incomplete model. Technically speaking, it happens when certain features are omitted from the underlying representations, leaving the model with incomplete requirements. In that case, even the same model trained from different starting points can produce vastly different predictions. Requirements completeness is very difficult to achieve in AI modeling because we expect the system to face varying situations and evolve. The result is costly maintenance of the AI/ML model; otherwise, model decay is faster and the product fails more often in real-world scenarios.
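Here is a minimal sketch of that "same model, different starting points" effect, assuming a scikit-learn setup. The synthetic dataset, the network size, and the seeds are illustrative choices; the point is only that identical pipelines can still disagree on unseen inputs.

```python
# A minimal sketch of underspecification: two models with identical
# architecture and identical data, differing only in the random seed,
# can disagree on the same unseen inputs.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

preds = []
for seed in (1, 2):
    model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                          random_state=seed)  # only the starting point changes
    model.fit(X_train, y_train)
    preds.append(model.predict(X_test))

disagreement = (preds[0] != preds[1]).mean()
print(f"Models disagree on {disagreement:.1%} of unseen inputs")
```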
What about Undertesting?
Testing AI products requires unique tools, techniques, and knowledge. Testing and training are sometimes interchangeable, in that both let the system learn what is 'right' and 'wrong'. If the testing does not cover the acceptable range of data shifts, the minimum acceptable performance levels, and the gradual or sudden decay of the model, then it is undertesting.
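Here is a minimal sketch of a shift-aware acceptance test, assuming a scikit-learn classifier. The shift magnitudes and the 0.80 accuracy floor are illustrative thresholds; in a real pipeline you would replace them with your own acceptance criteria and run the test before every release.

```python
# A minimal sketch of a shift-aware acceptance test (illustrative
# assumptions: Gaussian noise as the "shift", a 0.80 accuracy floor).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2_000, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1_000).fit(X, y)

MIN_ACCEPTABLE_ACCURACY = 0.80
rng = np.random.default_rng(0)

for shift in (0.0, 0.25, 0.5, 1.0):          # simulate growing data shift
    X_shifted = X + rng.normal(scale=shift, size=X.shape)
    acc = accuracy_score(y, model.predict(X_shifted))
    status = "PASS" if acc >= MIN_ACCEPTABLE_ACCURACY else "FAIL"
    print(f"shift={shift:.2f}  accuracy={acc:.3f}  {status}")
```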
So, what do we do?
Below are some of my suggestions. Please add yours.
Learning from one's mistakes is an effective human learning technique: learners focus more on the topics where they made mistakes in order to deepen their understanding. AI/ML systems can learn the same way. If your systems are not mission critical, mistakes are tolerable; otherwise, a single mistake can lead to irreparable loss.
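For what "learning from mistakes" could look like in code, here is a minimal sketch that retrains a model with extra weight on the examples it previously got wrong, assuming a scikit-learn classifier that accepts sample weights. The 5x weight is an illustrative choice, not a recommendation.

```python
# A minimal sketch of mistake-focused retraining (illustrative
# assumption: upweight previously misclassified examples by 5x).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2_000, n_features=10, flip_y=0.05,
                           random_state=0)

model = LogisticRegression(max_iter=1_000).fit(X, y)
mistakes = model.predict(X) != y               # where the model got it wrong

weights = np.ones(len(y))
weights[mistakes] = 5.0                        # focus harder on past mistakes
retrained = LogisticRegression(max_iter=1_000).fit(X, y, sample_weight=weights)

print(f"Before: {(model.predict(X) == y).mean():.3f} training accuracy")
print(f"After:  {(retrained.predict(X) == y).mean():.3f} training accuracy")
```

Whether you can afford this kind of learn-by-erring loop depends on how costly a single mistake is for your product.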
A single error may be enough for consumers to lose their trust in AI. A diligent approach to addressing data shift, underspecification, and undertesting can help AI systems pass everywhere, every time. Let us make it happen.