Passing in the lab but failing outside isn't what you want for your AI models
In an article, MIT explained why AI delivers near-perfect performance in the lab yet fails in real-world settings. They cited two main reasons: data shift and underspecification. I will add a third, undertesting.
What is data shift?
AI models are trained on one set of inputs, but real-world scenarios throw a different set at them. A change in the input data distribution influences the output variable and can alter the underlying relationship between input and output, so you will not get the same results from a completely different set of inputs. Whenever the data distribution changes, data shift occurs, and model performance degrades. Data shift is a problem right from the get-go, and other issues, such as model drift, concept drift, and training-serving skew, decay the model along the way, justifying the disclaimer, "past performance is no guarantee of future results".
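To make this concrete, here is a minimal sketch of a data shift check, assuming Python with NumPy and SciPy and a single numeric feature. The two-sample Kolmogorov-Smirnov test, the 0.05 threshold, and the simulated drift are illustrative choices on my part, not the only way to do it.

```python
# A minimal sketch of detecting data shift on one feature (illustrative
# assumptions: a single numeric column, a 0.05 significance threshold).
import numpy as np
from scipy.stats import ks_2samp

def detect_data_shift(train_feature: np.ndarray,
                      live_feature: np.ndarray,
                      alpha: float = 0.05) -> bool:
    """Flag a shift when the live distribution differs from training."""
    statistic, p_value = ks_2samp(train_feature, live_feature)
    return p_value < alpha  # True means the distributions likely differ

# Example: training data centred at 0, production data drifted to 0.5
rng = np.random.default_rng(42)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)
live = rng.normal(loc=0.5, scale=1.0, size=5_000)
print(detect_data_shift(train, live))  # True -> investigate before trusting the model
```

In practice a check like this would run per feature, on a schedule, against whatever "live" data your model is actually scoring.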
What is Underspecification?
Underspecification is the failure to specify enough detail when we build an AI/ML model, so we end up with an incomplete model. Technically speaking, it happens when certain features are omitted from the underlying representations, leaving the model with incomplete requirements. In that case, even the same model trained from different starting points can produce vastly different predictions. Requirements completeness is very difficult to achieve in AI modeling because we expect the system to face varying situations and evolve. The result is costly maintenance of the AI/ML model; otherwise, model decay is faster and the product fails more often in real-world scenarios.
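Here is a minimal sketch of that "same model, different starting points" effect, assuming a scikit-learn setup. The synthetic dataset, the network size, and the seeds are illustrative choices; the point is only that identical pipelines can still disagree on unseen inputs.

```python
# A minimal sketch of underspecification: two models with identical
# architecture and identical data, differing only in the random seed,
# can disagree on the same unseen inputs.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

preds = []
for seed in (1, 2):
    model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                          random_state=seed)  # only the starting point changes
    model.fit(X_train, y_train)
    preds.append(model.predict(X_test))

disagreement = (preds[0] != preds[1]).mean()
print(f"Models disagree on {disagreement:.1%} of unseen inputs")
```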
What about Undertesting?
Testing AI products requires unique tools, techniques, and knowledge. Testing and training are sometimes interchangeable, in that both let the system learn what is 'right' and 'wrong'. If the testing does not cover the acceptable range of data shifts, the minimum acceptable performance levels, and the gradual or sudden decay of the model, then it is undertesting.
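Here is a minimal sketch of a shift-aware acceptance test, assuming a scikit-learn classifier. The shift magnitudes and the 0.80 accuracy floor are illustrative thresholds; in a real pipeline you would replace them with your own acceptance criteria and run the test before every release.

```python
# A minimal sketch of a shift-aware acceptance test (illustrative
# assumptions: Gaussian noise as the "shift", a 0.80 accuracy floor).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2_000, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1_000).fit(X, y)

MIN_ACCEPTABLE_ACCURACY = 0.80
rng = np.random.default_rng(0)

for shift in (0.0, 0.25, 0.5, 1.0):          # simulate growing data shift
    X_shifted = X + rng.normal(scale=shift, size=X.shape)
    acc = accuracy_score(y, model.predict(X_shifted))
    status = "PASS" if acc >= MIN_ACCEPTABLE_ACCURACY else "FAIL"
    print(f"shift={shift:.2f}  accuracy={acc:.3f}  {status}")
```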
So, what do we do?
Below are some of my suggestions. Please add yours.
Learning from one's mistakes is an effective human learning technique: learners focus more on the topics where they made mistakes in order to deepen their understanding. AI/ML systems can learn the same way. If your systems are not mission critical, mistakes are tolerable; otherwise, a single mistake can lead to irreparable loss.
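For what "learning from mistakes" could look like in code, here is a minimal sketch that retrains a model with extra weight on the examples it previously got wrong, assuming a scikit-learn classifier that accepts sample weights. The 5x weight is an illustrative choice, not a recommendation.

```python
# A minimal sketch of mistake-focused retraining (illustrative
# assumption: upweight previously misclassified examples by 5x).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2_000, n_features=10, flip_y=0.05,
                           random_state=0)

model = LogisticRegression(max_iter=1_000).fit(X, y)
mistakes = model.predict(X) != y               # where the model got it wrong

weights = np.ones(len(y))
weights[mistakes] = 5.0                        # focus harder on past mistakes
retrained = LogisticRegression(max_iter=1_000).fit(X, y, sample_weight=weights)

print(f"Before: {(model.predict(X) == y).mean():.3f} training accuracy")
print(f"After:  {(retrained.predict(X) == y).mean():.3f} training accuracy")
```

Whether you can afford this kind of learn-by-erring loop depends on how costly a single mistake is for your product.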
A single error may be enough for consumers to lose their trust in AI. A diligent approach to addressing data shift, underspecification, and undertesting can help AI systems pass everywhere, every time. Let us make it happen.