Why AI observability in computer vision matters from day one.
Building computer vision (CV) products is fun and exciting. It’s magical when you get the first demos working and can see the results with your own eyes. However, it’s also tedious and notoriously difficult to bring computer vision systems to production. The exciting phase happens at the beginning of any new project. More often than not, you have little data to train and test with, and you rely on pre-trained open-source models to take your first steps. At that point, your primary focus is probably to get a proof-of-concept (POC) going that demonstrates the performance of your CV model on a small test data set through your favorite metric (e.g. precision, recall, or mAP). That’s totally understandable and the right thing to do.
However, it’s also where it gets dangerous quickly. The real work begins after the POC; the POC’s primary purpose is to assess the amount of work involved to build a production system, guide what is needed for that and estimate the chance of success. I speak from experience when I say hitting the target metric on a test set is the easy bit. We all know that. Yet, too often an over-reliance on this one (two or three) metric(s) has led me to:
In short, even at the POC stage, we have to do a rigorous assessment of the status quo of our computer vision systems. What should that involve? It depends on the product, but at the very least should include (in addition to your target metrics) an assessment of:
These processes used to be a lot of work, but with the advent of artificial intelligence (AI) observability software, like Lakera’s MLTest, these assessments become as easy as training a new model.
Applications like MLTest can help automate testing processes and provide insights and transparency on computer vision applications.
The rewards are immediate: with insights that accelerate development and tedious processes automated, there’s a much higher chance of success in production.
Most AI software projects are stuck in the prototyping trap or underperform in operation.
A Gartner analysis published in 2020 found that “only 53% of projects make it from artificial intelligence (AI) prototypes to production”[1].
In other words, nearly half of AI projects get stuck in the prototype trap and never make it to production.
Why is this the case?
Developers don’t pay attention to model robustness.
All too often, developers don’t systematically measure the robustness of a model, or they only do it at the end of development. It’s a widespread misbelief that adding a few data augmentations (e.g. Gaussian noise or horizontal flips) will fix all robustness issues. Data augmentations can fix some issues, but not all. To know which augmentations to add and what data to collect, it’s key to first understand the robustness of your system and to measure it as part of your development as early as possible.
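As a minimal sketch of what "measuring robustness" can mean in practice, the snippet below checks how often a model's prediction stays the same when an input is perturbed. Everything in it is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a real model, images are plain lists of pixel rows, and `prediction_consistency` is just one simple robustness score among many.

```python
import random

def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip pixels back to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]

def toy_model(image):
    """Hypothetical stand-in for a real model: predicts 1 ("bright") if the
    left half of the image is bright on average. It ignores the right half,
    which is the kind of shortcut a robustness test should expose."""
    half = len(image[0]) // 2
    left = [p for row in image for p in row[:half]]
    return int(sum(left) / len(left) > 0.5)

def prediction_consistency(model, images, perturb):
    """Fraction of images whose prediction is unchanged by the perturbation."""
    unchanged = sum(model(img) == model(perturb(img)) for img in images)
    return unchanged / len(images)

def make_image(left, right, size=4):
    """Build a size x size image with constant left and right halves."""
    half = size // 2
    return [[left] * half + [right] * (size - half) for _ in range(size)]

# Illustrative data only: two uniform images and two half-bright images.
images = [make_image(0.9, 0.9), make_image(0.1, 0.1),
          make_image(0.9, 0.1), make_image(0.1, 0.9)]

rng = random.Random(0)  # fixed seed so the noise test is repeatable
report = {
    "horizontal_flip": prediction_consistency(toy_model, images, horizontal_flip),
    "gaussian_noise_sigma_0.05": prediction_consistency(
        toy_model, images, lambda img: gaussian_noise(img, 0.05, rng)),
}
for name, score in report.items():
    print(f"{name}: {score:.2f}")
```

In this toy setup the model copes with mild noise but changes its answer on the two half-bright images when they are flipped. That gap is the signal that tells you which augmentation or which extra data is worth adding.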
The quality of the data is insufficient.
There is never enough of the right data. Never. At the beginning of a new project, this is particularly true, so you may resort to using any open-source datasets you can get your hands on and get creative in other ways to make more training and test data. Unfortunately, the data you end up with will not be exactly representative of your use case. It will likely contain unwanted biases and possibly dangerous correlations that lead your model to take shortcuts. As with robustness, it’s key to measure all these issues as early as possible; doing so will help you collect the right data going forward and alert you to potential issues in the final model. Certain applications, like MLTest, help to test for data robustness and show where you can proactively procure data that is right for your particular application.
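One simple way to look for the "dangerous correlations" mentioned above is to check whether a label is almost perfectly predicted by some piece of metadata. The sketch below assumes each sample carries a hypothetical `background` attribute; the records, the attribute name, and the 0.9 threshold are all made up for illustration.

```python
from collections import Counter, defaultdict

def label_share_by_attribute(records, attribute):
    """For each attribute value, return the share of each label."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[attribute]][rec["label"]] += 1
    return {value: {label: n / sum(c.values()) for label, n in c.items()}
            for value, c in counts.items()}

def flag_skewed_slices(shares, threshold=0.9):
    """Return attribute values where one label makes up >= threshold."""
    return sorted(value for value, dist in shares.items()
                  if max(dist.values()) >= threshold)

# Hypothetical metadata: cows mostly on grass, camels mostly on sand.
records = (
    [{"label": "cow", "background": "grass"}] * 19
    + [{"label": "camel", "background": "grass"}] * 1
    + [{"label": "cow", "background": "sand"}] * 1
    + [{"label": "camel", "background": "sand"}] * 19
    + [{"label": "cow", "background": "road"}] * 5
    + [{"label": "camel", "background": "road"}] * 5
)

shares = label_share_by_attribute(records, "background")
skewed = flag_skewed_slices(shares)
print("Backgrounds dominated by a single label:", skewed)
```

A flagged slice is not proof of a problem, but it is a hint that the model could learn the background instead of the object, and a pointer to the data you should collect next (here, cows on sand and camels on grass).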
Business stakeholders find it difficult to understand the limitations of the product.
We’ve all been there: a product manager, CEO, CTO or similar has asked if your computer vision system will work under certain conditions. What do you say? “Ehm, probably?” If you are unsure about the performance of your system, stakeholders one level removed from the technology will be lost, and so will the customers they sell to. This can only lead to misunderstandings and frustration for all involved, and eventually to canceled projects and canceled contracts. Once you continuously measure your system’s performance across a wide range of metrics and scenarios, you will be in a much better position to communicate the findings, too.
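To make "measuring across scenarios" concrete, here is a rough sketch that breaks precision and recall down per scenario instead of reporting one overall number. The scenario names and the result rows are invented for illustration, and the binary setup is an assumption; a real detection pipeline would plug in its own metrics.

```python
from collections import defaultdict

def precision_recall(pairs):
    """Compute precision and recall from (truth, prediction) pairs of 0/1.
    A metric is None when its denominator is zero."""
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall

def report_by_scenario(results):
    """Group (scenario, truth, prediction) rows and score each scenario."""
    grouped = defaultdict(list)
    for scenario, truth, pred in results:
        grouped[scenario].append((truth, pred))
    return {s: precision_recall(pairs) for s, pairs in grouped.items()}

# Hypothetical evaluation rows: (scenario, ground truth, prediction).
results = [
    ("daylight", 1, 1), ("daylight", 1, 1), ("daylight", 0, 0), ("daylight", 1, 1),
    ("night", 1, 0), ("night", 1, 1), ("night", 0, 1), ("night", 1, 0),
]

report = report_by_scenario(results)
overall = precision_recall([(t, p) for _, t, p in results])

print("overall:", overall)
for scenario, (precision, recall) in report.items():
    print(f"{scenario}: precision={precision}, recall={recall}")
```

The overall numbers look acceptable here, while the per-scenario view shows the system struggling at night. A table like this is much easier to hand to a PM than “Ehm, probably?”.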
There are infinite tests to run and only so much time until the next meeting with your PM. You think you’ve prepared for those use-case questions until…
Measure early and reap the rewards.
Start measuring the robustness of your models, the quality of your data, and the risk of underperformance early in your development, and you’ll avoid many of the mistakes we made. Try Lakera’s AI observability software for free today.
Lakera’s MLTest equips every computer vision development team with a world-class testing infrastructure. Our product finds critical vulnerabilities and flaws in computer vision systems – automatically as part of existing development processes and before they can impact operation. We want to enable every team, small to large, to ship AI products quickly and reliably. Get in touch to schedule a demo!
[1] Gartner Identifies the Top Strategic Technology Trends for 2021, Gartner, 2020.