Make your multi-modal AI/ML models trustworthy
Study after study indicates that the most significant obstacle to deploying AI models in the enterprise is that AI-powered systems have no awareness of their own trustworthiness. Consider this: when ChatGPT provides an answer, wouldn’t it be helpful if any hallucinated portions were highlighted in red on the screen? Similarly, imagine an autonomous driving system whose computer vision encounters a situation unlike anything it saw during training; shouldn’t it trigger an emergency stop? And when a banking AI denies a customer a loan, it should display its confidence level and explain the decision. This last concept is known as explainability.
One of the primary causes of these issues is flawed training data. But what if, during model training, the AI took the lead, with humans assisting only when necessary? Imagine a collaborative approach in which AI and human intervention complement each other.
Interestingly, a friend introduced me to a technology that addresses precisely these challenges, and you can explore and try it for free: visit the CAPSA product on the themisai.io website.
On the website, you can explore a comprehensive demo designed for engineers. It illustrates how simple it is to address these challenges using the CAPSA library, which seamlessly integrates with any multimodal ML model, whether generative or not. Remarkably, this groundbreaking software originated in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), under the guidance of CSAIL director Daniela Rus.
The core concept behind solving this intricate problem lies in allowing CAPSA to analyze your model and replace each neuron with a specialized one. And the best part? Achieving this transformation requires just one line of code:
model = capsa_torch.wrapper(_model)
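To make the idea concrete, here is a minimal sketch of that wrapping step for a plain PyTorch classifier. The capsa_torch import and the wrapper call simply mirror the one-liner above; the exact module layout and signature are assumptions on my part, so check the CAPSA documentation for the real interface.

import torch.nn as nn
import capsa_torch  # import assumed from the one-liner above; see the CAPSA docs

# A plain PyTorch classifier standing in for "your model".
_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# The single CAPSA line: the library analyzes the network and swaps in
# specialized, risk-aware neurons, so the wrapped model can also report
# how confident it is in each prediction.
model = capsa_torch.wrapper(_model)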
These specialized neurons serve two critical roles in enhancing existing models: they continue to produce the original model’s predictions, and at the same time they estimate how trustworthy each prediction is.
For the main advantages of CAPSA, see the overview on the CAPSA site.
In light of the new AI regulations emerging worldwide (including the recent EU AI Act), incorporating solutions like CAPSA becomes essential for responsible AI deployment.
To illustrate CAPSA’s capabilities, let me share three intriguing use cases. I’ve extracted the images from the CAPSA YouTube demo on their site: CAPSA Demo.
Use Case 1 - Handwriting Recognition (Text Models)
This use case uses the MNIST benchmark database for handwriting recognition. An out-of-the-box ML model was run on real test data and achieved an accuracy of 65%.
For each character, the demo shows an accuracy indicator; the errors, i.e., the characters the system got wrong, are marked in red.
After inserting the CAPSA Wrapper, all the errors are highlighted:
After retraining with the wrapped model, accuracy rises to 90%.
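To show what this workflow could look like in code, here is a hedged sketch of flagging low-confidence predictions so a human can review them, the human-in-the-loop pattern mentioned earlier. The (logits, risk) return shape is an assumption for illustration, not the documented CAPSA interface:

import torch

def flag_uncertain(predict_with_risk, images: torch.Tensor, threshold: float = 0.5):
    # predict_with_risk: a callable returning (logits, risk) for a batch.
    # The (logits, risk) signature is assumed for illustration; adapt it
    # to however your wrapped model actually reports uncertainty.
    logits, risk = predict_with_risk(images)
    preds = logits.argmax(dim=1)
    flagged = risk > threshold  # boolean mask of low-confidence samples
    return preds, flagged

Samples where flagged is True are exactly the ones you would route to a human reviewer before retraining.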
Use Case 2 - Pixel Depth Recognition (Image Models)
In autonomous driving, one of the core challenges is accurately estimating the depth of every pixel in an image. This task is crucial because, while driving, the system must avoid collisions with nearby obstacles.
For instance, consider a scenario where a bus or a deer suddenly appears in an image. If the autonomous driving system was not adequately trained on similar images, or if manual labeling was incomplete, confusion may arise, potentially leading to accidents. CAPSA addresses this issue effectively: in the images below, it highlights problematic areas by coloring the affected pixels red.
CAPSA’s ability to identify these critical regions makes autonomous driving systems safer and more reliable.
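As a rough illustration of how such a red overlay could be produced, here is a short sketch, assuming you already have a per-pixel uncertainty map from the wrapped depth model (the array shapes and value ranges below are my assumptions):

import numpy as np

def red_overlay(image, uncertainty, threshold=0.5):
    # image: (H, W, 3) float array in [0, 1]
    # uncertainty: (H, W) float array in [0, 1], e.g. from a wrapped depth model
    out = image.copy()
    mask = uncertainty > threshold
    # Blend flagged pixels toward pure red so risky regions stand out.
    out[mask] = 0.5 * out[mask] + 0.5 * np.array([1.0, 0.0, 0.0])
    return out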
Use Case 3 - Generative AI Responses (Hallucination Detection)
The third use case involves highlighting hallucinated responses from generative large language models (LLMs).
Below, you’ll find examples where CAPSA’s trustworthiness indicators color the hallucinated text in red. These questions are part of a benchmark set used for evaluating LLMs.
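As a simple sketch of what rendering such an indicator could look like in a terminal, assuming the model exposes a per-token risk score (that per-token format is my assumption; the demo does not show the underlying API):

RED, RESET = "\033[91m", "\033[0m"

def highlight_hallucinations(tokens, risks, threshold=0.5):
    # tokens: the generated answer split into tokens.
    # risks: per-token risk scores in [0, 1]; this format is assumed here.
    colored = [f"{RED}{t}{RESET}" if r > threshold else t
               for t, r in zip(tokens, risks)]
    return " ".join(colored)

# Example: the risky token "Germany." would be printed in red.
print(highlight_hallucinations(
    ["Paris", "is", "the", "capital", "of", "Germany."],
    [0.05, 0.02, 0.02, 0.05, 0.05, 0.92]))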
Conclusion
We are on the path toward being able to enforce trustworthiness in AI models through human-in-the-loop AI. CAPSA is both a vision for the future and an existing reality. You can explore this remarkable technology by visiting ThemisAI’s contact page.