How to avoid getting sacked by AI Hallucinations

I’m a massive fan of American Football (one day, I'll regale you all with tales of my misspent youth captaining the Great British Knights American Football Team) and the NFL (#GoHawks). It's that time of year to start thinking about my fantasy draft for the upcoming season, and it got me wondering: who's going to win the Super Bowl next year?

Copilot had the answer! (and a time machine, it seems)

If Copilot is able to peer into the future, why not go a step further? What does Super Bowl LX in 2026 hold?

Well, it turns out it's going to be a great year for Steelers fans… less good for the mighty Seahawks, though!

These are slightly contrived but great examples of AI Hallucination.

This phenomenon isn’t unique to Copilot, but it is something anyone using generative AI in their flow of work needs to be aware of.

Essentially, your AI assistant of choice gets confused and makes up stuff that isn't true. It happens because the AI is good at finding patterns, but not so good at making sense of them.

It's like when you see shapes in the clouds, but they're not really there.

Our brains are able to rationalise these shapes and patterns and understand that they are not real, whereas AI sometimes struggles to make that intuitive leap and takes a more black-and-white approach: that cloud looks like a dog, so it must be a dog.

This can cause problems for some AI applications, such as answering questions or summarising documents. You don't want your AI to make up things that are not true, right? That's why Microsoft created a tool in Azure AI to help you catch and fix these “Ungrounded Model Outputs”.

This tool has the catchy name of “Groundedness Detection” and can help identify when your AI assistant says something that is not grounded in the document it is using.

For example, if you ask your AI to summarise a news article, Groundedness Detection can tell you if the summary says something that is not in the article, or contradicts it. This way, you can avoid misleading or inaccurate information.
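To make that concrete, here is a minimal sketch of how an app might call the groundedness detection API in Azure AI Content Safety from Python. The endpoint path, api-version, and request fields below reflect the preview API as I understand it and may have changed, and the resource name, key handling, and the `check_groundedness` helper are illustrative placeholders, so treat this as a sketch and check the current Azure documentation for the exact contract.

```python
# A minimal sketch of calling the Azure AI Content Safety groundedness
# detection endpoint with Python's requests library. The URL path,
# api-version, and field names reflect the preview API at the time of
# writing and may have changed; check the current Azure docs before use.
import requests

AZURE_ENDPOINT = "https://<your-content-safety-resource>.cognitiveservices.azure.com"  # placeholder
AZURE_KEY = "<your-key>"  # placeholder; prefer a managed identity or Key Vault in real apps


def check_groundedness(summary: str, source_article: str) -> dict:
    """Ask the service whether `summary` is grounded in `source_article`."""
    url = f"{AZURE_ENDPOINT}/contentsafety/text:detectGroundedness?api-version=2024-02-15-preview"
    payload = {
        "domain": "Generic",
        "task": "Summarization",
        "text": summary,                        # the AI-generated output to verify
        "groundingSources": [source_article],   # the document(s) it should be grounded in
        "reasoning": False,
    }
    headers = {
        "Ocp-Apim-Subscription-Key": AZURE_KEY,
        "Content-Type": "application/json",
    }
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()


# Example: a summary that invents a Super Bowl winner the article never mentions
result = check_groundedness(
    summary="The Steelers won Super Bowl LX.",
    source_article="Super Bowl LX will be played in February 2026. No teams have qualified yet.",
)
print(result)
```

The response is expected to report whether any ungrounded text was detected, roughly what proportion of the output was ungrounded, and which spans were flagged.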

AI techies and app creators can use this tool for different purposes:

  • You can test your AI application before you deploy it and see how well it sticks to the facts.
  • You can flag ungrounded statements for your internal users, so they can check them or fix them by improving the prompts or the knowledge base.
  • You can ask your AI to rewrite ungrounded statements before you show them to your end users, or use a different source document (see the sketch after this list).
  • You can check the quality of synthetic data that you use to train your AI, and make sure it is grounded in reality.
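As a rough illustration of the "flag or rewrite" workflow from the list above, here is a sketch that builds on the hypothetical `check_groundedness` helper from the earlier snippet. The 10% threshold, the rewrite prompt, and the `call_your_llm` function are all illustrative assumptions, not anything prescribed by Azure.

```python
# A sketch of a simple gating pattern: show grounded output as-is, flag
# lightly ungrounded output for review, and ask the model to rewrite
# heavily ungrounded output using only the source document.
def gate_output(summary: str, source_article: str) -> str:
    result = check_groundedness(summary, source_article)
    if not result.get("ungroundedDetected", False):
        return summary  # grounded: safe to show to end users as-is

    ungrounded_pct = result.get("ungroundedPercentage", 0.0)
    if ungrounded_pct < 0.10:
        # A small amount of ungrounded text: flag it for a human reviewer
        return f"[NEEDS REVIEW: {ungrounded_pct:.0%} ungrounded] {summary}"

    # Heavily ungrounded: ask the model to try again, restricted to the source
    rewrite_prompt = (
        "Rewrite the following summary using ONLY facts found in the source document.\n\n"
        f"Source:\n{source_article}\n\nSummary to fix:\n{summary}"
    )
    return call_your_llm(rewrite_prompt)  # hypothetical wrapper around your chat model
```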

How does this tool work?

Instead of using a generic AI model, this tool uses a special AI model that is trained to do a specific task called Natural Language Inference (NLI). This task evaluates whether a statement is entailed by, contradicted by, or neutral with respect to a source document: in plain terms, whether the document supports it, disproves it, or simply doesn't say.

For example, if the document says "The sky is blue", the statement "The sky is green" is contradicted, the statement "The sky is clear" is neutral, and the statement "The sky is blue" is entailed. This tool uses this logic to detect ungrounded statements in your AI output.
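Azure's detector is a purpose-built model you call through the service, but the underlying NLI idea can be demonstrated with a publicly available model. The sketch below uses the open `roberta-large-mnli` model from Hugging Face Transformers purely to illustrate how premise/hypothesis pairs are classified; it is not the model Azure uses.

```python
# Illustrating NLI with a public Hugging Face model (roberta-large-mnli).
# Each hypothesis is scored against the premise as CONTRADICTION, NEUTRAL,
# or ENTAILMENT.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "The sky is blue."
hypotheses = ["The sky is green.", "The sky is clear.", "The sky is blue."]

for hypothesis in hypotheses:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    label = model.config.id2label[int(probs.argmax())]
    print(f"{hypothesis!r}: {label} ({probs.max().item():.0%} confidence)")
```

Running it on the sky example should label the three statements contradiction, neutral, and entailment respectively, which is exactly the logic the groundedness detector applies to each sentence of your AI's output against its source documents.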

Remember, the key principle of Copilot is exactly that: it is a Copilot, not an Autopilot, so double-check its outputs and its sources.

…and here’s hoping the Seahawks can change their fate and trounce the Steelers in Super Bowl LX!


Dan Coleby

The IT Strategy Coach

7 months ago

Great article Robert Smith. I love your re-branding of hallucinations as crystal ball predictions. Not sure why you are so bothered about the Super Bowl though. What are the EuroMillions numbers going to be on Friday?

Aashi Mahajan

Senior Associate - Sales at Ignatiuz

7 months ago

Great insights Robert Smith! The future of AI sounds fascinating, and your perspective adds an exciting twist to it. Looking forward to more enlightening insights from you.
