When AI lies: Detecting hallucinations in Gen AI

Have you ever asked generative AI a straightforward question, only to receive a wildly inaccurate or even downright bizarre answer?

Maybe you requested a text summary, but the AI confidently fabricated details that weren't in the original text. Or maybe you asked for technical advice, only to receive instructions that were completely wrong.

These errors—known as "hallucinations"—happen when AI generates information that isn't grounded in its training data.

Hallucinations aren't always bad. In creative tasks, they can be a feature, not a bug—improvising fictional characters or dreaming up imaginative stories. But in high-stakes areas like healthcare, finance, or legal services, hallucinations can be dangerous—sometimes even life-threatening.

As Generative AI becomes more embedded in application development, it's crucial to understand why hallucinations happen—and how to manage them effectively.

Let's dive in:

  • What makes LLMs hallucinate in the first place
  • Why detecting and controlling hallucinations matters
  • How AWS tools can help you detect and manage AI hallucinations

What makes LLMs hallucinate?

To understand hallucinations, it's important to first understand how large language models (LLMs) generate content. Let's use an example: text generation.

LLMs predict the next word in a sentence based on patterns from their training data. LLMs are trained on massive amounts of text data—such as books, websites, and articles—to learn patterns, grammar, and word relationships.

Think of it like completing a sentence in a conversation. For example, if someone asks you to complete the sentence “The cat is ...”, you might automatically think of words like “sleeping” or “running.”

LLMs work similarly. They are very good at predicting what comes next based on patterns in the data they’ve seen before. This allows them to generate anything from essays and poems to code and answers to questions.

But, unlike you, they have no actual understanding—just pattern recognition. That's why they can sometimes guess convincingly, but incorrectly.
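
To make this concrete, here is a toy sketch in Python. It is not a real LLM; the "model" is just a made-up table of next-word probabilities for the prompt above, but it shows the core idea of picking the most likely continuation.

```python
# Toy illustration of next-word prediction (not a real LLM).
# The "model" here is just a hand-written table of probabilities.
next_word_probs = {
    "sleeping": 0.45,
    "running": 0.25,
    "hungry": 0.20,
    "purple": 0.10,  # unlikely, but the model never rules anything out
}

prompt = "The cat is"

# Greedy decoding: always pick the single most probable next word.
prediction = max(next_word_probs, key=next_word_probs.get)
print(f"{prompt} {prediction}")  # -> The cat is sleeping
```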

How to control hallucinations

One effective way to manage hallucinations is by fine-tuning the temperature parameter when generating content. The temperature parameter influences the randomness of an AI model’s prediction. It determines how the model weighs probabilities when selecting the next word.

  • A low temperature value produces deterministic, focused outputs by narrowing the model’s choices to the most likely options.
  • Conversely, a high temperature value encourages the model to be more creative by increasing randomness, which raises the risk of hallucinations in the output (see the sketch below).
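
Under the hood, temperature rescales the model's raw next-word scores (logits) before they are turned into probabilities: dividing by a low temperature sharpens the distribution, while a high temperature flattens it. The minimal sketch below illustrates this with made-up scores; it is not how any particular model implements sampling.

```python
import math
import random

def sample_next_word(scores, temperature):
    """Sample a next word from temperature-scaled scores (softmax).
    Low temperature -> near-deterministic; high temperature -> more random.
    A temperature of exactly 0 is usually treated as greedy decoding instead."""
    scaled = {w: s / temperature for w, s in scores.items()}
    max_s = max(scaled.values())                      # for numerical stability
    exps = {w: math.exp(s - max_s) for w, s in scaled.items()}
    total = sum(exps.values())
    probs = {w: e / total for w, e in exps.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Made-up scores for the prompt "The cat is ..."
scores = {"sleeping": 2.0, "running": 1.0, "hungry": 0.5, "purple": -1.0}

print(sample_next_word(scores, temperature=0.1))  # almost always "sleeping"
print(sample_next_word(scores, temperature=1.5))  # more varied, riskier picks
```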

Through its Amazon Bedrock service, AWS has introduced tools that make it easy to build and scale generative AI applications using foundation models from providers like Anthropic and Cohere.

With Bedrock, you can easily configure parameters—including temperature—to fine-tune model behavior and control output quality, tailoring responses to your application's needs.

To see this in action, let’s look at Bedrock’s response to the same prompt, with different values of temperature in the slides below:


Slide 1

  • Slide 1 shows a snapshot of settings using the same Titan Text G1 Express model with different temperature values. On the left side, we set the temperature value to 0; on the right, we use 0.8. We then provide the same prompt as input.


Slide 2

  • Slide 2 shows the response under different temperature values. The model with the temperature parameter set to 0 provides a concise response, whereas setting the temperature to 0.8 provides a detailed response to the prompt.

Adjusting the temperature with Amazon Bedrock lets you balance creativity and accuracy—making it easier to generate outputs suited to your application's needs, whether you’re building a chatbot or generating marketing copy.
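
The same comparison can also be run programmatically. The sketch below uses boto3 with the Titan Text G1 Express request format; the region is an assumption, and other models expect different request and response fields, so treat this as a starting point rather than a drop-in implementation.

```python
import json
import boto3

# Assumes AWS credentials and Bedrock model access are already set up.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate(prompt, temperature):
    # Request body in the Titan Text format; other providers differ.
    body = json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {"temperature": temperature, "maxTokenCount": 512},
    })
    response = bedrock.invoke_model(
        modelId="amazon.titan-text-express-v1",
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["results"][0]["outputText"]

prompt = "Summarize what a solar eclipse is."
print(generate(prompt, temperature=0.0))  # focused, repeatable answer
print(generate(prompt, temperature=0.8))  # more varied, more creative answer
```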

Factual hallucination is misinformation

But what if LLMs generate content that is presented as factual, but is false or inaccurate?

These errors are commonly referred to as "factual hallucinations." A factual hallucination happens because the model predicts based on patterns in its training data without verifying the truth.

Let’s look at an example where factual errors like these can be dangerous.

Think of an organization that wants to use a customer support chatbot to enhance its workflow. Would you trust this chatbot if it occasionally responded with made-up information? The answer would be no.

However, we can implement a system that ensures the chatbot provides correct information and prevents hallucinations in its responses.

How AWS helps safeguard Generative AI outputs

Amazon Bedrock Guardrails is designed to enhance the reliability, accuracy, and compliance of generative AI applications. It provides the following safeguards (a minimal usage sketch follows the list):

  • Adds content filters to detect harmful input and model responses.
  • Adds denied topics to block user input or model responses associated with the topic.
  • Filters words and phrases in user inputs and model responses.
  • Filters sensitive information from the model output.
  • Performs contextual grounding checks with retrieval-augmented generation (RAG) to ensure that LLM outputs are accurate and relevant to external data sources.
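
As a rough illustration, the sketch below checks a draft model response against an existing guardrail using the ApplyGuardrail API. The guardrail ID and version are placeholders for values from your own configuration, and the region is an assumption.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Validate a draft model response against an existing guardrail before
# showing it to the user. Identifier and version below are placeholders.
result = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="my-guardrail-id",
    guardrailVersion="1",
    source="OUTPUT",  # use "INPUT" to screen user prompts instead
    content=[{"text": {"text": "Draft model response goes here."}}],
)

if result["action"] == "GUARDRAIL_INTERVENED":
    print("Guardrail intervened:", result["outputs"])
else:
    print("Response passed the configured safeguards.")
```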

At the re:Invent 2024 event, AWS announced the launch of automated reasoning checks in Amazon Bedrock Guardrails. These checks allow organizations to mitigate hallucinations by automatically evaluating the outputs of generative AI models against predefined rules and constraints. Compared to RAG, which relies on fetching data from trusted sources, automated reasoning checks ensure that outputs strictly adhere to predefined policies, domain rules, or regulatory guidelines.

How do reasoning checks work?

Reasoning is the cognitive process of drawing conclusions, solving problems, or making decisions based on evidence, logic, or information. It involves analyzing and evaluating information, identifying relationships, and applying critical thinking to derive insights or solutions.

Reasoning detects hallucinations using a mathematical framework to validate the information generated by the model. Reasoning techniques analyze the outputs of LLMs against established rules or logical frameworks to identify inconsistencies with the following objectives:

  • Accuracy: Checks the factual accuracy of the information in the LLM output.
  • Soundness: Ensures that the AI does not assert false claims as true. A sound reasoning process will only affirm statements that are verified as accurate, preventing the model from generating misleading or incorrect information.
  • Transparency: Creates a transparent and auditable log of how conclusions were reached. This traceability allows users to understand the basis for the AI’s claims, making identifying and correcting hallucinations easier.


Objectives of reasoning checks

Reasoning capabilities can cross-check the generated content with known facts or logical deductions, effectively filtering out hallucinations before they reach the user.

Automated reasoning checks in AWS

To utilize automated checks in AWS, you can follow the steps below:

  1. Create your policy document: A policy document defines the domain-specific knowledge or the organizational rules you want the AI to adhere to, such as your HR policies or operational manuals. AWS encodes these policies into a structured, mathematical format, which is then used to verify the output generated by the AI.
  2. Create a guardrail and configure automated reasoning checks: Amazon Bedrock Guardrails allows you to use the policy document in implementing application-specific safeguards.
  3. Test automated reasoning checks in the playground: As a domain expert, you can test their effectiveness before deploying them in your production environment.


Use cases of automated reasoning checks

Automated reasoning checks are particularly valuable for use cases requiring factual correctness, such as:

  • Ensuring compliance with legal documentation
  • Verifying medical advice against established guidelines
  • Maintaining enterprise policies and standard operating procedures

Let's see what all this looks like in action.

Example: Chatbot for Loan Applications

Let’s look at an example of deploying a customer support chatbot for a financial services company. The chatbot decides the applicant’s loan eligibility based on predefined rules.

Step 1: Create an automated reasoning policy—Rules

You can start by uploading the policy document to Amazon Bedrock Guardrails. Amazon Bedrock will analyze the document and generate an automated reasoning policy.

An automated reasoning policy consists of variables defined by a name, type, and description, along with logical rules that operate on these variables.

For our example, let’s suppose we have the following variables with their descriptions:

  • applicant_age: The age of the loan applicant.
  • credit_score: The credit score of the applicant.
  • monthly_income: The minimum income threshold for the loan type.
  • existing_debt: The applicant’s debt to calculate the debt-to-income ratio (DTI).
  • employment_status: How long the applicant has been in their current employment, or self-employed with a stable income source.

The policy also defines logical rules using the policy document to represent formal logic as follows:

  • The loan application is only admissible if the applicant is between 21 and 65.
  • The credit_score needs to be a function of the applicant’s age, with a minimum requirement of 700.
  • The debt-to-income ratio (DTI) needs to be at or below 36%.
  • The applicant must have stable employment, with a minimum of one year in their current job.

The chatbot will initially assess the loan application’s eligibility based on these rules. As a domain expert, you can refine the policy by updating these rules in natural language, without needing formal logic expertise.
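
Bedrock derives the formal encoding itself, but as a rough illustration, the sketch below expresses the rules above as explicit checks over the policy variables. The years_in_current_job field and the simplified credit-score rule are assumptions made for this example.

```python
def check_loan_eligibility(applicant):
    """Illustrative encoding of the policy rules as explicit checks.
    Each failed rule is recorded, loosely mirroring the auditable trace
    that automated reasoning checks produce."""
    violations = []

    if not (21 <= applicant["applicant_age"] <= 65):
        violations.append("applicant_age must be between 21 and 65")
    if applicant["credit_score"] < 700:  # simplified: minimum score only
        violations.append("credit_score must be at least 700")

    dti = applicant["existing_debt"] / applicant["monthly_income"]
    if dti > 0.36:
        violations.append("debt-to-income ratio must be at or below 36%")
    if applicant["years_in_current_job"] < 1:
        violations.append("at least 1 year in the current job is required")

    return {"eligible": not violations, "violations": violations}

applicant = {
    "applicant_age": 30,
    "credit_score": 720,
    "monthly_income": 5000,
    "existing_debt": 2500,        # DTI = 50%, violates the 36% rule
    "years_in_current_job": 2,
}
print(check_loan_eligibility(applicant))
```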

Step 2: Configure automated reasoning checks in Amazon Bedrock Guardrails

Once the policy is created, you can configure it within Amazon Bedrock Guardrails: enable the automated reasoning check and select the policy to use. Amazon Bedrock Guardrails will now apply automated reasoning to validate all of the chatbot’s responses.
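
In code, attaching the guardrail at inference time might look like the sketch below. The guardrail ID, version, and model choice are placeholders, and the request body reuses the Titan Text format from earlier.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Invoke the model with the guardrail attached, so each response is
# validated against the configured safeguards, including the
# automated reasoning policy. IDs below are placeholders.
response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-text-express-v1",
    guardrailIdentifier="my-loan-guardrail-id",
    guardrailVersion="1",
    body=json.dumps({"inputText": "Am I eligible for a loan at age 19?"}),
    contentType="application/json",
    accept="application/json",
)
print(json.loads(response["body"].read()))
```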

Step 3: Validate and correct LLM answers

After configuring the guardrails, you can test the automated reasoning checks to validate the policy’s effectiveness. Amazon Bedrock provides a test playground where you can simulate real-world scenarios by inputting sample prompts and reviewing the outputs.

If any output violates the reasoning policy or contains inaccuracies, the system flags the issue and suggests corrections. This feedback loop helps domain experts refine the policy or adjust the model’s behavior to ensure compliance.

Hallucinations aren’t always bad—they’re what make AI creative. But when accuracy matters (like in healthcare or finance), they can cause real harm.

That’s where AWS automated reasoning checks come in. Using formal verification, AWS ensures AI systems behave as intended—making it possible to build safer, more reliable AI for high-stakes use cases like legal advice, healthcare, and financial services.
