Chatbot Basics - Concepts, Hallucinations, and RAG
The following is from a section of an article I started on my blog last week, currently titled Building Good Chatbots Part One, No-Code with Microsoft Copilot Studio and Azure AI Studio. The original article is already very long and I am working on more. By the time I am done it will be an e-book! Based on feedback from my good friend Jake Attis, I decided to publish smaller excerpts here on LinkedIn.
This part serves as an introduction to the subject of chatbots generally, starting with important background information and terminology about models, chatbots, prompts, hallucinations, fine-tuning, retrieval augmented generation, and context windows. By the end of it I hope you will understand these concepts and how they fit together.
Models versus copilots and chatbots
The terms model and chatbot are sometimes used interchangeably, but they aren’t the same thing. A chatbot is a system that usually includes one or more models to do its work. Aside from models, a typical chatbot system contains a user interface, services, and databases. Considering this, Copilot Studio is an appealing option for people who aren’t developers or data scientists to unlock the power of large language models for useful applications. In the rest of this article copilot and chatbot are synonymous, but model specifically and only means the model. Examples of chatbots include ChatGPT, Microsoft Copilot, and the copilots you build with Copilot Studio. Examples of models include GPT-3.5-Turbo, GPT-4, and Meta Llama 2.
This is an important distinction which will become clearer as we go.
Chatting with your data
You may recall that people were very impressed when OpenAI released ChatGPT on November 30, 2022. Collectively we spent the next several months trying to understand its uses, dangers, and limitations. One of these limitations is the tendency to tell convincing falsehoods, or what we call hallucinations. A hallucination is when a chatbot generates incorrect, irrelevant, or nonsensical responses. This phenomenon can be attributed to various factors: gaps or errors in the data the model was trained on, ambiguous or poorly worded prompts, and loss of context as the conversation grows longer than the chatbot tracks.
Note that some of these causes relate to the model, which is trained on a dataset, and others to the chatbot, which keeps track of the conversation.
Simply put, a model hallucinates when it doesn’t know the answer. We can improve things by including instructions in the prompt such as: “If you don’t know the answer to my question, say I don’t know instead of inventing an answer.”
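For illustration, here is a minimal sketch in Python of how that instruction might be included as a system message ahead of the user’s question. It assumes the openai Python package (version 1.x); the model name and the question are just examples.

# A minimal sketch: prepend an instruction that discourages invented answers.
# Assumes the openai Python package (v1.x); model name and question are examples.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    {"role": "system",
     "content": "If you don't know the answer to my question, "
                "say 'I don't know' instead of inventing an answer."},
    {"role": "user",
     "content": "What was our company's revenue in 2021?"},
]

response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)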
Solving hallucinations with fine-tuning
Fine-tuning is a process where you take an existing model and train it on new data to make a new model. There is a large ecosystem around fine-tuning, especially for private AI applications using models like Meta Llama 2. People fine-tune models to add specialized knowledge or skills, but also to change a model’s personality and style. For some, the ultimate goal is to make models that are ever smaller and more capable, to enable good AI on commodity and even local hardware. To get an idea of the scale of these efforts, check out the Hugging Face page of Tom Jobbins, TheBloke. He provides a great service to the community by shrinking (quantizing) models to work on less expensive hardware. Currently there are over 2,500 language models on his page alone. They were created by large organizations and individual researchers and collectively have a few million downloads. Fine-tuning is a powerful approach to many problems, but it isn’t necessarily a good approach to “chat with your data” problems insofar as it is time consuming, expensive, and the result is a static model. If the facts or data change, you must repeat the process. An alternative and more common approach to solving hallucinations is prompt engineering combined with retrieval augmented generation.
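As a rough illustration of what running one of these quantized models locally looks like, here is a short sketch using the llama-cpp-python package. The GGUF file name is an assumption, standing in for whichever quantized Llama 2 build you download.

# A rough sketch of running a quantized Llama 2 model locally.
# Assumes the llama-cpp-python package and a GGUF file downloaded from
# Hugging Face; the file name below is an example, not a specific recommendation.
from llama_cpp import Llama

llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)

output = llm(
    "Q: What is retrieval augmented generation? A:",
    max_tokens=128,
    stop=["Q:"],
)
print(output["choices"][0]["text"])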
Prompt engineering
A prompt is a message sent to the model to generate a response which completes the prompt. Models are static and unchanging, and they have no memory of previous questions, answers, or conversations. If you’ve heard the term prompt engineering in the context of ChatGPT, it refers to writing a good prompt in the ChatGPT UI, but in the context of chatbot systems it refers to everything the chatbot system does to build the real prompt sent to the model. When you enter your message and hit send, the chatbot system makes a new prompt that consists of your message, instructions to the system, the previous messages in the conversation, and whatever other facts or instructions the creator of the chatbot thinks are necessary to get a good response. In fact, the chatbot might even use the model to completely rewrite your question before sending the prompt to the model. Generally, all this work is hidden from you and all you see is the answer… which might be a hallucination. In this context, prompt engineering also involves managing the size of the prompts to fit the model’s context window.
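To make that concrete, here is a simplified Python sketch of the kind of assembly a chatbot system performs before it ever calls the model. The names (SYSTEM_INSTRUCTIONS, build_prompt, the Contoso example) are made up for illustration; real systems do far more.

# A simplified sketch of how a chatbot system might assemble the real prompt.
# Names and example content are illustrative only.
SYSTEM_INSTRUCTIONS = (
    "You are a helpful assistant for Contoso employees. "
    "If you don't know the answer, say 'I don't know'."
)

def build_prompt(history, user_message, max_history=6):
    # Keep only the most recent turns so the prompt fits the context window.
    recent = history[-max_history:]
    lines = [SYSTEM_INSTRUCTIONS, ""]
    for role, text in recent:
        lines.append(f"{role}: {text}")
    lines.append(f"user: {user_message}")
    lines.append("assistant:")
    return "\n".join(lines)

history = [("user", "What is our PTO policy?"),
           ("assistant", "Full-time employees accrue 15 days per year.")]
print(build_prompt(history, "Does that include sick leave?"))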
Solving hallucinations with retrieval augmented generation (RAG)
A RAG system is a type of chatbot that combines search, prompt engineering, and a model to ground the response in a set of facts provided on the fly. Simply put, RAG works by putting the facts required to answer the question into the prompt along with the question. Here is an example prompt from the most excellent Semantic Kernel project!
Answer questions only when you know the facts or the information is provided.
When you don't have sufficient information you reply with a list of commands to find the information needed.
When answering multiple questions, use a bullet point list.
Note: make sure single and double quotes are escaped using a backslash char.
[COMMANDS AVAILABLE]
- bing.search
[INFORMATION PROVIDED]
{{ $externalInformation }}
[EXAMPLE 1]
Question: what's the biggest lake in Italy?
Answer: Lake Garda, also known as Lago di Garda.
[EXAMPLE 2]
Question: what's the biggest lake in Italy? What's the smallest positive number?
Answer:
* Lake Garda, also known as Lago di Garda.
* The smallest positive number is 1.
[EXAMPLE 3]
Question: what's Ferrari stock price? Who is the current number one female tennis player in the world?
Answer:
{{ '{{' }} bing.search ""what\\'s Ferrari stock price?"" {{ '}}' }}.
{{ '{{' }} bing.search ""Who is the current number one female tennis player in the world?"" {{ '}}' }}.
[END OF EXAMPLES]
[TASK]
Question: {{ $input }}.
Answer:
The prompt has several placeholders which the chatbot replaces with appropriate content to try to answer the question. This one…
{{ $externalInformation }}
…is replaced with whatever content is retrieved from search to augment the generation of the answer. This part…
Answer questions only when you know the facts or the information is provided.
When you don't have sufficient information you reply with a list of commands to find the information needed.
…ensures that the answer is grounded in what the chatbot knows and the information provided, and (hopefully) prevents hallucinations.
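Putting the pieces together, a RAG chatbot’s request loop might look something like the following Python sketch. The search_index and call_model functions are hypothetical stand-ins for whatever search service and model API you actually use; only the shape of the flow matters here.

# A minimal RAG sketch. search_index() and call_model() are hypothetical
# stand-ins for your search service and your model API; they are not real
# library calls.
RAG_TEMPLATE = """Answer the question using only the information provided.
If the information is not sufficient, reply with "I don't know."

[INFORMATION PROVIDED]
{external_information}

[TASK]
Question: {question}
Answer:"""

def answer(question, search_index, call_model, top_k=3):
    # 1. Retrieve: find the passages most relevant to the question.
    passages = search_index(question, top_k=top_k)
    if not passages:
        return "Information not found."
    # 2. Augment: put the retrieved facts into the prompt.
    prompt = RAG_TEMPLATE.format(
        external_information="\n".join(passages),
        question=question,
    )
    # 3. Generate: let the model answer, grounded in the provided facts.
    return call_model(prompt)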
Grounding the answer in the retrieved data can almost completely remove some kinds of hallucinations from a chatbot because you can easily respond with “I don’t know” or “Information not found” if the retrieval doesn’t find any matches for the request when it does the search. This is an equally effective way to censor the chatbot because, as the chatbot creator, you can use this to prevent the chatbot from talking about any subject that isn’t in the search index. On the other hand, even with RAG, hallucinations can still occur for various reasons, including: the search returning irrelevant or incomplete results, source documents that are wrong or out of date, and relevant passages that don’t fit within the context window.
Simply put, a chatbot using RAG hallucinates when it doesn’t know the answer. The difference here is that the model doesn’t know the answer because the information provided by the chatbot wasn’t good enough, rather than because of the information on which the model was trained. The retrieval component of the chatbot is independent of the model and equally important! Congratulations on reading this far. We are almost ready to talk about Copilot Studio, but first a note on context management and the context window.
Context window
The context window is the amount of text, measured in tokens, that a model can handle in a single request: the prompt plus the response. When the limit is exceeded, you get errors. The context window size is perhaps the single most important constraint we face when building generative AI systems and is a key differentiator between models, driving both capability and cost. Consider the following from Microsoft:
For comparison purposes, the base GPT-3.5-Turbo model has a 4k context window, which can hold around six pages of text (question + answer). GPT-4 offers a version with a 32k context window, which can hold around forty-eight pages of text. Sounds great, except that the 32k context size costs forty times as much. What’s more, if you require capacity to run that model at scale, you must commit to spending five figures per month – it is possible to spend over $1 per query with GPT-4 32k! At the opposite end of the spectrum are small models you can run yourself. There has been great progress in expanding the context window in this area, but there are still many models with 2k context windows! Often, to get acceptable results from these small models, people will use a combination of fine-tuning and RAG.
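If you want to see how close a prompt is to a model’s limit, a token-counting check is a useful habit. Here is a small sketch using the tiktoken package, assuming the base 4k GPT-3.5-Turbo model; the amount reserved for the answer is an arbitrary example.

# Count the tokens in a prompt with tiktoken and check it against a limit.
# The 4,096-token limit matches the base GPT-3.5-Turbo model; the reserve
# for the answer is an arbitrary example value.
import tiktoken

CONTEXT_WINDOW = 4096
RESERVED_FOR_ANSWER = 500  # leave room for the model's response

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
prompt = "Summarize the attached meeting notes..."  # your assembled prompt
prompt_tokens = len(encoding.encode(prompt))

if prompt_tokens > CONTEXT_WINDOW - RESERVED_FOR_ANSWER:
    print(f"Prompt is too long ({prompt_tokens} tokens); trim history or retrieved text.")
else:
    print(f"Prompt uses {prompt_tokens} of {CONTEXT_WINDOW} tokens.")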
Thanks for reading. I hope you learned something, and feedback is always appreciated. In the next article I'll apply all of this to compare and contrast what you can do in Microsoft Copilot Studio and Azure AI Studio.
P.S. If you need help with AI, give me a shout on LinkedIn or send me an email!
--Doug Ware December 1, 2023