The Secret to Creating RAG Chatbots that Actually Work

From my experience building AI apps, there are really only two bottlenecks holding people back from creating outstanding RAG (Retrieval-Augmented Generation) chatbots.

So…

What are they and, most importantly, how do we fix them?

Let me save you the suspense and give you the answers right away — no need to read through 3 quick plugs of a paid newsletter and a NordVPN ad.

Although the best tips on ways to improve RAG models are near the end ;)

The two key factors, as you may already know, are:

  • Knowledgebase
  • Prompts

That’s it.

Getting the overarching flow of the app right isn’t too difficult. A simple workflow can do the job just fine. But the real magic happens with a carefully crafted prompt and a bulletproof knowledgebase.

How do we actually make sure they aren’t limiting our app?

We’ll dive into both, focusing especially on creating the best possible knowledgebase since resources on this specific topic are surprisingly scarce.

Stick with me here, and by the end of this article, you’ll have actionable insights to elevate your RAG chatbots to the next level. Let’s get started!


Choosing the right platform

First off, let’s get the basics out of the way.

Choosing the correct platform to host your database is crucial. So, what are our options?

  • Qdrant
  • Pinecone

If you’re feeling fancy, you might choose to use OpenAI’s Assistants API. I’ve been hearing some good things about it recently, although I haven’t personally tested it since the Assistants v1 API release.

The Assistants API is a bit of a black box: it handles some of these problems on its own, but that also means we can’t optimize those parts ourselves.

Personally, I prefer Qdrant, but feel free to experiment with all three options.
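If you go with Qdrant, getting a collection up and running takes only a few lines. Here is a minimal sketch with the qdrant-client library; the collection name, the 1536-dimension vector size (sized for an OpenAI embedding model, for example text-embedding-3-small) and the dummy embeddings are placeholders, not something prescribed by any particular setup:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# In-memory instance for quick experiments; point this at your server in production
client = QdrantClient(":memory:")

# Collection sized for a 1536-dimensional embedding model
client.create_collection(
    collection_name="products_kb",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Upsert one chunk of knowledge with its embedding and payload
client.upsert(
    collection_name="products_kb",
    points=[PointStruct(
        id=1,
        vector=[0.0] * 1536,  # replace with a real embedding of the chunk text
        payload={"text": "Sneakers ABC cost $19.99 and come in sizes 36-45."},
    )],
)

# Retrieve the chunks most similar to a query embedding
hits = client.search(
    collection_name="products_kb",
    query_vector=[0.0] * 1536,  # replace with the embedding of the search query
    limit=5,
)

The search call at the end is the retrieval step we’ll spend the rest of this article feeding with better queries and better vectors.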


The importance of the query

“You know what I like more than materialistic things? KNOWLEDGE.” - Tai Lopez, your favorite internet guru.

As you may know, a Knowledge Base (KB) works as follows:

We use a search query → to retrieve the ‘most similar’ vectors to said query.

So, vectors are retrieved based on their similarity to the query given.

And if the search query used is wrong, you will never retrieve the right documents, no matter how good the knowledgebase is.

This is especially important to take into account with conversational models since, oftentimes, the search query is not (at least fully) contained within the last user message.

For example:

  • User: What is the price of sneakers ABC?
  • Bot: The price of sneakers ABC is $19.99
  • User: and of sneakers XYZ?

Notice how the user doesn’t explicitly ask for the price? It’s a continuation of the conversation.

And the LLM behind the chatbot is usually fully aware of what the user is asking for.

But…

It may not be getting the context it needs.

You can’t (or at least shouldn’t) use the last user message as your KB search query, because it doesn’t contain the full context of the question.

Yet that’s exactly what lots of developers do.

But this way, it’s a matter of luck whether you retrieve the right context the LLM needs.

And we need to rely on luck as little as possible.

The most straightforward way of tackling this is to:

  • Set up a quick API call to GPT-3.5 Turbo (or an equivalent model), with few-shot prompting, to extract the right search query.

Here’s an example prompt for a shopping assistant:

(more on this later :)

# Role
You are the best {role} specialized in {topic} with a knack for {specialization}. The reputation of our company, to which you belong, rests entirely in your hands and your ability to correctly recognize user questions. Your role is essential in helping our users with {what it's doing for your users} something EXTREMELY important for them.

# Task
Generate a search phrase based on the user's last question in the conversation, considering the context provided by previous questions and answers. You must ensure that the search phrase is brief, precise, and suitable for obtaining relevant answers on the discussed topic. If the user's question does not require related information, respond with 'null'. Steps to follow:

1. Review the conversation history to understand the most recent question and the general context.
2. Identify keywords and specific details that are important for formulating an effective search phrase.
3. Create a search phrase to be used in a vectorized database.

# Specifics
- The search phrase must be precise and relevant to the last question asked by the user.
- You must consider all the context provided by the conversation to ensure the relevance of the phrase.

# Examples
### Ex. 1
Initial question: {second to last user question}
Initial answer: {last answer}
Most recent question: {last question}
Phrase: {enter desired}

### Ex. 2
Initial question: {second to last user question}
Initial answer: {last answer}
Most recent question: {last question}
Phrase: {enter desired}

### Ex. 3
Initial question: {second to last user question}
Initial answer: {last answer}
Most recent question: {last question}
Phrase: {enter desired}

# Notes
- You should NOT interact directly with the user or offer advice or direct answers to their questions. Your sole objective is to generate a precise search phrase.
——
Now, generate the correct search phrase using the context from the provided conversation history:

{conversation_context}
Phrase:        

A good example for our few-shot prompting would look like:

### Ex. 3
Initial question: what is the return policy for handbags?
Initial answer: the return policy for handbags allows returns within 30 days of purchase, provided the item is in its original condition...
Most recent question: what about for sneakers?
Phrase: return policy sneakers        
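Wiring this up in code is just one small API call before the retrieval step. Here is a minimal sketch with the openai Python library; the model choice and the condensed prompt below are my own placeholders standing in for the full few-shot prompt shown above:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Condensed version of the query-extraction prompt shown above
QUERY_EXTRACTION_PROMPT = (
    "Generate a brief, precise search phrase for a vectorized database based on the "
    "user's most recent question, using the conversation history for context. "
    "If no related information is needed, respond with 'null'.\n\n"
    "{conversation_context}\nPhrase:"
)

def extract_search_query(conversation_context: str) -> str | None:
    """Turn the recent conversation into a standalone KB search phrase."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{
            "role": "user",
            "content": QUERY_EXTRACTION_PROMPT.format(conversation_context=conversation_context),
        }],
    )
    phrase = response.choices[0].message.content.strip()
    return None if phrase.lower() == "null" else phrase

# e.g. extract_search_query("User: What is the price of sneakers ABC?\n"
#                           "Bot: The price of sneakers ABC is $19.99\n"
#                           "User: and of sneakers XYZ?")
# should come back with something like "price sneakers XYZ"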

A whole bunch of vectors

Now let’s look into how to make sure we have the right knowledge stored on our database.

Because having the right KB isn’t just about getting a random PDF, splitting it up into chunks, and uploading it to your database.

There’s a bit more to it than that.

Chunking (breaking information down into smaller, more manageable pieces) is especially important. Vectors should be of similar size within a KB; otherwise, similarity scores will surface less relevant vectors.

Chunks shouldn’t be too big or too small. The ideal size can vary by use case, so experimenting is key here. Generally, 256 tokens per vector should get the job done.
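As a rough starting point, here is a sketch of naive token-based chunking using the tiktoken library. The 32-token overlap is my own assumption, and in practice you would usually prefer splitting on sentence or section boundaries:

import tiktoken

def chunk_text(text: str, max_tokens: int = 256, overlap: int = 32) -> list[str]:
    """Split text into similarly sized chunks of roughly max_tokens tokens."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(enc.decode(tokens[start:start + max_tokens]))
        start += max_tokens - overlap  # small overlap so ideas aren't cut mid-thought
    return chunks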

Furthermore, since we already know that ‘Vectors are retrieved based on their similarity to the query given’, we need to consider two other things:

  • Overarching concepts (e.g., give me a list of all available dresses)
  • Specific concepts (e.g., what is dress XYZ made of?)

Users may ask for either of the two, and your knowledgebase needs to be prepared to handle both.

The issue that tends to pop up here is the following:

  1. User makes a generic query.
  2. App retrieves a limited number of vectors, each covering a specific topic.
  3. The topics covered by the retrieved vectors are limited.
  4. LLM does NOT have enough info to give a good answer.
  5. User gets a bad answer that doesn’t fully solve their question.

And it can happen the other way around too, with our KB being too generic.

An easy way to fix this is to think of the overarching concepts present in our KB (or go over past questions dealing with this issue), and then ask ChatGPT to create a document summarizing or giving an overarching view of each concept, which we then upload to our KB.

If you’ve got a bit more free time (or if you’ve got a lot of vectors to deal with), you can do what I did:

Make a custom-coded Python app that automatically identifies overarching topics, sends the relevant context and a good prompt to an LLM and, finally, watches it create general vectors on auto-pilot across thousands of different vectors.

Here is an overview of the workflow:

  1. Divide Vectors into Topics: Organize vectors into distinct topics based on their content, ensuring they are neither too broad nor too narrow.
  2. Extract Overarching Concepts: Send topic vectors to GPT-4o with a detailed prompt to identify overarching concepts. Retrieve and list these broader concepts.
  3. Display Concepts for Selection: Show the concepts in an indexed menu. Select the relevant concepts for creating new, general vectors.
  4. Generate Summaries for Selected Concepts: For each selected concept, prepare a prompt that includes the concept and related vectors from the topic, EXCLUDING all other selected concepts to avoid overlap. Send these prompts to GPT-4o to generate comprehensive summaries for each concept.
  5. Upload the new vectors to your KB: Upload the new summary vectors, tagging them appropriately for easy retrieval.

It’s a bit complex, but hopefully I explained myself well enough.
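To make steps 4 and 5 a bit more concrete, here is a minimal sketch of summarizing one overarching concept with GPT-4o and uploading the result as a new vector. The prompt wording, model names and collection name are assumptions, not the exact ones from my app:

import uuid
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

openai_client = OpenAI()
qdrant = QdrantClient(":memory:")  # point at your real KB in practice

def create_overview_vector(concept: str, related_chunks: list[str]) -> None:
    # Step 4: ask GPT-4o for a comprehensive summary of the concept
    prompt = (
        f"Write a concise overview of '{concept}' based ONLY on the excerpts below. "
        "It will be stored as a single knowledge-base entry for broad questions.\n\n"
        + "\n---\n".join(related_chunks)
    )
    summary = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

    # Embed the summary so it can be retrieved like any other chunk
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=summary,
    ).data[0].embedding

    # Step 5: upload the new general vector, tagged for easy retrieval
    qdrant.upsert(
        collection_name="products_kb",
        points=[PointStruct(
            id=str(uuid.uuid4()),
            vector=embedding,
            payload={"text": summary, "type": "overview", "concept": concept},
        )],
    )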


Testing, testing and more testing…

To refine your knowledgebase (KB) and ensure it delivers accurate responses, continuous testing is crucial.

Therefore, you should regularly analyze your bot’s message history to identify unsatisfactory responses.

When you find them, determine the root of the problem:

  • Is it the prompt?
  • Are the wrong vectors being retrieved?
  • Does the KB need more vectors?
  • Is the search query incorrect?

Once you find the issue, fix it and move on to the next.

Since it’s quite tedious, I developed a program specifically for this purpose. Here’s what it does:

  1. Add New Vectors: Easily integrate new data into your KB to keep it up-to-date.
  2. Search Sample Queries: Run test queries and see which vectors are retrieved first based on their similarity scores.
  3. View, Edit, or Remove Vectors Instantly: Access and modify the vectors returned by a search query on the spot. This feature is especially useful for fine-tuning your KB by directly addressing inaccuracies.

This program has been a game changer for me, and it’s relatively quick to code (around 250 lines).
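As a rough idea, the core of feature 2 can be as simple as embedding a test query and printing the top hits with their similarity scores. The collection and model names here are placeholders:

from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()
qdrant = QdrantClient(":memory:")  # point at your real KB

def inspect_query(query: str, top_k: int = 5) -> None:
    """Print the vectors a sample query would retrieve, with their scores."""
    query_vector = openai_client.embeddings.create(
        model="text-embedding-3-small", input=query,
    ).data[0].embedding

    hits = qdrant.search(collection_name="products_kb", query_vector=query_vector, limit=top_k)
    for rank, hit in enumerate(hits, start=1):
        snippet = (hit.payload or {}).get("text", "")[:120]
        print(f"{rank}. score={hit.score:.3f}  id={hit.id}  {snippet}")

inspect_query("return policy sneakers")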

Using this program, you can quickly identify and fix issues within your KB. If a query returns an unsatisfactory response, you can:

  • Identify the Problem: Understand why the wrong vectors were retrieved.
  • Make Adjustments: Edit or remove problematic vectors and add new ones as needed.
  • Improve Response Accuracy: Continuously refine your KB to improve the quality of responses.

For the sake of keeping this article as brief as possible, I won’t go much deeper into this tool.

But feel free to shoot me a DM on LinkedIn if you want me to send over the code, or if you need help setting it up :)


Yes mom, I’m an engineer, a prompt engineer.

Okay, but what about the prompt? Isn’t it SOO important as well?

Absolutely, but there are already a ton of resources out there on this topic. I’ll link some of my favorite ones here:

We’ll still do a brief overview though, so let’s dive in!

The Essentials

For a good prompt, we need to consider TWO main things:

  • Contents
  • Placement

People usually overlook the latter, and it can be incredibly important, especially for smaller and cheaper models such as GPT-3.5 Turbo.

Here’s an overview of what I’ve found works best for GPT models:

GPT-4o:

  • System → Instructions
  • Assistant → Context & instruction reminders (# Notes)

GPT-3.5 Turbo:

  • User → Instructions & Context

You can even mess around with it a bit. For example, write the Assistant prompt for GPT-4o in the first person:

“Here is some context that may be useful when crafting my response… I must be brief and concise…”

Or use the ‘Assistant’ role to give the instructions to GPT-3.5, to then send queries as the ‘User’.
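In code, that placement is just a matter of how you build the messages list. Here is a rough sketch where SYSTEM_PROMPT, context and user_query are placeholders you would fill in from your own app:

SYSTEM_PROMPT = "# Role\nYou are the best shopping assistant..."  # your full instructions
context = "Sneakers XYZ cost $24.99 and ship within 3 days."      # retrieved from the KB
user_query = "and of sneakers XYZ?"

# GPT-4o: instructions in 'system', context + reminders in an 'assistant' turn
gpt4o_messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "assistant",
     "content": f"Here is some context that may be useful when crafting my response: [{context}]. "
                "I must be brief and concise."},
    {"role": "user", "content": user_query},
]

# GPT-3.5 Turbo: instructions and context together in the 'user' message
gpt35_messages = [
    {"role": "user",
     "content": f"{SYSTEM_PROMPT}\n\nContext: [{context}]\n\n{user_query}"},
]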

Key Elements for a Good Prompt

In most cases, it’s worth including:

  • Markdown formatting
  • Role-playing
  • Chain-of-thought
  • Few-shot prompting
  • Emotional prompting
  • Some final notes (# Notes)

Oh and remember to keep the prompt positive; negative instructions tend not to work as well.

If you wish, you can delve deeper into these elements on the resources provided.

Sample Prompt Structure for a RAG model with GPT-4o

System Prompt:

# Role
You are the best {role} specialized in {topic} with a knack for {specialization}. The reputation of our company, to which you belong, rests entirely in your hands and your ability to correctly answer user questions. Your role is essential in helping our users with {what it's doing for your users}, something EXTREMELY important for them.

# Task
{Enter task to achieve}. Steps to follow:

1. {First step}
2. {Second step}
3. {Third step...}

# Specifics
- Giving a correct answer is very important for our business because {reasons}
- Answers must be clear, concise and friendly.
{Enter further requirements}

# Examples
### Ex. 1
Q: {Sample query}
A: {Desired reply}

### Ex. 2
Q: {Sample query}
A: {Desired reply}

### Ex. 3
Q: {Sample query}
A: {Desired reply}        

Assistant Prompt:

Context that may be relevant for my answer: [{context}].

Notes to keep in mind:

- I must answer the user's question correctly, truthfully, precisely, and concisely.
- I include only relevant and precise information in the answers, focusing on the content WITHOUT EVER ADDING closing phrases or safety warnings.
{some more notes}        

Here’s what the overall structure would look like:

[Screenshot by author: the overall message structure]

As you can see, the conversation follows the structure of the examples given. We write ‘Q: {user query}’ and then just ‘A:’ to prompt the LLM to generate a response that mirrors the examples provided.

A small side note:

In this example, there’s just one ‘Q:’ and one ‘A:’, since I’ve found that chatbots often work best when we ‘normalize’ the user’s query and avoid including previous bot replies.

Here’s how it works:

  1. Collect Recent Messages: Take the last 2–3 messages from the user and the last 1–2 messages from the chatbot.
  2. Generate a Normalized Query: Use a cheaper LLM to create a ‘normalized’ user query that includes the actual question, considering the conversation history.
  3. Focus on the Current Context: Ensure this normalized query contains only the essential context without including past bot responses.

By normalizing the query this way, the chatbot sticks better to the provided format and avoids being influenced by previous responses. Plus, now it only needs to focus on the current user query, not the entire conversation history. This makes the output more predictable and consistent.
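Here is a minimal sketch of that normalization step. The model choice and the rewrite prompt are assumptions, and the result is what you would drop into the ‘Q: … A:’ slot shown above:

from openai import OpenAI

client = OpenAI()

def normalize_query(history: list[dict]) -> str:
    """Rewrite the user's last message as a standalone question using recent context."""
    recent = history[-5:]  # last 2-3 user messages and last 1-2 bot messages
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in recent)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{
            "role": "user",
            "content": "Rewrite the user's last question as a single standalone question, "
                       "using the conversation only to fill in missing context:\n\n" + transcript,
        }],
    )
    return response.choices[0].message.content.strip()

history = [
    {"role": "user", "content": "What is the price of sneakers ABC?"},
    {"role": "assistant", "content": "The price of sneakers ABC is $19.99"},
    {"role": "user", "content": "and of sneakers XYZ?"},
]
final_user_turn = f"Q: {normalize_query(history)}\nA:"  # e.g. "Q: What is the price of sneakers XYZ?\nA:"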

Handling LLM Quirks

LLMs are smart, but also very dumb.

Sometimes it’s easier to hard-code something in, rather than getting an LLM to change something in its output.

For instance, in the sample prompt given, the LLM may include “A: ” in its response.

Yet convincing the model NOT to include “A: ” in the reply is unexpectedly difficult.

In this case, two lines of code would save us a lot of trouble:

# Remove "A:" from the beginning of the message if it exists
if response_message.startswith("A:"):
   response_message = response_message[2:].strip()        

The same goes for whenever you want to change how an LLM formats their replies.

For instance, all GPT models have it ingrained in them that they MUST use Markdown formatting.

And trying to change this is like swimming upstream.

Let’s say I’d like to implement my app on WhatsApp.

The messaging platform’s formatting is not exactly the same as Markdown.

For instance, they use a single asterisk (*) instead of two (**).

Yet another easy task for us with a bit of Python code, but almost impossible to achieve through prompting:

# Replace any double asterisks with a single asterisk in the response
try:
    response = response.replace('**', '*')
except Exception:
    print('Error while replacing ** with * in the response')

And… That’s a wrap!

I really appreciate you reading this far into the article.

Hope it was helpful!

If so, please let me know by connecting on LinkedIn.

As an Internet Connoisseur, I can always appreciate some cool internet points, so any likes are welcome.

Finally, if you are interested in learning how to properly implement AI into your business, I’m currently helping some businesses that qualify for free. Send me a DM on LinkedIn if you’re interested.

Hope you have an amazing rest of your day,

-Marcos
