RAGs to Riches
Figure 1. RAGs to Riches. Generated with ChatGPT/Dall-E.

Retrieval Augmented Generation with Query Expansion

If you haven’t heard of retrieval augmented generation (“RAG”), it is absolutely blowing up the AI space.

In this article, I give a couple of interesting ideas to boost the "retrieval" part of your system: in other words, the "search" that finds the subset of your custom, private data most closely related to your question. This data is then included in the prompt for the LLM to use when responding.

The usual RAG approach is to reroute the user's query away from the LLM by first taking a detour in the form of a search across your documents, knowledge bases, and other custom data. You then take the most relevant text chunks from the search and instruct the LLM to use only that context to compose its answer. (See Figure 2 below. This approach can use either a traditional search or a separate AI-powered search using vector embeddings.)

Figure 2. Process Flow for Standard LLM queries and Retrieval Augmented Generation (RAG) queries.
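
If you prefer to see that flow as code, here is a minimal sketch of the standard RAG loop from Figure 2. The embed(), vector_store.search(), and llm() calls are hypothetical placeholders for whatever embedding model, vector store, and chat model you happen to use.

# Minimal sketch of the standard RAG flow in Figure 2. Note: embed(),
# vector_store.search(), and llm() are hypothetical placeholders, not a real API.

def rag_answer(question: str, vector_store, k: int = 4) -> str:
    # 1) Retrieval: find the chunks of your private data closest to the question
    query_vector = embed(question)
    top_chunks = vector_store.search(query_vector, top_k=k)

    # 2) Augmentation: pack those chunks into the prompt as the only allowed context
    context = "\n\n".join(chunk.text for chunk in top_chunks)
    prompt = ("Answer the question using ONLY the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")

    # 3) Generation: the LLM composes an answer grounded in the retrieved context
    return llm(prompt)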

1) HyDE Queries

The first new and interesting tweak I have for you today is called hypothetical document embeddings, or HyDE. The idea behind HyDE is that a search will yield better results if you use an LLM to create a hypothetical, hallucinated answer… append it to the query… and submit the combined string to the search instead of just the query by itself. (For more detail, see this excellent interview with Sam Partee of Redis.)

This makes sense because, even with an answer based on the random, general knowledge which a free-flowing LLM will fabricate, the combined query string will more than likely gain some rich semantic information and keywords.

Here’s an example. Say a new employee asks the onboarding chatbot: “Who can get me a Salesforce login?”…

# Query for Standard Retrieval:
Query1 = "Who can get me a Salesforce login?"        

…but your company doesn’t even use Salesforce. Instead, it uses HubSpot, so the search/retrieval step is not going to produce good keyword matches from your PDFs, policies, and procedures.

But if you submit this question to an LLM before the search/retrieval step, and let it run free on an arbitrary, made-up response, you might get a combined string like this:

# Query for HyDE-modified Retrieval:
Query2 = """Who can get me a Salesforce login?
A: To gain access to the Salesforce customer relationship 
manager (CRM) and interact with client and prospecting 
data, you can contact the Sales Operation department or
the IT department by email or phone at 555-5555."""

Submitting the combined question+answer pair to the search/retrieval step gives it a lot more meaty keywords and context with which to pull relevant documents. (On the downside, you do have to run the LLM an extra time before each RAG query, but in many instances, the accuracy-vs-latency tradeoff may be worth it.)
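
If you want to see roughly where that extra LLM call slots in, here is a minimal sketch of a HyDE-modified retrieval step, reusing the hypothetical llm(), embed(), and vector_store placeholders from the earlier sketch.

# Minimal sketch of HyDE retrieval, reusing the hypothetical embed(),
# vector_store, and llm() placeholders from the sketch above.

def hyde_answer(question: str, vector_store, k: int = 4) -> str:
    # Let the LLM fabricate a plausible (possibly wrong) answer first
    hypothetical = llm("Give a short, plausible answer to this question, "
                       f"guessing specifics if you must: {question}")

    # Retrieve with question + hallucinated answer, which carries richer keywords
    expanded_query = f"{question}\nA: {hypothetical}"
    top_chunks = vector_store.search(embed(expanded_query), top_k=k)

    # Answer the ORIGINAL question using only the retrieved context
    context = "\n\n".join(chunk.text for chunk in top_chunks)
    return llm("Answer the question using ONLY the context below.\n\n"
               f"Context:\n{context}\n\nQuestion: {question}")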

2) Full, AI-Powered Query Expansion

When I heard about HyDE, I realized it is actually a new riff on a tried-and-true staple of search known as query expansion. So I thought: hey, what if you explicitly asked the LLM to perform a full query expansion? You could ask it to:

  1. Make up a best-guess answer, just as in HyDE, but then also…
  2. Include related keywords that would optimize a search… and…
  3. Reference a list of company acronyms and abbreviations that could be expanded or defined if they appear in the search string as well.

Here’s an example:

# Full, AI-Powered Query Expansion:
Query3 = """Who can get me a Salesforce login?

A: To gain access to the Salesforce customer relationship 
manager (CRM) and interact with client and prospecting 
data, you can contact the Sales Operation department or
the IT department by email or phone at 555-5555.

Related Keywords: ["sales", "marketing", "customer relationship",
 "password", "authentication"]

Acronyms: [("CRM", "customer relationship management"), 
("IT", "information technology")]
"""        

Now here’s what’s even more interesting about using a full, AI-powered query expansion like this: you might be able to attain enough semantic information that you can reliably drop down to a traditional keyword-based search algorithm like BM25, which is much cheaper and faster than an AI vector-based search, while still achieving comparable system accuracy. So that’s where the payoff could lie.
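
As a rough sketch of that cheaper path, here is what a keyword retrieval over the expanded query might look like with the open-source rank_bm25 package, assuming document_chunks is your list of text chunks and reusing the expand_query() sketch from above.

# Sketch of keyword retrieval over the expanded query with the open-source
# rank_bm25 package (pip install rank-bm25). document_chunks is a hypothetical
# list of your text chunks; expand_query() is the sketch from above.

from rank_bm25 import BM25Okapi

tokenized_corpus = [chunk.lower().split() for chunk in document_chunks]
bm25 = BM25Okapi(tokenized_corpus)

expanded = expand_query("Who can get me a Salesforce login?")
top_chunks = bm25.get_top_n(expanded.lower().split(), document_chunks, n=4)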

If you’re experimenting with RAG architectures, or just starting to think about AI and LLMs, I’d love to hear your thoughts and questions in the comments! Thanks!

