Advanced RAG with Amazon Bedrock
Recently I have been using Amazon Bedrock Knowledge Bases extensively. It makes setting up a RAG solution very easy. It is, however, only a building block: if you want a really great product you have to build some additional components yourself.
First, let's talk about the best feature of Knowledge Bases:
Chunking and additional metadata
Chunking data is essential. If you are adding large documents with hundreds of pages to your knowledge base, you need to split them up and return only the relevant sections to use as context for your inference. Returning too much context increases costs (models charge based on input token count) and latency, and it may also harm output quality. Shorter chunks provide a better match but may lack the context necessary to answer a question.
Bedrock Knowledge Bases has a few different chunking strategies to choose from, covering everything from fixed-size splitting to semantic boundaries like paragraphs and hierarchical structures. However, some document types can benefit from custom chunking. For example, any form of markup can be exploited by a custom chunking approach.
You can also create your own custom chunking approach using a Lambda function; a Lambda function is also required if you want to add any custom metadata. The function can handle the chunking itself, edit an existing chunk, or just add metadata. Metadata can then be used for filtering. A sketch of the kind of markup-aware splitting such a function might perform is below.
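As an illustration, here is a minimal sketch of heading-based markdown chunking with metadata attached to each chunk. The function and metadata field names are my own, not a Bedrock contract; inside a real custom chunking Lambda you would wrap logic like this in the event handling that Bedrock expects.

import re

def chunk_markdown(text, source_uri):
    """Split a markdown document at headings and attach metadata to each chunk."""
    chunks = []
    # Split at (but keep) any line starting with one to three '#' characters.
    for section in re.split(r'(?m)^(?=#{1,3} )', text):
        if not section.strip():
            continue
        first_line = section.splitlines()[0]
        heading = first_line.lstrip('#').strip() if first_line.startswith('#') else ''
        chunks.append({
            'content': section.strip(),
            'metadata': {
                'source': source_uri,        # usable later for metadata filtering
                'section_title': heading,
            },
        })
    return chunks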
It is important to tune your chunking to the type of documents being ingested. The wrong chunk size will hurt accuracy and response times, and it will also increase costs in both the vector storage and inference steps. The defaults supplied in Bedrock are pretty good, but they may need to be tailored to your specific circumstances. Longer and more technical documents may need larger chunk sizes to make sure they include enough context, while speech (like a chat transcript) can benefit from shorter chunks.
To determine the ideal chunking strategy and chunk size, I would highly recommend experimenting with different configurations and measuring retrieval quality against queries that are representative of your users.
Stop using retrieve and generate
The retrieve and generate method that is part of Knowledge Bases is a great way to get started and produce a simple product. However, if you have more complex RAG requirements, you get far more control by using just the retrieve method and combining it with your own inference.
This allows you to do several things; the main ones are covered in the sections below: improving the search query, routing, quality control, and tailored guardrails.
To move beyond retrieve and generate you will need to orchestrate the steps yourself. Some of this can be done using Bedrock Agents, but for more complex workflows you may be better off building your own agent.
Moving beyond retrieve and generate is pretty much a requirement for any of the following sections!
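To make this concrete, here is a minimal sketch of the retrieve-then-infer pattern using boto3. The knowledge base ID and model ID are placeholders, and the prompt layout is just one way of passing the retrieved context to the model.

import boto3

agent_runtime = boto3.client('bedrock-agent-runtime')
runtime = boto3.client('bedrock-runtime')

def answer(user_query):
    # Step 1: retrieve relevant chunks only - no generation yet.
    results = agent_runtime.retrieve(
        knowledgeBaseId='KB_ID_HERE',
        retrievalQuery={'text': user_query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {'numberOfResults': 5}
        },
    )
    context = '\n\n'.join(
        r['content']['text'] for r in results['retrievalResults']
    )
    # Step 2: run your own inference over the retrieved context.
    response = runtime.converse(
        modelId='anthropic.claude-3-haiku-20240307-v1:0',
        messages=[{
            'role': 'user',
            'content': [{'text': f'<context>{context}</context>\n\n{user_query}'}],
        }],
    )
    return response['output']['message']['content'][0]['text']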
Improving the search query
Bedrock Knowledge Bases has built-in query decomposition. This is a great addition to Bedrock, but it is not well documented in terms of what it actually does.
If you want to improve your knowledge base access there are a few steps you can take yourself.
Here is an example of a prompt I have used that decomposes a query and includes extra information. It outputs a number of subqueries formatted as JSON, which can then be used to create multiple queries for a vector search.
DECOMPOSE_PROMPT = {
    'system_message': """
You are an AI assistant that prepares queries that will be sent to a search component.
Your job is to reformulate user queries to improve retrieval in a RAG system.
""",
    # A plain (non-f) string: fill it later with .format(user_query=...).
    # The doubled braces in the examples are escaped literal JSON braces.
    'user_instructions': """
Perform the following steps:
- If the query is narrow in focus, generate an additional step-back query that is more general and can help retrieve relevant background information.
- Rewrite it to be more specific, detailed, and likely to retrieve relevant information.
- Perform query decomposition. Given a user question, break it down into distinct sub questions that you need to answer in order to answer the original question.
You should produce 1-10 subqueries that, when answered together, would provide a comprehensive response to the original query.
If there are acronyms or words you are not familiar with, do not try to rephrase them.
If the query is already well formed, do not try to decompose it further.
Your output should be a JSON document with an array of subqueries.
<examples>
<example1>
<user_input>Did Microsoft or Google make more money last year?</user_input>
<output>
{{
  "query": "Did Microsoft or Google make more money last year?",
  "subqueries": [
    "How much profit did Microsoft make last year?",
    "How much profit did Google make last year?"
  ]
}}
</output>
</example1>
<example2>
<user_input>What is the capital of France?</user_input>
<output>
{{
  "query": "What is the capital of France?",
  "subqueries": [
    "What is the capital of France?"
  ]
}}
</output>
</example2>
<example3>
<user_input>What are the impacts of climate change on the environment?</user_input>
<output>
{{
  "query": "What are the impacts of climate change on the environment?",
  "subqueries": [
    "What are the impacts of climate change on biodiversity and ecosystems?",
    "How does climate change affect the oceans and sea levels?",
    "What are the effects of climate change on agriculture?",
    "What are the impacts of climate change on human health?",
    "What are the impacts of climate change on weather patterns?"
  ]
}}
</output>
</example3>
</examples>
<user_input>{user_query}</user_input>
"""
}
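Here is a sketch of invoking this prompt and parsing the subqueries. It assumes the boto3 clients from the earlier sketch and the same placeholder model ID.

import json

def decompose(user_query):
    response = runtime.converse(
        modelId='anthropic.claude-3-haiku-20240307-v1:0',
        system=[{'text': DECOMPOSE_PROMPT['system_message']}],
        messages=[{
            'role': 'user',
            'content': [{'text': DECOMPOSE_PROMPT['user_instructions'].format(
                user_query=user_query)}],
        }],
        inferenceConfig={'temperature': 0.5},  # slightly higher for more varied subqueries
    )
    output = response['output']['message']['content'][0]['text']
    return json.loads(output)['subqueries']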
If you do multiple vector searches you will create a couple of extra problems: the same chunk can come back from more than one search, so you need to deduplicate, and the combined result set needs to be merged and ranked so only the strongest chunks are used as context. A sketch of one way to handle this is below.
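Building on the earlier sketches, this deduplicates by chunk text (keeping the highest-scoring copy) and ranks the merged set; the knowledge base ID is again a placeholder.

def multi_retrieve(subqueries, kb_id='KB_ID_HERE'):
    best = {}
    for subquery in subqueries:
        results = agent_runtime.retrieve(
            knowledgeBaseId=kb_id,
            retrievalQuery={'text': subquery},
            retrievalConfiguration={
                'vectorSearchConfiguration': {'numberOfResults': 5}
            },
        )
        for r in results['retrievalResults']:
            text = r['content']['text']
            # Keep only the highest-scoring copy of each duplicate chunk.
            if text not in best or r['score'] > best[text]['score']:
                best[text] = r
    # Rank the merged set so only the strongest chunks go into the context.
    return sorted(best.values(), key=lambda r: r['score'], reverse=True)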
It is also useful to experiment with temperature. A slightly higher temperature will generally produce a more varied set of search terms.
Routing
Sometimes you will need to combine multiple knowledge bases. This is different to having a single knowledge base with different sources: you may have knowledge bases that are shared between different applications, different data residency requirements, or different schemas.
Note: A single knowledge base can have multiple data sources. Often it is better to use a single knowledge base if you can as it will have lower overhead and costs.
Whatever the reason, you can send different queries to different knowledge bases. If you have a search modification stage, you can easily add a query routing path to this phase.
With a lot of query routing, sending a query to the wrong knowledge base is wasteful but not detrimental. Take for instance a music lyrics knowledge base and a film scripts knowledge base: if you wanted quotes from 'Lady Gaga' you would want to search both, and the search could be very similar for each. However, if one knowledge base contains your US corporate documents and another contains your UK documents, a search about holiday entitlement should normally be directed to only the correct knowledge base (queries that specifically ask to compare US and UK holiday entitlement would be an exception). Routing can be easily combined with a search improvement step.
Using my prompt above, you can add a classification step to split queries between multiple knowledge bases and add a knowledge base name to the output JSON. A sketch of this is below.
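For example, assuming the decomposition output is extended so each subquery is an object carrying a "query" and a "knowledge_base" name, routing might look like this (the mapping, names, and IDs are all placeholders, and multi_retrieve is the sketch from earlier).

KNOWLEDGE_BASES = {                 # hypothetical mapping of names to KB IDs
    'us_documents': 'US_KB_ID_HERE',
    'uk_documents': 'UK_KB_ID_HERE',
}

def route_and_retrieve(decomposed):
    results = []
    for sub in decomposed['subqueries']:
        kb_id = KNOWLEDGE_BASES.get(sub['knowledge_base'])
        if kb_id is None:
            continue  # or fall back to searching every knowledge base
        results.extend(multi_retrieve([sub['query']], kb_id=kb_id))
    return results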
Quality control
Once you have a result returned from the inference step, you can use additional inference step(s) to evaluate the output. This is useful for checking that references were used, that the user request was adequately fulfilled, and that no hallucinations were generated.
You can add several quality checks to an analysis of your inference output, which helps avoid output that does not meet your business needs. You can then either re-run the inference with different parameters (such as temperature or prompt content), or run multiple inference steps and select the best result. If none of the results meet your quality requirements, you can advise the user that you were unable to answer the question.
Here is a possible quality control prompt:
ANSWER_EVALUATION_PROMPT = {
    'system_message': """
You are an analytical AI assistant able to make quality checks
on 'response' and 'user_query'""",
    # A plain (non-f) string, filled later with
    # .format(generated_response=..., user_query=..., source_material=...).
    'user_instructions': """
Your tasks are to make the following quality checks on 'response'
and 'user_query'. Write your answer to each check as evidence in your output.
Let's do it step by step.
Check 1: Credibility
Does each point in the response have evidence to justify the point?
Evidence should include the sources listed in the source_material.
Check 2: Meets requirements
Does the response fulfil the requirements outlined in 'user_query'?
Check 3: Detail balance
Make an assessment of how generic the user_query is and judge
whether the response is equally as generic or detailed.
Using all the information from the checks,
give a score reflecting how well the response meets all these checks
on an integer scale between 0 and 100.
Please penalise the score if any of the checks show areas for improvement.
Please penalise the score if the response states any missing information.
<response>
{generated_response}
</response>
<user_query>
{user_query}
</user_query>
<source_material>
{source_material}
</source_material>
Analyse each sentence before writing.
"""
}
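Here is a sketch of how this evaluation could be wired into a retry loop. It assumes the boto3 clients from the earlier sketches, a hypothetical generate() inference step (e.g. a variant of answer() above that accepts a temperature), and a crude way of pulling the score out of the model output; in production a structured (JSON) score would be more robust.

import re

def evaluate(generated_response, user_query, source_material):
    prompt = ANSWER_EVALUATION_PROMPT['user_instructions'].format(
        generated_response=generated_response,
        user_query=user_query,
        source_material=source_material,
    )
    response = runtime.converse(
        modelId='anthropic.claude-3-haiku-20240307-v1:0',
        system=[{'text': ANSWER_EVALUATION_PROMPT['system_message']}],
        messages=[{'role': 'user', 'content': [{'text': prompt}]}],
    )
    text = response['output']['message']['content'][0]['text']
    # The prompt asks for analysis followed by a 0-100 score,
    # so take the last number in the output as the score.
    scores = re.findall(r'\b(\d{1,3})\b', text)
    return int(scores[-1]) if scores else 0

def answer_with_checks(user_query, context, attempts=3):
    best_answer, best_score = None, -1
    for temperature in (0.0, 0.4, 0.8)[:attempts]:
        candidate = generate(user_query, context, temperature)  # your inference step
        score = evaluate(candidate, user_query, context)
        if score > best_score:
            best_answer, best_score = candidate, score
    if best_score < 70:  # the threshold is an arbitrary example
        return "Sorry, I was unable to answer that question."
    return best_answer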
Guardrails
Bedrock Guardrails are great, but they sometimes need to be applied intelligently. If you have a multi-step process you may not want to apply guardrails at each intermediate step, and you may want to abort future steps if you hit a guardrail.
If you are using an LLM to improve the search query as an initial step, that is an important place to apply input guardrails. If they generate a hit, you can feed back to the user at that point. Wherever guardrails are used, it is important to detect a hit so you can take appropriate action. A sketch of applying an input guardrail explicitly is below.
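Here is a minimal sketch using the ApplyGuardrail API to check input before any further steps run; the guardrail ID and version are placeholders, and runtime is the bedrock-runtime client from earlier.

def check_input(user_query):
    result = runtime.apply_guardrail(
        guardrailIdentifier='GUARDRAIL_ID_HERE',
        guardrailVersion='1',
        source='INPUT',
        content=[{'text': {'text': user_query}}],
    )
    if result['action'] == 'GUARDRAIL_INTERVENED':
        # Abort the remaining steps and feed back to the user immediately.
        return False
    return True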
It is important to include guardrails (especially in anything that is open to the public and not just an internal tool). By breaking up the steps and tailoring the guardrails to each one, it is possible to give the user a much better response.
Conclusion
Bedrock Knowledge Bases is a great product for building a RAG solution. There are a few features I wish it had, but it is great how quickly you can get started and build a product.
Once you understand the process then it is easy to build your own products on top of Bedrock Knowledge Bases and come up with some really powerful GenAI applications.
I am still building products with Amazon Bedrock and will amend this article based on my experience.