Advanced RAG with Amazon Bedrock
Recently I have been using Amazon Bedrock Knowledge Bases extensively. It makes setting up a RAG solution very easy. It is, however, only a building block: if you want a really great product you have to build some additional components yourself.
First, let's talk about the best feature of Knowledge Bases:
Chunking and additional metadata
Chunking data is essential. If you are adding large documents with hundreds of pages to your knowledge base, you need to split them up and return only the relevant sections to use as context for your inference. Returning too much context increases costs (models charge based on input token count) and latency, and it may also harm output quality. Shorter chunks provide a better match but may lack the context necessary to answer a question.
Bedrock Knowledge Bases has a few different chunking strategies to choose from, covering everything from fixed-size splitting to semantic boundaries like paragraphs and hierarchical structures. However, some document types can benefit from custom chunking. For example, any form of markup can be exploited by a custom chunking approach.
You can also create your own custom chunking approach using a Lambda function; a Lambda function is also required if you want to add any custom metadata. The function can handle the chunking itself, edit an existing chunk, or just add metadata. Metadata can then be used for filtering. A sketch of the kind of markup-aware splitting such a function might perform is below.
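As an illustration, here is a minimal sketch of heading-based markdown chunking with metadata attached to each chunk. The function and metadata field names are my own, not a Bedrock contract; inside a real custom chunking Lambda you would wrap logic like this in the event handling that Bedrock expects.

import re

def chunk_markdown(text, source_uri):
    """Split a markdown document at headings and attach metadata to each chunk."""
    chunks = []
    # Split at (but keep) any line starting with one to three '#' characters.
    for section in re.split(r'(?m)^(?=#{1,3} )', text):
        if not section.strip():
            continue
        first_line = section.splitlines()[0]
        heading = first_line.lstrip('#').strip() if first_line.startswith('#') else ''
        chunks.append({
            'content': section.strip(),
            'metadata': {
                'source': source_uri,        # usable later for metadata filtering
                'section_title': heading,
            },
        })
    return chunks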
It is important to tune your chunking to the type of documents being ingested. The wrong chunk size will hurt accuracy and response times, and it will also increase costs in both the vector storage and inference steps. The defaults supplied in Bedrock are pretty good, but they may need to be tailored to your specific circumstances. Longer and more technical documents may need larger chunk sizes to make sure they include enough context, while speech (like a chat transcript) can benefit from shorter chunks.
To determine the ideal chunking strategy and chunk size, I would highly recommend experimenting with different configurations and measuring retrieval quality against queries that are representative of your users.
Stop using retrieve and generate
The retrieve and generate method that is part of Knowledge Bases is a great way to get started and produce a simple product. However, if you have more complex RAG requirements, you get far more control by using just the retrieve method and combining it with your own inference.
This allows you to do several things; the main ones are covered in the sections below: improving the search query, routing, quality control, and tailored guardrails.
To move beyond retrieve and generate you will need to orchestrate the steps yourself. Some of this can be done using Bedrock Agents, but for more complex workflows you may be better off building your own agent.
Moving beyond retrieve and generate is pretty much a requirement for any of the following sections!
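To make this concrete, here is a minimal sketch of the retrieve-then-infer pattern using boto3. The knowledge base ID and model ID are placeholders, and the prompt layout is just one way of passing the retrieved context to the model.

import boto3

agent_runtime = boto3.client('bedrock-agent-runtime')
runtime = boto3.client('bedrock-runtime')

def answer(user_query):
    # Step 1: retrieve relevant chunks only - no generation yet.
    results = agent_runtime.retrieve(
        knowledgeBaseId='KB_ID_HERE',
        retrievalQuery={'text': user_query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {'numberOfResults': 5}
        },
    )
    context = '\n\n'.join(
        r['content']['text'] for r in results['retrievalResults']
    )
    # Step 2: run your own inference over the retrieved context.
    response = runtime.converse(
        modelId='anthropic.claude-3-haiku-20240307-v1:0',
        messages=[{
            'role': 'user',
            'content': [{'text': f'<context>{context}</context>\n\n{user_query}'}],
        }],
    )
    return response['output']['message']['content'][0]['text']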
Improving the search query
Bedrock Knowledge Bases has built-in query decomposition. This is a great addition to Bedrock, but it is not well documented in terms of what it actually does.
If you want to improve your knowledge base access there are a few steps you can take yourself.
Here is an example of a prompt I have used that decomposes a query and includes extra information. It outputs a number of subqueries formatted as JSON, which can then be used to create multiple queries for a vector search.
DECOMPOSE_PROMPT = {
    'system_message': """
You are an AI assistant that prepares queries that will be sent to a search component.
Your job is to reformulate user queries to improve retrieval in a RAG system.
""",
    # A plain (non-f) string: fill it later with .format(user_query=...).
    # The doubled braces in the examples are escaped literal JSON braces.
    'user_instructions': """
Perform the following steps:
- If the query is narrow in focus, generate an additional step-back query that is more general and can help retrieve relevant background information.
- Rewrite it to be more specific, detailed, and likely to retrieve relevant information.
- Perform query decomposition. Given a user question, break it down into distinct sub questions that you need to answer in order to answer the original question.
You should produce 1-10 subqueries that, when answered together, would provide a comprehensive response to the original query.
If there are acronyms or words you are not familiar with, do not try to rephrase them.
If the query is already well formed, do not try to decompose it further.
Your output should be a JSON document with an array of subqueries.
<examples>
<example1>
<user_input>Did Microsoft or Google make more money last year?</user_input>
<output>
{{
  "query": "Did Microsoft or Google make more money last year?",
  "subqueries": [
    "How much profit did Microsoft make last year?",
    "How much profit did Google make last year?"
  ]
}}
</output>
</example1>
<example2>
<user_input>What is the capital of France?</user_input>
<output>
{{
  "query": "What is the capital of France?",
  "subqueries": [
    "What is the capital of France?"
  ]
}}
</output>
</example2>
<example3>
<user_input>What are the impacts of climate change on the environment?</user_input>
<output>
{{
  "query": "What are the impacts of climate change on the environment?",
  "subqueries": [
    "What are the impacts of climate change on biodiversity and ecosystems?",
    "How does climate change affect the oceans and sea levels?",
    "What are the effects of climate change on agriculture?",
    "What are the impacts of climate change on human health?",
    "What are the impacts of climate change on weather patterns?"
  ]
}}
</output>
</example3>
</examples>
<user_input>{user_query}</user_input>
"""
}
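Here is a sketch of invoking this prompt and parsing the subqueries. It assumes the boto3 clients from the earlier sketch and the same placeholder model ID.

import json

def decompose(user_query):
    response = runtime.converse(
        modelId='anthropic.claude-3-haiku-20240307-v1:0',
        system=[{'text': DECOMPOSE_PROMPT['system_message']}],
        messages=[{
            'role': 'user',
            'content': [{'text': DECOMPOSE_PROMPT['user_instructions'].format(
                user_query=user_query)}],
        }],
        inferenceConfig={'temperature': 0.5},  # slightly higher for more varied subqueries
    )
    output = response['output']['message']['content'][0]['text']
    return json.loads(output)['subqueries']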
If you do multiple vector searches you will create a couple of extra problems: the same chunk can come back from more than one search, so you need to deduplicate, and the combined result set needs to be merged and ranked so only the strongest chunks are used as context. A sketch of one way to handle this is below.
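Building on the earlier sketches, this deduplicates by chunk text (keeping the highest-scoring copy) and ranks the merged set; the knowledge base ID is again a placeholder.

def multi_retrieve(subqueries, kb_id='KB_ID_HERE'):
    best = {}
    for subquery in subqueries:
        results = agent_runtime.retrieve(
            knowledgeBaseId=kb_id,
            retrievalQuery={'text': subquery},
            retrievalConfiguration={
                'vectorSearchConfiguration': {'numberOfResults': 5}
            },
        )
        for r in results['retrievalResults']:
            text = r['content']['text']
            # Keep only the highest-scoring copy of each duplicate chunk.
            if text not in best or r['score'] > best[text]['score']:
                best[text] = r
    # Rank the merged set so only the strongest chunks go into the context.
    return sorted(best.values(), key=lambda r: r['score'], reverse=True)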
It is also useful to experiment with temperature. A slightly higher temperature will generally produce a more varied set of search terms.
Routing
Sometimes you will need to combine multiple knowledge bases. This is different to having a single knowledge base with different sources: you may have knowledge bases that are shared between different applications, different data residency requirements, or different schemas.
Note: A single knowledge base can have multiple data sources. Often it is better to use a single knowledge base if you can as it will have lower overhead and costs.
Whatever the reason, you can send different queries to different knowledge bases. If you have a search modification stage, you can easily add a query routing path to this phase.
With a lot of query routing, sending a query to the wrong knowledge base is wasteful but not detrimental. Take for instance a music lyrics knowledge base and a film scripts knowledge base: if you wanted quotes from 'Lady Gaga' you would want to search both, and the search could be very similar for each. However, if one knowledge base contains your US corporate documents and another contains your UK documents, a search about holiday entitlement should normally be directed to only the correct knowledge base (queries that specifically ask to compare US and UK holiday entitlement would be an exception). Routing can be easily combined with a search improvement step.
Using my prompt above, you can add a classification step to split queries between multiple knowledge bases and add a knowledge base name to the output JSON. A sketch of this is below.
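For example, assuming the decomposition output is extended so each subquery is an object carrying a "query" and a "knowledge_base" name, routing might look like this (the mapping, names, and IDs are all placeholders, and multi_retrieve is the sketch from earlier).

KNOWLEDGE_BASES = {                 # hypothetical mapping of names to KB IDs
    'us_documents': 'US_KB_ID_HERE',
    'uk_documents': 'UK_KB_ID_HERE',
}

def route_and_retrieve(decomposed):
    results = []
    for sub in decomposed['subqueries']:
        kb_id = KNOWLEDGE_BASES.get(sub['knowledge_base'])
        if kb_id is None:
            continue  # or fall back to searching every knowledge base
        results.extend(multi_retrieve([sub['query']], kb_id=kb_id))
    return results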
Quality control
Once you have a result returned from the inference step, you can use additional inference step(s) to evaluate the output. This is useful for checking that references were used, that the user request was adequately fulfilled, and that no hallucinations were generated.
You can add several quality checks to an analysis of your inference output, which helps avoid output that does not meet your business needs. You can then either re-run the inference with different parameters (such as temperature or prompt content), or run multiple inference steps and select the best result. If none of the results meet your quality requirements, you can advise the user that you were unable to answer the question.
Here is a possible quality control prompt:
ANSWER_EVALUATION_PROMPT = {
    'system_message': """
You are an analytical AI assistant able to make quality checks
on 'response' and 'user_query'""",
    # A plain (non-f) string, filled later with
    # .format(generated_response=..., user_query=..., source_material=...).
    'user_instructions': """
Your tasks are to make the following quality checks on 'response'
and 'user_query'. Write your answer to each check as evidence in your output.
Let's do it step by step.
Check 1: Credibility
Does each point in the response have evidence to justify the point?
Evidence should include the sources listed in the source_material.
Check 2: Meets requirements
Does the response fulfil the requirements outlined in 'user_query'?
Check 3: Detail balance
Make an assessment of how generic the user_query is and judge
whether the response is equally as generic or detailed.
Using all the information from the checks,
give a score reflecting how well the response meets all these checks
on an integer scale between 0 and 100.
Please penalise the score if any of the checks show areas for improvement.
Please penalise the score if the response states any missing information.
<response>
{generated_response}
</response>
<user_query>
{user_query}
</user_query>
<source_material>
{source_material}
</source_material>
Analyse each sentence before writing.
"""
}
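Here is a sketch of how this evaluation could be wired into a retry loop. It assumes the boto3 clients from the earlier sketches, a hypothetical generate() inference step (e.g. a variant of answer() above that accepts a temperature), and a crude way of pulling the score out of the model output; in production a structured (JSON) score would be more robust.

import re

def evaluate(generated_response, user_query, source_material):
    prompt = ANSWER_EVALUATION_PROMPT['user_instructions'].format(
        generated_response=generated_response,
        user_query=user_query,
        source_material=source_material,
    )
    response = runtime.converse(
        modelId='anthropic.claude-3-haiku-20240307-v1:0',
        system=[{'text': ANSWER_EVALUATION_PROMPT['system_message']}],
        messages=[{'role': 'user', 'content': [{'text': prompt}]}],
    )
    text = response['output']['message']['content'][0]['text']
    # The prompt asks for analysis followed by a 0-100 score,
    # so take the last number in the output as the score.
    scores = re.findall(r'\b(\d{1,3})\b', text)
    return int(scores[-1]) if scores else 0

def answer_with_checks(user_query, context, attempts=3):
    best_answer, best_score = None, -1
    for temperature in (0.0, 0.4, 0.8)[:attempts]:
        candidate = generate(user_query, context, temperature)  # your inference step
        score = evaluate(candidate, user_query, context)
        if score > best_score:
            best_answer, best_score = candidate, score
    if best_score < 70:  # the threshold is an arbitrary example
        return "Sorry, I was unable to answer that question."
    return best_answer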
Guardrails
Bedrock Guardrails are great, but they sometimes need to be applied intelligently. If you have a multi-step process you may not want to apply guardrails at each intermediate step, and you may want to abort future steps if you hit a guardrail.
If you are using an LLM to improve the search query as an initial step, that is an important place to apply input guardrails. If they generate a hit, you can feed back to the user at that point. Wherever guardrails are used, it is important to detect a hit so you can take appropriate action. A sketch of applying an input guardrail explicitly is below.
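Here is a minimal sketch using the ApplyGuardrail API to check input before any further steps run; the guardrail ID and version are placeholders, and runtime is the bedrock-runtime client from earlier.

def check_input(user_query):
    result = runtime.apply_guardrail(
        guardrailIdentifier='GUARDRAIL_ID_HERE',
        guardrailVersion='1',
        source='INPUT',
        content=[{'text': {'text': user_query}}],
    )
    if result['action'] == 'GUARDRAIL_INTERVENED':
        # Abort the remaining steps and feed back to the user immediately.
        return False
    return True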
It is important to include guardrails (especially in anything that is open to the public and not just an internal tool). By breaking up the steps and tailoring the guardrails to each one, it is possible to give the user a much better response.
Conclusion
Bedrock Knowledge Bases is a great product for building a RAG solution. There are a few features I wish it had, but it is great how quickly you can get started and build a product.
Once you understand the process then it is easy to build your own products on top of Bedrock Knowledge Bases and come up with some really powerful GenAI applications.
I am still building products with Amazon Bedrock and will amend this article based on my experience.