Bedrock Knowledge Bases
Searching through space - Generated using Amazon Titan model


In my opinion, large language models (LLMs) and text-producing GenAI have a few superpowers:

  • The ability to create a persona and have the model respond as that persona. This enables many interesting applications, like a GenAI-based agent playing the role of a customer in a training simulation, or responding in line with a corporate style
  • The ability to evaluate inputs. A GenAI model can evaluate and categorise input. This allows you to use GenAI to self-evaluate, and also allows the production of profanity filters. It opens up many more applications besides
  • The large prompt size and the ability to supply context. By supplying context it is possible to provide the next turn in a conversation in a more realistic style. It is also possible to supply context to personalise the answer.

It is the final one of these that I will focus on now. If you just use a generic LLM chatbot on the internet, it can answer your questions, but only based on its training and the information it already knows. It can't answer anything about you, your company or your corporate data.

"Based on my companies policies can I park in the customer car park if the staff car park is full?"

"How many days vacation to I get?"

"Based on our track record can you write me 500 words on our approach to DevOps?"

These are all valid questions that, in principle, GenAI could help with, but without additional context specific to your company it will get them hopelessly wrong.

The answer is RAG

Retrieval Augmented Generation (RAG) is the process of supplying appropriate and authoritative knowledge to an LLM so it can generate a relevant response. This is data that is normally outside the training data of the model. RAG allows a generic LLM to produce domain-specific answers.

RAG requires you to build a knowledge base and then use its search results as context for your LLM inferences to produce domain-specific answers.

There are a lot of steps to producing a RAG solution; I will cover those below. The good news is that Amazon Bedrock Knowledge Bases has most of this covered for you.

Vector search

Put very simply, converting text to a vector is a way of extracting raw meaning. The words cookies and biscuits will have almost identical vectors. Crackers and oat cake should be pretty similar.

A vector is just a sequence of numbers that represents a position in multi-dimensional space. They have direction and magnitude. In 3d space a vector would have 3 numbers (x, y, z). The vectors used for vector search have a much higher number of dimensions (like 1024).

Typically an embedding model is a neural network trained on a large corpus of words. The model is trained to associate similar words and make accurate predictions for new data.

Vectors can then be compared by computing how similar they are - basically how closely aligned they point. Using cosine similarity, an identical word will point in the same direction and score 1. A word with a completely different meaning will be roughly orthogonal and score close to 0.

Going back to our biscuit example, I would expect cookies and biscuits to have a similarity in the range of 0.9-1.0. I would expect cookies and crackers to be a bit lower, around the 0.7 mark. Two completely different words like spade and cookies would have a similarity close to zero. (These numbers are completely made up and for illustrative purposes only. They will differ between trained models.)

Note: If you are not familiar with the term corpus, it is basically a large, representative collection of text.

Vectorising can be applied to chunks of text as well as words. Vectors can then be used to measure how relevant a piece of text is to a request.

There are a few different measures used to compare vectors:

  • Euclidean distance - useful for search or recommendations
  • Cosine similarity - useful for semantic search or classification
  • Dot product - not as useful for RAG applications, but can be used for collaborative filtering (predicting user preferences based on similar behaviour)

More on cosine similarity can be found here: https://en.wikipedia.org/wiki/Cosine_similarity
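
To make the comparison concrete, here is a minimal sketch of cosine similarity in Python. The three vectors are tiny, made-up illustrations; real embeddings have hundreds of dimensions.

import numpy as np

def cosine_similarity(a, b):
    # 1.0 = pointing the same way (same meaning), ~0.0 = orthogonal (unrelated)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cookies = np.array([0.80, 0.10, 0.05])   # made-up toy vectors
biscuits = np.array([0.75, 0.15, 0.10])
spade = np.array([0.02, -0.70, 0.60])

print(cosine_similarity(cookies, biscuits))  # high, around 0.99
print(cosine_similarity(cookies, spade))     # close to zero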

Note: Vectors can be used on other data types like images and gene sequences, but that is not relevant to this particular use.

Note: The words vector and embedding are almost interchangeable in this context. Technically, embedding is the process of representing data in a structured way for machine learning; a vector is the mathematical structure used to represent that data.

Note: The process of vectorising data is also used in the training of LLMs. Vectors are an efficient way of storing knowledge.

The vectors produced by most embedding models are all floats between -1 and +1. Here is a sample of the vector embedding that is actually stored in OpenSearch.

Note: Titan can produce normalised or unnormalised vectors, with normalised being the default. For most applications normalisation is the best option.

"bedrock-knowledge-base-default-vector": [
            -0.08327539,
            0.018446112,
            0.049291685,
            -0.02632972,
            -0.037351463,
            0.005549141,
            0.019441132,
            -0.013700638,
            -0.043168493,
            ...]        

Note: Many embedding models use 8-bit precision in their vectors. This is significant for saving space when you have 1024 dimensions. Often the size of the vectors can be reduced this way without a meaningful loss of accuracy. Cohere uses 8-bit integers. Although I can't find it documented, it looks like Amazon Titan Text Embeddings v2 does also.

Workflow

There are two sides to using a Knowledge Base. First you need to create a Knowledge base and then you need to use it for inferences. These are the basic steps that are followed:

Creating a knowledge base

  1. Create an index
  2. Ingest the documents into the knowledge base from wherever they are stored. This normally involves some form of connector to access S3, SharePoint, Confluence or another knowledge store.
  3. Parse the documents, turning them into text
  4. Chunk the documents. You don't want to use a 400-page report as part of the context when only one paragraph is relevant
  5. Vectorise the chunks
  6. Store the vectors, chunks and reference to the original documents in the index

Refreshing the index would involve identifying only the new, changed and deleted documents and then repeating steps 2-6.

Note: Parsing the documents can be done in many ways. It can be as simple as just converting to text, or it can use a model that 'understands' the layout.
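
To make steps 3-5 concrete, here is a minimal sketch of chunking and vectorising a document with boto3 and Titan Text Embeddings v2. Knowledge Bases does all of this for you during synchronisation; the file name, chunk size and dimensions are made-up illustrations.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def chunk_text(text, size=1000):
    # Naive fixed-size chunking, purely for illustration
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunk):
    # Vectorise a chunk with Titan Text Embeddings v2 (normalised, 512 dimensions)
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": chunk, "dimensions": 512, "normalize": True}),
    )
    return json.loads(response["body"].read())["embedding"]

with open("hr-policy.txt") as f:   # hypothetical source document
    document = f.read()

records = [{"chunk": c, "vector": embed(c)} for c in chunk_text(document)]
# Each record (vector, chunk text and a reference to the source document)
# would then be written to the vector index (step 6).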

Answering a request

  1. User input - The user asks a question of your GenAI based application
  2. Preprocess user input
  3. The user request is vectorised and used to search the vector index
  4. Relevant chunks are returned from the vector index. These are used to populate a prompt template along with the original user request
  5. The prompt is sent to an LLM to create an inference
  6. Quality control
  7. The answer is supplied to the user

On to Bedrock Knowledge bases

So hopefully my explanation of the why and the mechanics has left you in a good place.

  • I understand the problem
  • I understand the solution but there seems to be a lot to do
  • It would be nice if a product did a lot of this for me

This is exactly what Amazon Bedrock Knowledge Bases does. I will walk through the features, options and limitations. Just to keep it simple I will talk about the console only. In a production system you would probably want to create the knowledge base with infrastructure as code (IaC) and interact with it using your language's SDK (like boto3 for Python).

Creating a knowledge base

Knowledge bases are halfway down the Bedrock left-hand navigation under Builder tools (assuming you have made it past the splash page and have some models enabled).

If you click on Knowledge bases and then Create knowledge base, the fun starts. Creating a knowledge base is a 4-step process in the console.

By creating and synchronising a knowledge base, you will have a fully functioning knowledge base that you can query and start building into your own applications. Here are some of the features you can choose from:

  • Data source - A knowledge base can currently have up to 4 data sources. You can choose from S3, SharePoint, web crawler, Confluence and Salesforce. For each you will need credentials that allow you to connect, and an address. For an S3 bucket it can be individual items, like a folder of documents within the bucket. The web crawler does not require credentials but will respect a robots.txt file.
  • Storage - For development use, Bedrock can provision a database for you. For production use, you can choose from OpenSearch, Aurora, Pinecone or Redis Enterprise Cloud.
  • Chunking - Knowledge Bases supports several chunking strategies. These include no chunking for smaller documents, semantic chunking, fixed-length chunking and hierarchical chunking. There is also the option to create a Lambda function to implement a custom chunking strategy or to create chunk-level metadata.
  • Parsing - Knowledge Bases supports either plain text or using Anthropic Claude to 'read' the document. Claude can understand multiple document formats like PDF and Word files. Parsing can be configured per data source.
  • Vectorising - Currently only the Amazon Titan Text Embeddings v2 model is supported. For some older regions I believe there is a choice between different versions. You can choose the number of dimensions. A higher number of dimensions should provide more accuracy, but it will take longer and the index will be larger.

Once you have created a knowledge base, you then need to synchronise the data source. This can take a while if there are a large number of files/pages to process. It takes much longer again if you are using a model for parsing.

Note: Titan Text Embeddings v2 offers 256, 512 and 1024 dimensions:

  • 256 dimensions uses 25% of the space of 1024 while maintaining 97% of the accuracy
  • 512 dimensions uses 50% of the space while maintaining 99% of the accuracy
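
As a rough, back-of-the-envelope illustration (my own numbers, ignoring index overhead and any 8-bit quantisation): 100,000 chunks at 1,024 dimensions stored as 32-bit floats is about 100,000 × 1,024 × 4 bytes ≈ 0.4 GB of raw vector data, while the same chunks at 256 dimensions is about 0.1 GB.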

Using a Knowledge Base

There are essentially 2 ways to use a knowledge base:

  • Retrieve chunks based on a vector search
  • Retrieve chunks and create an inference in a single request.

Both of these are available via the console. You have to select a knowledge base and then test it. The 'Generate Response' switch is on by default and will generate an inference.

There is a slightly different API for each when using the SDK. Both sit in the AgentsforBedrockRuntime client (bedrock-agent-runtime in boto3). For retrieval you use the retrieve method; the retrieve_and_generate method also creates the inference.
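
As a minimal sketch (assuming boto3, with a placeholder knowledge base ID and model ARN - substitute your own values):

import boto3

client = boto3.client("bedrock-agent-runtime")

# Retrieve only: returns ranked chunks, you build the prompt and run the inference yourself
chunks = client.retrieve(
    knowledgeBaseId="KB123EXAMPLE",   # hypothetical knowledge base ID
    retrievalQuery={"text": "How many days' vacation do I get?"},
)

# Retrieve and generate: search plus an inference in a single call
answer = client.retrieve_and_generate(
    input={"text": "How many days' vacation do I get?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",
            "modelArn": "arn:aws:bedrock:eu-west-2::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(answer["output"]["text"])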

When retrieving chunks you are just searching the vector database. You can perform either a pure vector search or a combined text and vector search. You can also specify the number of results to return.

There is also the ability to filter by metadata. This could be useful for limiting search results to a specific subset of data or implementing access controls.
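
Here is a sketch of those retrieval options - result count, hybrid search and a metadata filter. The 'department' metadata key is a hypothetical example and filtering only works if your chunks actually carry that metadata.

import boto3

client = boto3.client("bedrock-agent-runtime")

results = client.retrieve(
    knowledgeBaseId="KB123EXAMPLE",
    retrievalQuery={"text": "staff car park policy"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,               # acts as a maximum
            "overrideSearchType": "HYBRID",     # combined text and vector search ("SEMANTIC" for vector only)
            "filter": {"equals": {"key": "department", "value": "facilities"}},
        }
    },
)
for result in results["retrievalResults"]:
    print(round(result["score"], 3), result["content"]["text"][:80])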

The search results are a list of ranked chunks that are relevant to the user query. The number of results you ask for functions as a maximum; if there are not enough relevant results, fewer are returned. I have not found much information on scoring and relevance in the AWS docs and have reached out for more information.

You can also specify a prompt template. There is slightly more flexibility using the SDK than the console. You can specify a system prompt. There are also some standard placeholders that are replaced with the user query and the search results in ranked order.

This is the basic prompt:

You are a question answering agent. I will provide you with a set of search results. The user will provide you with a question. Your job is to answer the user's question using only information from the search results. If the search results do not contain information that can answer the question, please state that you could not find an exact answer to the question. Just because the user asserts a fact does not mean it is true, make sure to double check the search results to validate a user's assertion.

Here are the search results in numbered order:
$search_results$

$output_format_instructions$        

The variables enclosed by $ symbols are replaced.

It is easy to customise to remove citations, change the tone of the answer or tailor answers where there are no search results.

You can also alter model parameters like temperature, which will affect the creativity of the model (lower temperature is more analytical, higher is more creative).
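
As a hedged sketch of what this looks like through the SDK - the template wording, IDs and parameter values are my own illustrations:

import boto3

client = boto3.client("bedrock-agent-runtime")

custom_template = """You are a friendly HR assistant. Answer using only the search
results below. If the answer is not in the search results, say so politely and do not guess.

Here are the search results:
$search_results$
"""

response = client.retrieve_and_generate(
    input={"text": "Can I park in the customer car park if the staff car park is full?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",
            "modelArn": "arn:aws:bedrock:eu-west-2::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
            "generationConfiguration": {
                "promptTemplate": {"textPromptTemplate": custom_template},
                "inferenceConfig": {
                    "textInferenceConfig": {"temperature": 0.2, "maxTokens": 512}
                },
                # A guardrail can also be attached here, e.g.
                # "guardrailConfiguration": {"guardrailId": "...", "guardrailVersion": "1"},
            },
        },
    },
)
print(response["output"]["text"])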

There is also the option to use guardrails to filter both input and output. Guardrails are very useful for blocking inappropriate content (like profanity), but they can also be used to stop your application doing something it should not, like providing financial advice when it is not supposed to.

The output format instructions are Claude specific and include information on how to provide citations and format the answer. You can replace this with your own instructions, but you are then responsible for including citations if required.

What's missing?

Bedrock knowledge bases are a great wrapper around creating and populating a knowledge base. Connectors like the web crawler can really make it easy to create a product quickly.

In reality I think I will probably be using the retrieve functionality more than the retrieve and generate.

  • Number of models available - While my 'go to' Bedrock model Claude is available, it is the only model (at least in the UK region). If you want to use a different model then you will have to separate out the retrieve and inference operations. Only the Titan text embedding model is available for vectorising.
  • Quality control - One step that can really improve a RAG solution is to include evaluators into the workflow. Checking for accuracy, relevance and hallucinations can greatly improve the response. Checking that all the citations were used can be useful for evaluation. There are some evaluation options in Guardrails but they are not particularly powerful and can be quite binary. If the evaluation fails you may wish to just repeat the inference.
  • Support for vector stores - It would be good to see support for additional vector stores added over time.
  • Retrieval settings - As mentioned above there is not much information available on scoring. It would be good to have a bit more info on scoring and possibly a bit more control.
  • Multimodal support - Currently Knowledge Bases only supports text. With models like Claude able to accept images as input, and multi-modal embedding models available, it would be great to see other media supported
  • Preprocessing - In some situations you may want to preprocess the customer input to improve the request sent to the knowledge base. If the request from the user is "Write me a 500 word email to send to my customer Steve about our track record on logistics and supply chain", the relevant section for document retrieval is possibly only "logistics and supply chain". This is something you can do with a custom agent but not with the retrieve and generate function (see the sketch below).
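
Here is the kind of preprocessing I mean, as a rough sketch: one cheap model call distils the search phrase, then the retrieve call uses that instead of the full request. The model IDs and prompt wording are illustrative assumptions.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
agent_runtime = boto3.client("bedrock-agent-runtime")

user_request = ("Write me a 500 word email to send to my customer Steve about "
                "our track record on logistics and supply chain")

# Step 1: ask a small, cheap model to extract the knowledge base search phrase
extraction = bedrock_runtime.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": "Return only a short search phrase describing the knowledge "
                             "needed to answer this request: " + user_request}],
    }],
)
search_phrase = extraction["output"]["message"]["content"][0]["text"]

# Step 2: retrieve chunks using the distilled phrase rather than the whole request
chunks = agent_runtime.retrieve(
    knowledgeBaseId="KB123EXAMPLE",
    retrievalQuery={"text": search_phrase},
)
# The chunks and the original request are then combined in your own prompt for the final inference.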

Bedrock Knowledge bases costs

First, the Bedrock Knowledge Bases service itself is free. You only pay for the other services it uses. All of these will have costs (often multiple) associated with them. Those are:

  • Creating and hosting the vector database - This will be dependent on the vector store you use. The default is OpenSearch Serverless. If you use the cost-effective serverless option then the main cost is storage. Adding redundancy (highly recommended beyond development) will add storage costs.
  • Populating the vector database when you update the documents in the knowledge base (synchronising) - There are multiple costs here. You will have the cost of any additional chunking Lambda. You will have the inference cost for vectorising each chunk. You may have a parsing cost if you are using an LLM to parse documents. Finally you will have any write costs associated with your vector database. There could also be bandwidth and read costs from your document source.
  • Storage used by the knowledge base (this may be a factor if you had to create the knowledge base just for this). By this I mean the storage for the actual documents. If you are using a SharePoint connector then you will probably have minimal costs. If you have created a specific s3 bucket just to host documents for your knowledge base then you will pay for that storage.
  • Searching the vector database - You will have to vectorise your search term. There will be a small model inference cost. There may also be a cost for the vector database depending on which one you use.
  • Generating results - There will be a small inference cost for using the model to generate an answer. This is charged based on input and output tokens. If a large number of chunks are returned, it will increase the number of input tokens.

One warning: OpenSearch replication adds some ongoing costs. For a large production system this is pretty negligible, but for a fairly modest system the replication cost can be a significant factor.

There are a lot of different costs associated with Knowledge Bases. When populating the vector database there are possibly 4+ separate services being used. All of this is dependent on the size of the knowledge base, its complexity and your individual setup. Before using it in production you really need to create a cost model and do some testing.

While the costs for knowledge base are fair and transparent, because it is customisable and there are multiple components, it is not simple!

Performance

It is really hard to assess the performance of Bedrock Knowledge Bases. First, it is made up of multiple other services, and second, what do we mean by performance? There are several possible measures:

  • Time to populate vector database
  • Speed of response
  • Quality of search
  • Quality of inference from your LLM

The speed will depend on many settings, like the backend used, the chunk configuration and whether an LLM is used to parse documents. There are loads of benchmarks for Claude, Titan Embeddings and OpenSearch available. Unfortunately these probably won't help you. I know this is a bit of a cop out, but you really need to test it with your own data set. Based on my investigations the overhead of using Knowledge Bases is trivial. For a large knowledge base the speed of the vector database is likely to become the most significant speed factor. In terms of accuracy, it has been very difficult to measure.

Based on experimentation, using a smaller vector size (a lower number of dimensions) can produce a slightly faster result and use less storage with no noticeable impact on accuracy. This was not the most scientific of tests and results may vary with your own data.

Multi-Region support

At the time of writing, Bedrock Knowledge Bases is only available in 12 of 34 AWS regions. I have done most of my testing in the London region.

There is also a difference in features and models. Comparing Virginia (us-east-1) to London (eu-west-2):

Embedding Models:

  • Titan Embeddings v1 & v2 are available in Virginia but only v2 is available in London
  • Cohere English and Multilingual (v3) are available in Virginia but not in London

Backend storage:

  • MongoDB Atlas is available in Virginia but not in London

With response generation models, only Anthropic Claude 3 Sonnet and Haiku are available in London. The full list of available models is:

  • Anthropic Claude 3 Sonnet and Haiku
  • Amazon Titan Text Premier
  • Anthropic Claude v2.0 and v2.1
  • Anthropic Claude 3.5 Sonnet
  • Anthropic Claude Instant
  • Meta Llama 3.1 Instruct (8b, 70b, 405b)

All of Bedrock (including Knowledge Base) is a pretty new service and it is still being rolled out globally. Availability is improving but there is quite a bit of regional variation. I have been told informally that AWS are not focussed on rolling out older models globally so I do not expect Claude 2 or Titan embedding v1 support in London. Other models will be added based on demand.

Conclusion

Bedrock Knowledge Bases are a great Bedrock feature and make it very easy to prototype and then build RAG solutions.

There are still quite a few features in preview, so I would highly recommend engaging with AWS before using it in anything significant, in case the functionality is going to change, and to find out when it will reach general availability.

RAG is one of the most powerful ways to use GenAI. There are not many use cases that will not require some element of RAG.

Knowledge bases can either be used from your own application or included in a Bedrock Agent (hopefully more on that in a later article). I will definitely be incorporating them in projects in the near future.

Feedback

I am still using knowledge bases and building products with them. It is quite a new feature (especially in some regions). If there is any feedback based on your experience, or anything you think I have missed, please let me know. I will try and find answers and update this article.

Additional detail on vector search

After a request to AWS for more information and a little additional research I have found out or confirmed the following...

  • Bedrock Knowledge Bases is very dependent on the vector storage you use. It is just abstracting the storage engine's functionality. The scoring is based on the search engine's functionality.
  • In the case of OpenSearch storage it is using the k-NN (k-nearest neighbour) vector index. k is the number of results to be returned.
  • For OpenSearch, Knowledge Bases appears to be using either the vector search or hybrid search methods. The vector is pre-computed and then included as a search term (a rough sketch of such a query follows this list).
  • Filters are applied by the back-end vector store post-search but before rankings are determined. The score is unaffected.
  • For hybrid search a normalisation process is applied to the scores. They are always 1 = most relevant and 0 = least relevant.
  • The search is based on the FAISS (Facebook AI Similarity Search) library and uses Euclidean distance to measure similarities.
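
For illustration, the query sent to OpenSearch will look something like the k-NN query below. The field name comes from the stored sample shown earlier; the exact shape is my own assumption based on the OpenSearch k-NN plugin rather than anything AWS documents.

# Pre-computed query vector from the embedding model (truncated here for readability)
query_vector = [-0.083, 0.018, 0.049]   # ...in reality the full 512 or 1024 dimensions

knn_query = {
    "size": 5,                           # how many nearest chunks to return
    "query": {
        "knn": {
            "bedrock-knowledge-base-default-vector": {
                "vector": query_vector,
                "k": 5,
            }
        }
    },
}
# This body would be posted to the index's _search endpoint (for example with the
# opensearch-py client); each hit comes back with the chunk text and a score.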

One question I have not yet answered is what happens if you request too many relevant results. If there are not enough document chunks in your knowledge base, then you will get a smaller set returned. If, however, there are enough chunks but they are just irrelevant, what happens? I would assume documents may be returned with a score approaching zero. I still need to do some experimentation to understand this behaviour and how it would impact the overall process.
