Bedrock Knowledge Bases
In my opinion, large language models (LLMs) and GenAI that produce text have a couple of superpowers:
It is the final one of these that I will focus on now. If you just use a generic LLM chatbot on the internet, it can answer your questions, but only based on its training and the information it already knows. It can't answer anything about you, your company or your corporate data.
"Based on my company's policies, can I park in the customer car park if the staff car park is full?"
"How many days' vacation do I get?"
"Based on our track record can you write me 500 words on our approach to DevOps?"
These are all valid questions that in principle GenAI could help with, but without additional context specific to your company it will get them hopelessly wrong.
The answer is RAG
Retrieval Augmented Generation (RAG) is the process of supplying appropriate and authoritative knowledge to an LLM so it can generate a relevant response. This is data that is normally outside the training data of the model. RAG allows a generic LLM to produce domain-specific answers.
RAG requires you to build a knowledge base and then uses search results as context for your LLM inferences to produce domain specific answers.
There are a lot of steps to producing a RAG solution; I will cover those below. The good news is that Amazon Bedrock Knowledge Bases has most of this covered for you.
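To make the idea concrete, here is a minimal sketch of the RAG loop in Python. The helper functions search_knowledge_base and call_llm are hypothetical placeholders for whatever search and inference you use; Bedrock Knowledge Bases provides both halves for you, as covered below.

def answer_with_rag(question: str) -> str:
    # 1. Retrieve the most relevant chunks of your own documents
    chunks = search_knowledge_base(question, max_results=5)  # hypothetical helper

    # 2. Build a prompt that contains those chunks as context
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the search results below.\n\n"
        f"Search results:\n{context}\n\n"
        f"Question: {question}"
    )

    # 3. Ask a generic LLM, which can now give a domain-specific answer
    return call_llm(prompt)  # hypothetical helper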
Vector search
Put very simply, converting text to a vector is a way of extracting its raw meaning. The words cookies and biscuits will have almost identical vectors. Crackers and oat cake should be pretty similar.
A vector is just a sequence of numbers that represents a position in multi-dimensional space. Vectors have direction and magnitude. In 3D space a vector would have 3 numbers (x, y, z). The vectors used for vector search have a much higher number of dimensions (like 1024).
Typically an embedding model is a neural network trained on a large corpus of words. The model is trained to associate similar words and make accurate predictions for new data.
Vectors can then be compared by measuring how aligned they are, most commonly using cosine similarity. An identical word will point in the same direction and have a similarity of 1. A word with a completely different meaning will be orthogonal and have a similarity of 0.
Going back to our biscuit example, I would expect cookies and biscuits to have a similarity in the range of 0.9-1.0. I would expect cookies and crackers to be a bit lower, around the 0.7 mark. Two completely different words like spade and cookies would have a similarity close to zero. (These numbers are completely made up and for illustrative purposes only. They will differ based on the trained model you use.)
Note: If you are not familiar with the term corpus, it is basically a large, representative collection of text.
Vectorising can be applied to chunks of text as well as words. Vectors can then be used to measure how relevant a piece of text is to a request.
There are a few different measurements used to compare vectors:
More on cosine similarity can be found here: https://en.wikipedia.org/wiki/Cosine_similarity
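To illustrate, here is how cosine similarity is computed. The three-dimensional vectors below are made up purely for illustration; a real embedding model produces hundreds or thousands of dimensions.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means identical direction, 0.0 means orthogonal (unrelated meaning)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional vectors; real embeddings have ~1024 dimensions
cookies = np.array([0.9, 0.1, 0.2])
biscuits = np.array([0.85, 0.15, 0.25])
spade = np.array([0.1, 0.9, -0.3])

print(cosine_similarity(cookies, biscuits))  # close to 1.0 (very similar meaning)
print(cosine_similarity(cookies, spade))     # much lower (unrelated meaning)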
Note: Vectors can be used on other data types like images and gene sequences, but that is not relevant to this particular use.
Note: The words vector and embedding are almost interchangeable in this context. Technically, embedding is the process of representing data in a structured way for machine learning, while a vector is the mathematical structure used to represent that data.
Note: The process of vectorising data is also used in the training of LLMs. Vectors are an efficient way of storing knowledge.
The vectors produced by most embedding models are all floats between -1 and +1. Here is a sample of the vector embedding that is actually stored in OpenSearch.
Note: Titan has the ability to produce normalised vectors or not, with normalised being the default. For most applications normalisation is the best option.
"bedrock-knowledge-base-default-vector": [
-0.08327539,
0.018446112,
0.049291685,
-0.02632972,
-0.037351463,
0.005549141,
0.019441132,
-0.013700638,
-0.043168493,
...]
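If you want to see this for yourself, the sketch below calls Titan Text Embeddings v2 directly via the Bedrock runtime and checks that the returned vector is unit length. The request fields (inputText, dimensions, normalize) are the ones I have used with Titan v2; treat the exact shape as an assumption and check the model documentation for your region.

import json
import boto3
import numpy as np

bedrock = boto3.client("bedrock-runtime")  # assumes credentials and region are configured

response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({
        "inputText": "Can I park in the customer car park if the staff car park is full?",
        "dimensions": 1024,   # Titan v2 also supports 256 and 512
        "normalize": True,    # normalised vectors are the default
    }),
)

embedding = np.array(json.loads(response["body"].read())["embedding"])
print(len(embedding))             # 1024
print(np.linalg.norm(embedding))  # ~1.0 because the vector is normalised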
Note: Many embedding models use 8-bit precision in their vectors. This is significant for saving space when you have 1024 dimensions. Often this can be done to reduce the size of the vectors without losing precision. Cohere uses 8-bit integers. Although I can't find it documented, it looks like Amazon Titan Text Embeddings v2 does also.
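As a rough illustration of the saving, a 1024-dimension vector stored as 32-bit floats takes 4 KB, while 8-bit integers take 1 KB. The naive quantisation below is only a sketch; real implementations scale more carefully.

import numpy as np

vector_f32 = np.random.uniform(-1.0, 1.0, size=1024).astype(np.float32)
print(vector_f32.nbytes)  # 4096 bytes

# Naive quantisation: map [-1, 1] floats onto signed 8-bit integers
vector_i8 = np.round(vector_f32 * 127).astype(np.int8)
print(vector_i8.nbytes)   # 1024 bytes - a quarter of the storage

# The recovered values stay close to the originals, so similarity scores barely change
recovered = vector_i8.astype(np.float32) / 127
print(np.max(np.abs(recovered - vector_f32)))  # worst-case error below 0.004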
Workflow
There are two sides to using a Knowledge Base. First you need to create a Knowledge base and then you need to use it for inferences. These are the basic steps that are followed:
Creating a knowledge base
Refreshing the index would involve identifying only the new, changed and deleted documents and then repeating steps 2-6.
Note: Parsing the documents can be done in many ways. It can be as simple as just converting to text, or as sophisticated as using a model that 'understands' the layout.
Answering a request
On to Bedrock Knowledge Bases
So hopefully my explanation of the why and the mechanics has left you in a good place.
This is exactly what Amazon Bedrock Knowledge Bases does. I will walk through the features, options and limitations. Just to keep it simple I will talk about the console only. In a production system you would probably want to create the knowledge base with infrastructure as code (IaC) and interact with it using your language's SDK (like boto3 for Python).
Creating a knowledge base
Knowledge bases are halfway down the Bedrock left-hand navigation under Builder tools (assuming you have made it past the splash page and have some models enabled).
If you click on Knowledge bases and then Create knowledge base, the fun starts. Creating a knowledge base is a 4-step process in the console.
By creating and synchronising a knowledge base, you will have a fully functioning knowledge base that you can query and start building into your own applications. Here are some of the features you can choose from:
Once you have created a knowledge base, you then need to synchronise the data source. This can take a while to process if there are a large number of files/pages to process. It also takes much longer if you are using a model for parsing.
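If you are driving this from the SDK rather than the console, synchronising a data source is an ingestion job. Here is a minimal boto3 sketch; the knowledge base and data source IDs are placeholders for your own resources.

import boto3

bedrock_agent = boto3.client("bedrock-agent")  # the build-time API, separate from the runtime client

# Kick off a sync of one data source (IDs are placeholders)
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KBID12345",
    dataSourceId="DSID12345",
)["ingestionJob"]

# Check on the job - syncs can take a while for large document sets
status = bedrock_agent.get_ingestion_job(
    knowledgeBaseId="KBID12345",
    dataSourceId="DSID12345",
    ingestionJobId=job["ingestionJobId"],
)["ingestionJob"]["status"]
print(status)  # e.g. STARTING, IN_PROGRESS, COMPLETE, FAILED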
Note: Titan v2 text embeddings offers 256, 512 and 1024 dimensions.
Using a Knowledge Base
There are essentially 2 ways to use a knowledge base:
Both of these are available via the console. You have to select a knowledge base and then test the knowledge base. The 'Generate Response' switch is on by default and will generate an inference.
There is a slightly different API for each if using the SDK. Both sit in the AgentsforBedrockRuntime client. For retrieval you use the retrieve method. The retrieve_and_generate method also creates the inference.
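Here is a minimal boto3 sketch of both calls. The knowledge base ID and model ARN are placeholders, and the exact request shape is based on my own usage, so treat it as a starting point rather than a reference.

import boto3

runtime = boto3.client("bedrock-agent-runtime")
KB_ID = "KBID12345"  # placeholder knowledge base ID

# Retrieval only: returns ranked chunks, no inference
chunks = runtime.retrieve(
    knowledgeBaseId=KB_ID,
    retrievalQuery={"text": "How many days' vacation do I get?"},
)
for result in chunks["retrievalResults"]:
    print(result["score"], result["content"]["text"][:80])

# Retrieval plus generation: the service builds the prompt and calls the model
answer = runtime.retrieve_and_generate(
    input={"text": "How many days' vacation do I get?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": KB_ID,
            "modelArn": "arn:aws:bedrock:eu-west-2::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(answer["output"]["text"])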
When retrieving chunks you are just searching the vector database. You can perform either a pure vector search or a combined text and vector (hybrid) search. You can also specify the number of results to return.
There is also the ability to filter by metadata. This could be useful for limiting search results to a specific subset of data or implementing access controls.
The search results are a list of ranked chunks that are relevant to the user query. The number of results you ask for functions as a maximum; you get fewer if not enough relevant results are found. I have not found much information on scoring and relevance in the AWS docs and have reached out for more information.
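These retrieval options all sit in the retrievalConfiguration of the retrieve call. A sketch, again with example values (the metadata key department is made up and has to match metadata attached to your own documents):

import boto3

runtime = boto3.client("bedrock-agent-runtime")

chunks = runtime.retrieve(
    knowledgeBaseId="KBID12345",  # placeholder
    retrievalQuery={"text": "How many days' vacation do I get?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "numberOfResults": 5,            # a maximum, not a guarantee
            "overrideSearchType": "HYBRID",  # or "SEMANTIC" for vector-only search
            # Only return chunks whose metadata marks them as HR documents
            "filter": {"equals": {"key": "department", "value": "hr"}},
        }
    },
)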
You can also specify a prompt template. There is slightly more flexibility using the SDK than the console. You can specify a system prompt. There are also some standard placeholders that are replaced with the user query and the search results in match order.
This is the basic prompt:
You are a question answering agent. I will provide you with a set of search results. The user will provide you with a question. Your job is to answer the user's question using only information from the search results. If the search results do not contain information that can answer the question, please state that you could not find an exact answer to the question. Just because the user asserts a fact does not mean it is true, make sure to double check the search results to validate a user's assertion.
Here are the search results in numbered order:
$search_results$
$output_format_instructions$
The variables enclosed by $ symbols are replaced.
It is easy to customise to remove citations, change the tone of the answer or tailor answers where there are no search results.
You can also alter model parameters like temperature, which will affect the creativity of the model (a lower temperature is more analytical and a higher one more creative).
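When using the SDK, both the custom prompt template and the inference parameters go into the generationConfiguration of retrieve_and_generate. A sketch based on my own usage (placeholder IDs, and an example template that drops the default citation instructions and changes the tone):

import boto3

runtime = boto3.client("bedrock-agent-runtime")

# Example template: keep the $search_results$ placeholder, drop the default
# citation/formatting instructions and ask for a friendlier tone
custom_template = (
    "You are a friendly HR assistant. Answer the user's question using only the "
    "search results below. If the answer is not in the search results, say so politely.\n\n"
    "Search results:\n$search_results$"
)

answer = runtime.retrieve_and_generate(
    input={"text": "How many days' vacation do I get?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID12345",  # placeholder
            "modelArn": "arn:aws:bedrock:eu-west-2::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
            "generationConfiguration": {
                "promptTemplate": {"textPromptTemplate": custom_template},
                # Lower temperature keeps answers analytical rather than creative
                "inferenceConfig": {"textInferenceConfig": {"temperature": 0.2}},
            },
        },
    },
)
print(answer["output"]["text"])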
There is also the option to use guardrails to filter both input and output. Guardrails are very useful for blocking inappropriate content (like profanity), but they can also be used to stop your application doing something it should not, like providing financial advice when it is not supposed to.
The output format instructions are Claude-specific and include information on how to provide citations and format the answer. You can replace this with your own instructions, but you are then responsible for including citations if required.
What's missing?
Bedrock knowledge bases are a great wrapper around creating and populating a knowledge base. Connectors like the web crawler can really make it easy to create a product quickly.
In reality I think I will probably be using the retrieve functionality more than retrieve and generate.
Bedrock Knowledge Bases costs
First, the Bedrock Knowledge Bases service itself is free. You only pay for the other services it uses. All of these will have costs (often multiple) associated with them. Those are:
One warning: OpenSearch replication adds some ongoing replication costs. For a large production system this is pretty negligible, but if you have a fairly modest system the replication cost can be a significant factor.
There are a lot of different costs associated with Knowledge Bases. Populating the vector database alone can involve 4+ separate services. All of this is dependent on the size of the knowledge base, its complexity and your individual setup. Before using it in production you really need to create a cost model and do some testing.
While the costs for knowledge base are fair and transparent, because it is customisable and there are multiple components, it is not simple!
Performance
It is really hard to assess the performance of Bedrock Knowledge Bases. First, it is made up of multiple other services, and second, what do we mean by performance? There are several possible measures:
The speed will depend on many settings, like the backend used, the chunk configuration and whether an LLM is used to parse documents. There are loads of benchmarks available for Claude, Titan Embeddings and OpenSearch. Unfortunately these probably won't help you. I know this is a bit of a cop out, but you really need to test it with your own data set. Based on my investigations, the overhead of using Knowledge Bases is trivial. For a large knowledge base the speed of the vector database is likely to become the most significant speed factor. In terms of accuracy, it has been very difficult to measure.
Based on experimentation, using a smaller vector size (a lower number of dimensions) can produce a slightly faster result and use less storage with no noticeable impact on accuracy. This was not the most scientific of tests and results may vary with your own data.
Multi-Region support
At the time of writing, Bedrock Knowledge Bases is only available in 12 of 34 AWS regions. I have done most of my testing in the London region.
There is also a difference in features and models. Comparing Virginia (us-east-1) to London (eu-west-2):
Embedding Models:
Backend storage:
For response generation models, only Anthropic Claude 3 Sonnet and Haiku are available in London. The full list of available models is:
All of Bedrock (including Knowledge Bases) is a pretty new service and it is still being rolled out globally. Availability is improving but there is quite a bit of regional variation. I have been told informally that AWS are not focussed on rolling out older models globally, so I do not expect Claude 2 or Titan Embeddings v1 support in London. Other models will be added based on demand.
Conclusion
Bedrock Knowledge Bases is a great Bedrock feature and makes it very easy to prototype and then build RAG solutions.
There are still quite a few features in preview, so I would highly recommend engaging with AWS before using it for anything significant, in case the functionality is going to change, and to understand when it will reach 'general availability'.
RAG is one of the most powerful ways to use GenAI. There are not many use cases that will not require some element of RAG.
Knowledge bases can either be used from your own application or included in a Bedrock Agent (hopefully more on that in a later article). I will definitely be incorporating them in projects in the near future.
Feedback
I am still using knowledge bases and building products with it. It is quite a new feature (especially in some regions). If there is any feedback based on your experience or anything you think I have missed please let me know. I will try and find answers and update this article.
Additional detail on vector search
After a request to AWS for more information and a little additional research I have found out or confirmed the following...
One question I have not yet answered is what happens if you request too many relevant results. If there are not enough document chunks in your knowledge base, then you will get a smaller set returned. If, however, there are enough chunks but they are just irrelevant, what happens? I would assume there may be documents returned with a score approaching zero. I still need to do some experimentation to understand this behaviour and how it would impact the overall process.