Creating RagaaS: With AWS Bedrock & LangChain + the Consequential Death of SQL
Nikunj J Parekh
Agentic AI Executive | CTO @ EV Platform | Principal DMTS | Board Advisor | IEEE | Speaker | President, IIT Tech Clubs | Author | Angel Investor
No click-baiting! RagaaS stands for "Retrieval Augmented Generation as a Service" and has nothing to do with my background in music (although the capital S at the end was hint enough ;)).
The focus of my articles is to underline the point that AI is not just for the bold (or italic, irk..), but is easy.
In this post, you'll learn how you can set up and integrate Amazon Bedrock with your LangChain app for an end-to-end RAG pipeline.
A passing comment: LangChain is a good name, though something like GPTProjects or LLMProjects would describe its job just as well. Then again, it is a chain: a one-way flow of generated information (usually text) from LLMs to humans or other consuming systems. If you're familiar with IFTTT, it's like IFTTT for LLMs, except that LangChain is free.
Amazon Bedrock is a fully managed AWS service that gives you access to popular foundation models from leading AI companies like Anthropic and Mistral AI via a single API.
Since it is a fully managed service, it can handle the complete RAG pipeline for your application.
For this tutorial, we're going to work with the Amazon Titan Text G1 - Lite model, specifically amazon.titan-text-lite-v1. We could just as well use other models: for embeddings, OpenAI's "ada" (text-embedding-ada-002) or one of the many that Hugging Face offers; for generation, a model like Claude.
You can use "ada" just like I show below, but let's proceed with Titan Text G1 - Lite for convenience on AWS.
from langchain.embeddings import OpenAIEmbeddings
# Requires OPENAI_API_KEY to be set in your environment
emb = OpenAIEmbeddings(model="text-embedding-ada-002")  # "ada"
vector = emb.embed_query("any text / token / sentence...")
print(len(vector))  # 1536
You can generate and observe embeddings just like that. The vector has 1536 dimensions (this is considered small; Ada is small).
Why choose Amazon Bedrock?
Whether you're an existing AWS customer or new to the platform, Amazon Bedrock is a solid choice for the following reasons:
- Fine-tuning and RAG: Easily fine-tune your choice of foundation models (FMs) and use Bedrock as a RAG-as-a-service
- Serverless and scalable: Scale to production without worrying about infrastructure; AWS scales for you based on your setup and usage - serverless!
- Model evaluation via a single API: Switch between FMs without heavily rewriting code; all FMs integrate with the same Bedrock API (see the short sketch below).
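To make that last point concrete, here's a minimal sketch using the LangChain Bedrock wrapper and the "bedrock" credentials profile we set up later in this article: swapping foundation models is just a different model_id string, while the calling code stays the same. (Both model IDs are examples; you can only call models you've requested access to, as covered below.)
from langchain_community.llms import Bedrock
# Same wrapper, different model_id - the calling code does not change
titan = Bedrock(model_id="amazon.titan-text-lite-v1", credentials_profile_name="bedrock")
claude = Bedrock(model_id="anthropic.claude-v2:1", credentials_profile_name="bedrock")
print(titan.invoke("What is the capital city of Canada?"))
print(claude.invoke("What is the capital city of Canada?"))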
Retrieval-Augmented Generation (RAG)
Retrieval-augmented generation (RAG) is a technique that uses knowledge that wasn't part of a model's initial training data. This helps the model get additional relevant context from specific data sources so its output is enhanced.
Knowledge such as private company documents can be used to improve an LLM's response to a specific user prompt or query.
RAG Options with Amazon Bedrock
We have two options for building a retrieval-augmented generation (RAG) pipeline with Amazon Bedrock.
- Integrate with a data framework: Use Bedrock's foundation models (FMs) for the NLP tasks and handle the RAG pipeline outside of Bedrock with a data framework such as LangChain.
- Knowledge Bases for Amazon Bedrock: Let Bedrock fully handle the RAG pipeline using Knowledge Bases. This could be referred to as: "RAG-as-a-service". Bedrock handles ingestion, embedding, querying, and vector stores and can also provide source attribution from your private documents and data sources.
We're going to implement both options in this tutorial.
Access to Amazon Bedrock
By default, you do not have access to the Amazon Bedrock Foundation Models (FMs). You'll need to:
- Step 1: Add the required permissions to the user
- Step 2: Request access to the foundation model
Step 1: Grant user Bedrock permissions in IAM
Sign into your AWS Console and navigate to the Identity and Access Management (IAM) page. You should then click on the User name that will be used to access the Amazon Bedrock foundation models.
After selecting the user, scroll down to the Permissions tab and choose Add permissions from the dropdown options as shown below:
The last step is to search the list for the AmazonBedrockFullAccess policy. Select it and save the updated settings.
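If you prefer the command line, the same policy can be attached with the AWS CLI (the user name below is a placeholder for your own IAM user):
aws iam attach-user-policy \
  --user-name YOUR_IAM_USER \
  --policy-arn arn:aws:iam::aws:policy/AmazonBedrockFullAccess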
Step 2: Request access to Bedrock's FMs
Now we need to navigate to Amazon Bedrock to request access to the foundation model, in our case the Amazon Titan. Type Bedrock in the AWS console search bar then click on the Amazon Bedrock service, as shown below:
Now, let's go ahead and request access to the Amazon Titan Text G1 - Lite model. You can choose whichever foundation model you like from the available ones and request access, as I've mentioned before. On the top right, click on the Manage model access button.
This will enable selecting which models you want to request access to. For this tutorial, let's go ahead and choose Titan Text G1 - Lite then click on Request access at the bottom of the screen.
You should see the text Access granted in the Access status column next to the Titan Text G1 - Lite once you're done (As shown in the screenshot above).
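You can also double-check which models are available to you from the terminal, once your AWS CLI is configured (we do that in the next section); the region here is just an example:
aws bedrock list-foundation-models --region us-west-1 --query "modelSummaries[].modelId"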
LangChain + Bedrock
Let's set up our environment and write some code that will let our Python LangChain app interact with the foundation model.
To do this, we're going to:
- Set up the work directory
- Create AWS access keys
- Configure the access keys for Boto3 (using the AWS CLI or manually)
- Integrate LangChain and Bedrock
Step 1: Set up the work directory
Create a new directory and a new Python file, say lang.py. The remainder of the article assumes you're working inside a virtual environment (venv) and that your Python module is lang.py; the venv is optional but recommended.
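If you'd like the recommended venv, a minimal setup looks like this (directory and file names match the rest of the article):
mkdir dir-langchain-on-bedrock && cd dir-langchain-on-bedrock
python3 -m venv venv
source venv/bin/activate
touch lang.py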
Now, we'll install Boto3 and LangChain using pip. Here's my simple requirements.txt and the way to install that in my venv.
boto3
langchain
(venv) ~/dir-langchain-on-bedrock $ pip3 install -r requirements.txt
Step 2: Create AWS access keys
To give the application access to our AWS resources, including Bedrock, we'll need to set up AWS authentication credentials for the IAM user. In your AWS console, go to the IAM service, choose your user, navigate to the Security credentials tab, and scroll down until the Access keys card is visible. On the top right, click on the Create access key button and choose Local code from the list (that page just reminds you of the best practices for using access keys). Make sure to copy your keys and keep them handy for the next step below.
Step 3: Configure the AWS access keys for Boto3
We have two options for configuring the AWS access keys: using the AWS CLI, or manual configuration.
Option 1: Using the AWS CLI
This is the fastest way, but it requires the AWS CLI. Opt for this option if you already have the CLI installed on your system.
In your terminal window, type aws configure. This instructs the AWS CLI to generate a credentials file. The CLI will ask you for the AWS Access Key ID, AWS Secret Access Key, Default region name (for example, us-west-1), and Default output format (JSON is used if you leave it blank). The AWS CLI will write your credentials file to ~/.aws/credentials. Boto3 and LangChain will use this file to communicate with Bedrock - or any other AWS service.
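Note that plain aws configure writes to the default profile; since the code later in this article references a profile named "bedrock", you can create it directly with the --profile flag. A sample session (the key values are placeholders):
$ aws configure --profile bedrock
AWS Access Key ID [None]: YOUR_ACCESS_KEY
AWS Secret Access Key [None]: YOUR_SECRET_KEY
Default region name [None]: us-west-1
Default output format [None]: json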
Option 2: Manual configuration
Manually create the credentials file at ~/.aws/credentials, as below:
[bedrock]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
Also create the config file (~/.aws/config) in the same directory. This file stores the resource region; note that in the config file, a named profile uses the "profile" prefix:
[profile bedrock]
region=us-west-1
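Before wiring up LangChain, you can sanity-check the profile with a few lines of Boto3 (a minimal sketch; it only verifies that the profile and its region are picked up):
import boto3
# Uses the [bedrock] profile from ~/.aws/credentials and its region from ~/.aws/config
session = boto3.Session(profile_name="bedrock")
bedrock_runtime = session.client("bedrock-runtime")
print(bedrock_runtime.meta.region_name)  # should print us-west-1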
Step 4: Integrating Bedrock with LangChain
Add the following to your lang.py file:
from langchain.chains import LLMChain
from langchain_community.llms import Bedrock
from langchain_core.prompts import PromptTemplate
# This object will let LangChain know the model to communicate with and the AWS credentials to use
llm = Bedrock(
model_id="amazon.titan-text-lite-v1",
credentials_profile_name="bedrock"
)
# Now, create PromptTemplate
prompt_template = "What is the capital city of {country}?"
prompt = PromptTemplate(
input_variables=["country"], template=prompt_template
)
# Finally, create an LLMChain to prompt the model
chain = LLMChain(llm=llm, prompt=prompt)
response = chain.invoke({"country": "Canada"})
print(response['text'])
Run this Python module at the command prompt (python3 lang.py). If all went well, you'll see a response like this:
Foundation model response: The capital city of Canada is Ottawa.
Was that eaassy?
How do we get a production-quality, managed, end-to-end RAG pipeline as a service (RagaaS)? We implement Knowledge Bases for Amazon Bedrock.
Knowledge Bases
Knowledge Bases for Amazon Bedrock takes care of the full RAG pipeline: Bedrock handles data ingestion, embeddings, storage, and querying. It acts as RAG-as-a-service, fully managed by AWS.
Currently, Knowledge Bases only works with an S3 bucket as the data source.
It will set up the embedding model (of your choice) that converts the contents of the files in the S3 bucket to vector embeddings which it will store in a vector database (of your choice). The RagaaS data flow is shown below:
Limited support for foundation models
At the time of writing, Knowledge Base querying supports only Anthropic (Claude) models for generation. If you want to use those, you'll have to Request model access for them as well!
Creating a new Knowledge base (KB)
A Knowledge base is simply a data / knowledge source. Click on the Knowledge Base item from the sidebar on the Bedrock console then click on the Create Knowledge Base button. Fill in -
- Knowledge base details: Enter any name for your Knowledge base. Then, Choose Create and use a new service role and click on the Next button.
- Set up data source: The only option is an S3 bucket. This bucket must contain the files that will be converted to embeddings and stored in a vector store for future querying. (Tip: you can save this write-up itself as a PDF and use it as a test file in your S3 bucket.) Are you also missing Kinesis, SQS, Kafka, ElastiCache, RDS, Redshift, Athena, DynamoDB, DocumentDB, and Aurora as data sources? No rush, AWS :)
- Select embeddings model and configure vector store: I'm going for the default settings for both. Titan Embeddings G1 - Text v1.2 for embeddings and a new Amazon OpenSearch Serverless vector store.
- Next, click on the Create knowledge base button. Creation takes a few minutes.
- Click on the Sync button to finalize the data ingestion (you can also trigger the sync from code, as sketched below).
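If you prefer automation over clicking Sync, the same ingestion can be started from code. A hedged sketch using Boto3's bedrock-agent client; both IDs are placeholders you'd copy from the Knowledge base page in the console:
import boto3
session = boto3.Session(profile_name="bedrock")
bedrock_agent = session.client("bedrock-agent")  # control-plane client for Knowledge Bases
# Placeholder IDs - copy them from the Knowledge base and data source pages in the console
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KNOWLEDGE_BASE_ID",
    dataSourceId="DATA_SOURCE_ID",
)
print(job["ingestionJob"]["status"])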
Testing the Knowledge Base
Hit the Test Knowledge Base tab on the right. Enter your prompt in the text field and click Run.
It should take a few seconds but as you can see, the model's response is accurate and based on the provided PDF file in the S3 bucket. The knowledge base does the similarity search for us, prompts the model, and returns the response.
The complete RAG pipeline is all set up and handled by Amazon Bedrock.
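Under the hood, the console test is a retrieve-and-generate call. Here's a hedged sketch of the same flow with Boto3's bedrock-agent-runtime client; the knowledge base ID and model ARN are placeholders, and the model must be one you've been granted access to:
import boto3
session = boto3.Session(profile_name="bedrock")
kb_runtime = session.client("bedrock-agent-runtime")
# Retrieve relevant chunks from the Knowledge Base and generate an answer in one call
response = kb_runtime.retrieve_and_generate(
    input={"text": "Why is LangChain a good choice?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KNOWLEDGE_BASE_ID",
            "modelArn": "arn:aws:bedrock:us-west-1::foundation-model/anthropic.claude-v2:1",
        },
    },
)
print(response["output"]["text"])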
Integrating with LangChain using the Knowledge Bases Retriever
Now it's time to query our Knowledge base using LangChain. For this example, I am going to use the RetrievalQA chain.
In the same lang.py file, let's import the following packages:
from langchain.chains import RetrievalQA
from langchain_community.retrievers import AmazonKnowledgeBasesRetriever
# Instantiate AmazonKnowledgeBasesRetriever
retriever = AmazonKnowledgeBasesRetriever(
knowledge_base_id="KNOWLEDGE_BASE_ID",
credentials_profile_name="bedrock",
retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 3}},
)
The following fields are required:
- knowledge_base_id: Grab the ID from the Knowledge base page in the AWS console.
- credentials_profile_name: This is the profile that has access to the Amazon Bedrock service, in our case [bedrock].
- retrieval_config: I usually like to return the top 3 similar results from the vector store. Feel free to adjust as you like.
Let's set up our RetrievalQA chain. We'll need to provide it with the llm and retriever objects:
model_kwargs_claude = {"temperature": 0, "top_k": 10, "max_tokens_to_sample": 3000}
llm = Bedrock(
model_id="anthropic.claude-v2:1",
credentials_profile_name="bedrock",
model_kwargs=model_kwargs_claude
)
qa = RetrievalQA.from_chain_type(
llm=llm,
retriever=retriever
)
# Finally, query the FM
query = "Why is LangChain is good choice?"
response = qa.invoke(query)
print(response['result'])
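If you also want to see which chunks the retriever pulled from your S3 documents (the source attribution mentioned earlier), RetrievalQA can return them; a small variation on the chain above:
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,  # include the retrieved chunks in the response
)
response = qa.invoke("Why is LangChain a good choice?")
print(response["result"])
for doc in response["source_documents"]:
    print(doc.metadata)  # e.g. the S3 location and relevance score of each chunk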
Share the response you get in the comments! Hopefully, it is the same as the one you got in the AWS Console.
Claude's response is based on the files in the S3 bucket and nothing else. I only had a PDF version of this post in the S3 bucket for the model to reference through the vector database we chose. In the cloud, horizontal scalability is a given: the pipeline is serverless and stateless, and it will keep delivering as a highly performant RagaaS product no matter how many documents you add.
The term RAG-as-a-service becomes evident now: the user does not have to think about any part of the RAG pipeline. We set up the KB and let Amazon Bedrock do the rest.
Summary
If you're looking to manage your own RAG pipeline, you can get up and running in no time by just setting up the required permissions and using the LangChain Bedrock class to connect with one of the foundation models.
Otherwise, it is EASY for anyone to use the Knowledge Bases feature for Amazon Bedrock which will handle creating all the components in your RAG pipeline for you. As we've seen, this includes storage, embedding, querying, data ingestion, and everything in between.
By the way, SQL is dead, you know, right? See the clip below to confirm it for yourself. The rise of RagaaS will bring data and compute power closer to humans than ever before, and with AI, there will finally be more work for the computers and less for humans in getting to an explanation of any documentable topic.
Thanks!