Day 7: Building complete RAG pipeline in AWS

This is part of the series: 10 Days of Retrieval Augmented Generation

Before we start our seventh day, let us look at what we have discussed so far and what lies ahead in this 10-day series:

  1. Day 1: Introduction to Retrieval Augmented Generation
  2. Day 2: Understanding core components of RAG pipeline
  3. Day 3: Building our First RAG
  4. Day 4: Building Multi Documents RAG and packaging using Streamlit
  5. Day 5: Creating RAG assistant with Memory
  6. Day 6: Building complete RAG pipeline in Azure
  7. Day 7: Building complete RAG pipeline in AWS (*)
  8. Day 8: Evaluation and benchmarking RAG systems
  9. Day 9: End to End Project 1 on RAG (Real World) with React JS frontend
  10. Day 10: End to End Project 2 on RAG (Real World) with React JS frontend


Frankly speaking, building a RAG in AWS was a little trickier than building it in Azure, mostly due to the lack of good resources provided by AWS. Anyway, after trying multiple approaches, our Day 7 of building a RAG on AWS is finally complete. Let's first summarize what we are going to talk about in this article:

  1. Creation of Index in Amazon Kendra
  2. Activating Amazon Bedrock
  3. Configuring local system to access above resources
  4. Building the RAG

In this article too, we will be using the same four financial documents that we used in the previous articles. So, let us start our discussion.

Creation of Index in Amazon Kendra

First, you need an AWS account. Signing up may require credit card details; AWS will not charge you for free-tier services, but a card is still required. Once we have the account, we will search for Amazon Kendra inside the AWS Management Console.

Inside Kendra, we will click on Create an index, give the index a name, and keep all the other settings at their defaults.

Once the index is created, the next step is to connect it with a data source. But before we can connect one, we first need to create it. So, we will create an S3 bucket and place all our PDF files inside it.

To create this bucket, we will search for S3 in the search bar and then click on Create bucket. We will give it a name and keep all the other settings at their defaults.

Once the bucket is created, we will go inside it and add our PDF files using the Upload button.
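If you prefer scripting over clicking through the console, the upload step can also be done with Boto3. This is a minimal sketch; the bucket name my-rag-documents-bucket and the local folder ./financial_docs are placeholders for your own values:

```python
import pathlib


def list_pdf_files(folder: str) -> list:
    """Collect all PDF paths directly under a local folder, sorted by name."""
    return sorted(pathlib.Path(folder).glob("*.pdf"))


def upload_pdfs(bucket_name: str, folder: str) -> None:
    """Upload every PDF in `folder` to the given S3 bucket."""
    import boto3  # local import so the helper above works without boto3 installed

    s3 = boto3.client("s3")
    for pdf in list_pdf_files(folder):
        # Use the bare file name as the object key, mirroring a console upload.
        s3.upload_file(str(pdf), bucket_name, pdf.name)


# Example (placeholders, run once your credentials are configured):
# upload_pdfs("my-rag-documents-bucket", "./financial_docs")
```

Scripting the upload also makes it easy to re-sync the bucket whenever the source PDFs change.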

Now that our S3 bucket is created and our source files are placed inside it, we will connect it with our Kendra index. For that, we will go back to our index, find the Add data source option, and click on it.

We will see a lot of data source options; from them, we will select the S3 connector.

We will give the connector a name and select the bucket we just created. We also need to provide an IAM role; we will select the same role that we created for Kendra. Finally, we will click on Create.


It will take some time to finish processing, as it is a two-step process: first the PDF files are crawled, and then they are indexed. Once both steps are complete, our index becomes active and ready to be used.
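Instead of watching the console, you can also poll the sync status from code. A rough sketch, assuming Boto3 is already configured; the index and data source ids come from the Kendra console and are placeholders here:

```python
def latest_sync_status(sync_jobs: list) -> str:
    """Return the status of the most recent sync job, or 'NONE' if none have run."""
    if not sync_jobs:
        return "NONE"
    latest = max(sync_jobs, key=lambda job: job["StartTime"])
    return latest["Status"]


def check_data_source_sync(index_id: str, data_source_id: str, region: str) -> str:
    """Look up the crawl/index jobs Kendra has run for a data source."""
    import boto3  # local import keeps the pure helper above importable on its own

    kendra = boto3.client("kendra", region_name=region)
    resp = kendra.list_data_source_sync_jobs(Id=data_source_id, IndexId=index_id)
    return latest_sync_status(resp.get("History", []))


# Example (ids are placeholders):
# print(check_data_source_sync("your-index-id", "your-data-source-id", "us-east-1"))
```

A status of SUCCEEDED means both crawling and indexing finished, so the index is ready to query.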

Next step will be to make Amazon Bedrock active.

Activating Amazon Bedrock

Search for Bedrock in the search bar and open it. On the left-hand side you will find Model access; click on it. It will list all the models that Bedrock provides.

Find a model that is available to you. Click on Manage model access at the top right, select that model, and finally click on Request model access. It will take a few minutes, after which the model will be active for you. Remember, using the model is chargeable.
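You can also confirm from code which models are actually available to you. A sketch, assuming your credentials are configured; note that listing models uses the bedrock service client, while inference later uses bedrock-runtime:

```python
def filter_model_ids(model_summaries: list, provider_prefix: str) -> list:
    """Pick out model ids that start with a given provider prefix, e.g. 'ai21.'."""
    return [m["modelId"] for m in model_summaries if m["modelId"].startswith(provider_prefix)]


def list_available_models(region: str, provider_prefix: str = "") -> list:
    """List foundation model ids Bedrock exposes in a region."""
    import boto3  # local import so the filter helper stays usable without boto3

    bedrock = boto3.client("bedrock", region_name=region)
    resp = bedrock.list_foundation_models()
    return filter_model_ids(resp["modelSummaries"], provider_prefix)


# Example:
# print(list_available_models("us-east-1", "ai21."))
```

Having access listed here does not remove the per-call charges mentioned above; it only confirms the model can be invoked.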

This finishes everything that needs to be done on the server side. Now, let's configure our local system.

Configuring local system to access resources

We first need to install the AWS CLI and Boto3 on our system:

pip install awscli boto3        

Next, we need to configure our credentials. For this, run the following command:

aws configure --profile default        

This will ask for your AWS Access Key ID, Secret Access Key, and default region. Enter the details from your AWS account's security credentials, and your system will be ready for building our RAG.
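A quick way to confirm the configuration worked is an STS identity check, which needs no extra permissions. A small sketch; the profile name default matches the command above:

```python
def account_from_arn(arn: str) -> str:
    """Extract the 12-digit account id from an ARN like arn:aws:iam::123456789012:user/me."""
    return arn.split(":")[4]


def whoami(profile: str = "default") -> str:
    """Return the account id the configured profile resolves to."""
    import boto3  # local import so account_from_arn works without boto3 installed

    session = boto3.Session(profile_name=profile)
    identity = session.client("sts").get_caller_identity()
    return account_from_arn(identity["Arn"])


# Example:
# print(whoami())  # your 12-digit AWS account id, if the credentials are valid
```

If this call raises an authentication error, fix the credentials before moving on; every Kendra and Bedrock call below depends on them.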

Building the RAG

Let's see step by step how to build our RAG in AWS.

  • Step 1: Similar to the FAISS and Azure AI Search retrievers we saw in previous articles, we need to create an Amazon Kendra retriever.

kendra_client = boto3.client("kendra", region_name=kendra_region)
retriever = AmazonKendraRetriever(index_id=kendra_index, top_k=3, client=kendra_client)         

  • Step 2: We need to create our Bedrock runtime client, passing it the model that we activated in Amazon Bedrock earlier.

bedrock_client = boto3.client("bedrock-runtime", region_name=bedrock_region)
llm = Bedrock(model_id="ai21.j2-mid-v1", client=bedrock_client)

  • Step 3: Now we will create a RetrievalQA chain, which takes the Kendra retriever and the Bedrock LLM as input.

qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)        

  • Step 4: Finally, we will ask our question to the QA chain and get our response.

response = qa(query)
print(response["source_documents"][0].page_content)        
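Note that the response dict also carries the generated answer itself under the result key, not just the source documents. A small helper to print both; the title metadata key is what the Kendra retriever usually sets, but treat that as an assumption:

```python
def format_answer(response: dict) -> str:
    """Combine the chain's generated answer with the titles of its source documents."""
    lines = [response["result"].strip()]
    for doc in response.get("source_documents", []):
        # Kendra-backed documents typically expose a 'title' in their metadata.
        lines.append("  source: " + doc.metadata.get("title", "unknown"))
    return "\n".join(lines)


# Example:
# print(format_answer(qa(query)))
```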

Once executed, the above code will give the following response:

Document Title: Stock-Investing-101-eBook.pdf
Document Excerpt:
The question of building wealth in your life will really boil down to two questions? 1) Are you able to save each year? 2) When you save, where do you put the money? Hypothetical Let’s assume you’re 20 years old and just took a job as a fireman, your childhood dream (what kid doesn’t want to be a fireman, right?). Your salary is meager, but you make the goal to save $1,000 dollars per year and put it in a retirement account. You work and save for the next 50 years until you retire. Does it really matter where I put that money, I mean it’s only a thousand bucks a year? Well you have a couple of options, let’s evaluate. 1). The savings Account (otherwise known as the “Under the Mattress” approach). The easiest and “safest” thing is you could just put the money in cash. Nice and safe! It will never go away and it won’t go up and down.        

The complete code for your reference is given below

from langchain.retrievers import AmazonKendraRetriever
from langchain.llms.bedrock import Bedrock
from langchain.chains import RetrievalQA
import boto3

kendra_index = 'fee2562f-xxxx-xxxx-xxxx-xxxxxxxxxxx'
bedrock_region = 'us-east-1'
kendra_region = 'us-east-1'

def get_kendra_doc_retriever():
    kendra_client = boto3.client("kendra", region_name=kendra_region)
    retriever = AmazonKendraRetriever(index_id=kendra_index, top_k=3, client=kendra_client)
    return retriever

query = "How to save money?Explain in detail please"

retriever = get_kendra_doc_retriever()

print(retriever.get_relevant_documents(query))           
            
bedrock_client = boto3.client("bedrock-runtime", region_name=bedrock_region)
llm = Bedrock(model_id="ai21.j2-mid-v1", client=bedrock_client)

qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)

print("----------------------------------------------------------------------------------------------------")
response = qa(query)
print(response["source_documents"][0].page_content)        

This finishes our discussion on building a RAG in AWS. In the next article, we will talk about how to evaluate the RAGs that we build. In case of any doubts about this article, please post your questions as comments. See ya!!
