Building an End-to-End RAG (Retrieval Augmented Generation) Application Using AWS Bedrock

As generative AI gained immense popularity following OpenAI's release of ChatGPT in November 2022, the exploration of productivity improvements and business AI solutions has surged. The field has progressed from exploring LLMs with prompts and proofs of concept to other approaches like RAG and fine-tuning. In 2024, the focus for organizations has shifted to productizing generative AI applications. Below, we will explore the steps to consider to make a production-ready generative AI application.

Below are the current popular methods for leveraging generative AI across different use cases. Things are changing rapidly, and new methods are evolving every day.

1. Inferencing LLMs + prompt engineering

2. RAG

3. Fine-tuning

4. Agent-based systems

It all started with inferencing LLMs using sophisticated prompts to get the best out of the information the LLMs were trained on. This is successful to a certain extent, but it has issues such as output accuracy and hallucination. To overcome the hallucination problem, RAG has become a popular method of giving the model context from supplied documents; this is a kind of in-context learning that works alongside the LLM's large trained knowledge base.

In this article we will explore how to build an end-to-end RAG application using AWS Bedrock, and keep options 3 and 4 (fine-tuning and agent-based systems) for future blogs.

Architecture Diagram:

Let me explain the architecture diagram below and the services used for this implementation.

RAG Architecture in AWS

Data layer:

The data layer acts as the landing or storage layer for the ingested unstructured data. Here we use Amazon S3 to store the ingested files. The other component of the data layer is the vector DB, which stores the embeddings created from the source data. Here OpenSearch is used as the vector DB, but there are other options in AWS, such as DocumentDB (with vector search) and Amazon Kendra. Other vector DBs such as Pinecone or the open-source Milvus can also be considered.
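
A minimal sketch of the landing step, assuming the boto3 SDK; the bucket name and file are placeholders for illustration:

import boto3

# Land the raw, unstructured source file in S3 (the data layer's storage).
s3 = boto3.client("s3")
s3.upload_file(
    Filename="annual_report.pdf",   # hypothetical local file
    Bucket="rag-demo-bucket",       # hypothetical bucket name
    Key="raw/annual_report.pdf",
)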

AI Layer:

The AI layer is the center of all the action, where we access the large language models (LLMs). The AWS Bedrock service gives the option to access different enterprise models. Currently, foundation models from providers such as AI21 Labs, Anthropic (Claude), Cohere, Meta (Llama 3), and Mistral, along with Amazon's own Titan models, are available for usage. All these models are deployed in AWS data centers and accessed via API.

For RAG we need two types of models (a minimal invocation sketch follows the list):

1. Embeddings model: creates the embeddings from the input text.

2. Text model: generates the final response.
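
Below is a minimal sketch (not the article's exact code) of calling both model types directly through the Bedrock runtime API with boto3; the model IDs are current examples and may differ by region or account:

import json
import boto3

runtime = boto3.client("bedrock-runtime")

# 1. Embeddings model: Titan turns input text into a vector.
emb_resp = runtime.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "What is RAG?"}),
)
embedding = json.loads(emb_resp["body"].read())["embedding"]

# 2. Text model: Claude 3 Sonnet generates the final response.
txt_resp = runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": "Explain RAG in one sentence."}],
    }),
)
answer = json.loads(txt_resp["body"].read())["content"][0]["text"]
print(len(embedding), answer)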

Consumption Layer:

The consumption layer contains the front end of the application and the deployed solution endpoint. If you are building a prototype, you may use Streamlit or Gradio as the front end for quick development. There are other AWS services for building chatbots, such as Amazon Lex, which provide both text and voice interaction.
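
For a prototype front end, a minimal Streamlit sketch could look like the following; ask_rag is a hypothetical helper standing in for whatever chain or endpoint answers the question:

import streamlit as st

def ask_rag(question: str) -> str:
    # Placeholder: call your deployed RAG chain / endpoint here.
    return f"(answer for: {question})"

st.title("RAG Chat on AWS Bedrock")
question = st.text_input("Ask a question about your documents")
if question:
    st.write(ask_rag(question))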

If you want to build a custom chat application, you may choose a JavaScript-based framework like React or Angular.

There are different options to deploy this application; for example, you may use Elastic Beanstalk to deploy it on EC2. LangChain is used as the framework to interact with the LLMs and to build the chain that orchestrates the application step by step.

Steps for Implementation:

Below are the steps for implementation:

1. Model evaluation

2. Code build

3. Testing

4. Guardrails

5. Deployment

6. LLMOps

Model Evaluation:

The first step is to evaluate which LLM is best for your use case. You can experiment in the Bedrock playground with some prompts to check the model output. Bedrock also comes with a standard model-evaluation feature, where you can evaluate a model with relevant data.

It provides results based on the parameters you specify when you create the evaluation, such as accuracy, toxicity, and robustness. Choose from built-in task types (text summarization, question and answer, text classification, and open-ended text generation) and scores will be calculated automatically. Model scores are calculated using various statistical methods such as BERTScore, F1, and more.

Currently, GPT models are not available in AWS Bedrock. In our experience, Anthropic's Claude 3 or 3.5 Sonnet works best. You may also use smaller models such as Mistral or Llama 7B, depending on your requirements.
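
For a quick playground-style comparison, a hedged sketch using Bedrock's newer Converse API (which normalizes request formats across model families) might look like this; the model IDs are illustrative and region-dependent:

import boto3

runtime = boto3.client("bedrock-runtime")
prompt = "Summarize the benefits of RAG in two sentences."

for model_id in [
    "anthropic.claude-3-sonnet-20240229-v1:0",
    "mistral.mistral-7b-instruct-v0:2",
]:
    # Send the same prompt to each candidate model and compare the outputs.
    resp = runtime.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    print(model_id, "->", resp["output"]["message"]["content"][0]["text"])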

For embeddings we have used Amazon Titan Embeddings G1 - Text.

Code Build:

Once you have selected the LLMs, the next step is to build the code with LangChain, which creates the chain and acts as the orchestrator for accessing the LLMs and performing the different steps sequentially.

The models are accessed through API calls to Bedrock, and the context and question are passed along with the required prompt engineering, as in the sketch below.
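
A minimal sketch of that orchestration, assuming the langchain-aws and langchain-community packages; the OpenSearch endpoint and index name are placeholders:

from langchain_aws import BedrockEmbeddings, ChatBedrock
from langchain_community.vectorstores import OpenSearchVectorSearch
from langchain.chains import RetrievalQA

# Embeddings model for semantic search over the vector DB.
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")
vectorstore = OpenSearchVectorSearch(
    opensearch_url="https://my-domain.es.amazonaws.com",  # placeholder endpoint
    index_name="rag-index",                               # placeholder index
    embedding_function=embeddings,
)

# Text model that generates the final answer from the retrieved context.
llm = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0")

# The chain retrieves context from the vector DB and passes it, with the
# question, to the text model.
qa = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())
print(qa.invoke({"query": "What does the report say about revenue?"}))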

You may use SageMaker Studio as the IDE for the code build, or it can also be done from VS Code installed on your local system.

Testing:

You have to check the overall accuracy of the responses received from the LLMs, which is critical for the success of the application. There are two things we take into account while testing RAG-based applications:

1. Context relevance

2. Answer relevance

Context relevance concerns the context picked from the vector DB after semantic search, which is supplied to the model for the final output. We have to test whether the retrieval step is picking the right context. Sometimes this requires tuning search hyperparameters, such as the k value, to get the required context.
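
A small sketch of that tuning loop, reusing the vectorstore object from the code-build sketch above (so this is illustrative, not standalone): vary k and inspect whether the right chunks come back.

for k in (2, 4, 8):
    retriever = vectorstore.as_retriever(search_kwargs={"k": k})
    docs = retriever.invoke("What does the report say about revenue?")
    print(f"k={k}: retrieved {len(docs)} chunks")
    for d in docs:
        print("  -", d.page_content[:80])  # preview each retrieved chunk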

Answer relevance checks the model's response against the actual expected answer.
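
One crude, hedged way to automate this check: embed the expected and actual answers with Titan and compare cosine similarity (the sample texts and any threshold are purely illustrative):

import json
import boto3
import numpy as np

runtime = boto3.client("bedrock-runtime")

def embed(text: str) -> np.ndarray:
    resp = runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(resp["body"].read())["embedding"])

expected = embed("Revenue grew 12% year over year.")
actual = embed("The report notes a 12% YoY increase in revenue.")
cosine = float(expected @ actual / (np.linalg.norm(expected) * np.linalg.norm(actual)))
print("answer relevance (cosine):", cosine)  # e.g. flag answers below ~0.8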

Guardrails:

Guardrails are a feature of Bedrock used to implement application-specific safeguards based on your use cases and responsible AI policies.

Create a guardrail by configuring as many filters as you need, test it with different inputs to assess its performance, and refine it until it matches your responsible AI policy.
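
A sketch of applying a pre-created guardrail at inference time; the guardrail ID and version are placeholders you would get after creating the guardrail in the Bedrock console or API:

import json
import boto3

runtime = boto3.client("bedrock-runtime")
resp = runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    guardrailIdentifier="gr-xxxxxxxx",  # placeholder guardrail ID
    guardrailVersion="1",               # placeholder version
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Tell me about our refund policy."}],
    }),
)
print(json.loads(resp["body"].read())["content"][0]["text"])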

Deployment:

Once application testing is completed and the results meet expectations, the next step is to deploy the whole application.

There are different ways to deploy the application using different AWS services. It can be deployed as a Docker image on EC2 directly or via Elastic Beanstalk. Like any other application, it can also be deployed on Elastic Container Service (ECS) or Elastic Kubernetes Service (EKS).

We have used Elastic Beanstalk to deploy the application to an EC2 instance. Make sure you select the right EC2 instance type for the number of users and the load on the application.

Once deployed, you can set up a CI/CD pipeline to automate code deployment using AWS CodePipeline from a code repository such as GitHub or GitLab.

LLMOps:

Large language model operations (LLMOps) encompasses the practices, techniques, and tools used for the operational management of large language models in production environments.

There are different tools and libraries available for LLMOps, such as LangSmith and the open-source MLflow, for the operational management of LLMs.
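
A minimal sketch of basic LLMOps-style logging with MLflow: record the model ID, retrieval settings, and a quality metric for each evaluated query (parameter names and values are illustrative):

import mlflow

with mlflow.start_run(run_name="rag-query"):
    mlflow.log_param("model_id", "anthropic.claude-3-sonnet-20240229-v1:0")
    mlflow.log_param("retrieval_k", 4)
    mlflow.log_metric("answer_relevance", 0.87)  # e.g. from the cosine check above
    mlflow.log_text("What does the report say about revenue?", "question.txt")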

It is an evolving area as more and more use cases get productionized.

Overall, AWS Bedrock, along with other AWS services, provides a quick and convenient way to implement and productionize generative AI RAG use cases without worrying about the GPU, deployment, and data-security concerns often raised with the usage of LLMs. RAG has evolved as one of the most convenient and popular methods to leverage LLMs along with user data to get the required results.

Let me know your feedback on the article. I will cover some of these topics in detail in future articles.
