Day 6: Building a complete RAG pipeline in Azure
This is part of the series: 10 Days of Retrieval Augmented Generation
Before we start our sixth day, let's take a look at what we have discussed so far and what lies ahead in this 10-day series:
In the previous articles we saw how to use OpenAI with the LangChain framework, along with a vector database like FAISS. In this article, we will see how we can use Azure services to build a similar RAG solution. The services we will use are:
Let's start this article by understanding how to index our documents using Azure AI Search. We will use the same finance documents that we used in the previous articles.
Indexing using Azure AI Search
The first step is to create an account in Azure and then open the Azure AI Search service.
Click on the create button to create a new service.
Fill in the fields, create the service and then note down the service name. Next, let's create a vector store index.
from langchain.embeddings.openai import OpenAIEmbeddings
from openai import AzureOpenAI

# Configuration: replace the placeholders with your own values
index_name = "langchain-vector-demo"
azure_search_endpoint = "your-ai-search-endpoint"
azure_search_key = "your-ai-search-key"
api_key = "your-open-ai-key"
azure_oai_api_key = "your-azure-open-ai-key"
azure_oai_endpoint = "your-azure-open-ai-endpoint"  # no trailing comma here: it would turn the string into a tuple
azure_oai_deployment_name = "your-azure-open-ai-deployment-name"

# Embedding model used for indexing and querying
embeddings = OpenAIEmbeddings(api_key=api_key)

# Azure OpenAI chat client, used for the generation step
client = AzureOpenAI(
    api_key=azure_oai_api_key,
    api_version="2023-05-15",
    azure_endpoint=azure_oai_endpoint,
    azure_deployment=azure_oai_deployment_name,
)
The above code defines all the configuration values and then instantiates the Azure OpenAI client. For Azure AI Search, the endpoint can be retrieved from the resource we just created. To get the keys, go to the Keys section of the same resource.
For Azure OpenAI, we need to create a service there as well. By default it is not enabled, and you can request access through an application form. If you don't have Azure OpenAI access, you can proceed with your regular OpenAI keys instead.
Next, let's create the vector store.
from langchain.vectorstores.azuresearch import AzureSearch

vector_store = AzureSearch(
    azure_search_endpoint=azure_search_endpoint,
    azure_search_key=azure_search_key,
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)
The above code creates the vector store (or index) inside AI Search. The first two parameters, azure_search_endpoint and azure_search_key, are used to authenticate against the Azure AI Search service. Next, we pass index_name, which can be any name we want to give our vector store; in our use case we kept the name langchain-vector-demo. Through embedding_function we pass the OpenAI embedding function, which creates the embeddings of the text that we will store.
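To make the embedding_function contract concrete: the vector store simply calls it with a string and expects a fixed-length list of floats back. Here is a toy, hash-based stand-in with the same shape. It is purely illustrative and captures no meaning, unlike the OpenAI embeddings used above (which return, e.g., 1536 dimensions):

```python
import hashlib

def toy_embed_query(text: str, dims: int = 8) -> list:
    # Same interface as embeddings.embed_query: text in, list of floats out.
    # Hash the text and map the first `dims` bytes into floats in [0, 1).
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 256.0 for b in digest[:dims]]

vec = toy_embed_query("How to save money?")
print(len(vec), all(0.0 <= x < 1.0 for x in vec))  # → 8 True
```

Note that the function is deterministic: the same text always maps to the same vector, which is also true of real embedding models and is what makes indexed vectors comparable to query vectors later.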
Now that we have the vector store, let's add the documents to it. We will use the same function for splitting the PDFs that we used in the previous articles.
import os
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_and_process_pdfs(pdf_folder_path):
    # Load every PDF in the folder and collect its pages
    documents = []
    for file in os.listdir(pdf_folder_path):
        if file.endswith('.pdf'):
            pdf_path = os.path.join(pdf_folder_path, file)
            loader = PyPDFLoader(pdf_path)
            documents.extend(loader.load())
    # Split the pages into overlapping chunks for indexing
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(documents)
    return splits
pdf_folder_path = "./fin_ed_docs"
splits = load_and_process_pdfs(pdf_folder_path)
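The chunk_size=1000 / chunk_overlap=200 settings mean consecutive chunks share roughly 200 characters, so a sentence cut at a boundary still appears intact in at least one chunk. A simplified character-level sketch of the idea (the real RecursiveCharacterTextSplitter additionally prefers splitting at paragraph and sentence boundaries rather than mid-word):

```python
def sliding_chunks(text: str, size: int = 1000, overlap: int = 200) -> list:
    # Fixed-size windows that advance by (size - overlap) characters,
    # so each chunk repeats the tail of the previous one.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]

chunks = sliding_chunks("x" * 2500, size=1000, overlap=200)
print([len(c) for c in chunks])  # → [1000, 1000, 900, 100]
```

Larger overlap costs more storage and embedding calls but reduces the chance that an answer-bearing passage is split across two chunks and retrieved incompletely.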
Now we will add these splits to the vector store:
vector_store.add_documents(documents=splits)
It will take some time, and then your vector store is ready for queries. Let's test it.
docs = vector_store.similarity_search(
    query="How to save money?",
    k=3,
    search_type="hybrid",
)
print(docs[0].page_content)
In the above code we used the hybrid search approach, which combines keyword search with vector (cosine) similarity to produce better results. The response we get is:
SECTION 1
Who are Investors?
The truth is that WE ARE ALL investors. When we hear the word investors,
we may think of a high-flying Wall-Street banker in a blue-pin striped
suit. That is certainly one type of investor, but so is the business owner, the
family trying to save for their kids' college, and the college student trying to
scrape up enough quarters to eat dinner. We all need to manage the money we
make, and we all hope to end up with as much money as possible.
The question of building wealth in your life will really boil down to two
questions:
1) Are you able to save each year?
2) When you save, where do you put the money?
Hypothetical
Let's assume you're 20 years old and just took a job as a fireman,
your childhood dream (what kid doesn't want to be a fireman,
right?). Your salary is meager, but you make the goal to save $1,000
dollars per year and put it in a retirement account. You work and
save for the next 50 years until you retire.
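Under the hood, Azure AI Search merges the keyword ranking and the vector ranking into a single result list using Reciprocal Rank Fusion (RRF), per its documentation. A minimal sketch of RRF with hypothetical document IDs (Azure's implementation uses a constant of around k=60):

```python
def rrf_fuse(rankings, k=60):
    # rankings: a list of ranked ID lists, e.g. one from keyword search
    # and one from vector search. Each document contributes 1 / (k + rank)
    # for every list it appears in; per-document scores are summed.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_ranking = ["doc_b", "doc_a", "doc_c"]  # hypothetical BM25 order
vector_ranking = ["doc_a", "doc_c", "doc_b"]   # hypothetical cosine order
print(rrf_fuse([keyword_ranking, vector_ranking]))  # → ['doc_a', 'doc_b', 'doc_c']
```

A document that ranks reasonably well in both lists (doc_a here) beats one that tops only a single list, which is exactly why hybrid search tends to be more robust than either method alone.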
That's it. Our RAG pipeline is complete. This is the difference between using cloud services and building from scratch: because the Azure services are managed, all you need to do is call them, configure them, and get results with a few lines of code.
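To close the retrieval-to-generation loop, the retrieved chunks can be passed to the chat client we instantiated at the start. Here is a sketch built around a small prompt-assembly helper; the actual chat call is shown commented out, since it needs live credentials and a chat deployment, and the stand-in context string is hypothetical:

```python
def build_rag_prompt(question: str, contexts: list) -> str:
    # Stuff the retrieved chunks into the prompt so the model answers
    # from our documents rather than from its own memory.
    context_block = "\n\n".join(contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )

question = "How to save money?"
# In the real pipeline the contexts come from the vector store:
# contexts = [d.page_content for d in vector_store.similarity_search(
#     query=question, k=3, search_type="hybrid")]
contexts = ["We all need to manage the money we make."]  # stand-in chunk

prompt = build_rag_prompt(question, contexts)

# response = client.chat.completions.create(
#     model=azure_oai_deployment_name,
#     messages=[{"role": "user", "content": prompt}],
# )
# print(response.choices[0].message.content)
```

The "only the context below" instruction is the grounding step: it nudges the model to decline rather than hallucinate when the retrieved chunks don't contain the answer.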
Of course there is more: much more advanced approaches (like reranking, or semantic search with reranking) that Azure provides quite efficiently, but that's for another article.
In the next article, we will look at how to build a RAG pipeline in AWS, another managed cloud platform. We will use Bedrock, Kendra, and other tools for our RAG.