Retrieval-Augmented Generation (RAG) with LangChain: Refining the Future of AI Conversations
Rany ElHousieny, PhD
Senior Software Engineering Manager (EX-Microsoft) | Generative AI Leader @ Clearwater Analytics | Generative AI, Conversational AI Solutions Architect
In the ever-evolving landscape of artificial intelligence, the quest for more intelligent and contextually aware conversational agents has led to the development of innovative approaches. Among these, Retrieval-Augmented Generation (RAG) stands out as a groundbreaking technique that significantly enhances the capabilities of AI models. By combining the strengths of retrieval-based methods and generative models, RAG offers a powerful solution for creating more informative, relevant, and coherent AI-driven interactions.
LangChain, a versatile and robust framework, is at the forefront of this transformation. It enables the seamless integration of retrieval and generation mechanisms, providing developers with the tools to build sophisticated AI applications. This article delves into the intricacies of RAG, explores how LangChain facilitates its implementation, and highlights the profound impact it has on the future of AI conversations. Whether you're an AI enthusiast, developer, or researcher, join us as we uncover how RAG with LangChain is refining the future of AI interactions, making them more dynamic, accurate, and user-centric.
Understanding RAG and Its Architecture
The RAG system addresses a key limitation of LLMs: their responses are only as current and accurate as the data they were trained on. RAG introduces an "open book" strategy, where LLMs can access and incorporate information beyond their training data to answer questions more accurately, much like consulting a reference book during an open-book exam.
RAG achieves this by implementing a two-step process (a conceptual sketch in code follows the list):
1. Retrieval Phase: When prompted with a question, RAG employs a retrieval mechanism to fetch relevant documents or data snippets from a vast external source, such as a knowledge base or the internet.
2. Generation Phase: The retrieved information is then presented to the LLM, augmenting its original prompt. Thus informed, the LLM synthesizes this additional context to generate a more precise and relevant response.
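Conceptually, the two phases can be sketched in a few lines of Python. The helper below is a hypothetical illustration (the names and interfaces are assumed); the concrete LangChain implementation follows later in this article:
# Conceptual sketch only: `retriever` and `llm` are assumed to expose
# LangChain-style interfaces like the ones built later in this article.
def answer_with_rag(question, retriever, llm):
    # Retrieval phase: fetch the most relevant snippets from the external source
    docs = retriever.get_relevant_documents(question)
    context = "\n\n".join(doc.page_content for doc in docs)
    # Generation phase: augment the prompt with the retrieved context
    prompt = (
        "Answer the question only from the following context:\n"
        f"{context}\n\nQuestion: {question}"
    )
    return llm.invoke(prompt)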
Vector Databases and Embeddings
At the heart of RAG lies the vector database, a specialized repository that stores text in the form of mathematical vectors. This is crucial for the retrieval phase, where RAG converts user queries into vectors and matches them against this database to find the most relevant documents or data points. Embeddings, which are representations of text as vectors, encapsulate the semantic meaning and context, thus enabling RAG to discern relevance with a high degree of accuracy.
The Art of Similarity Matching
Similarity in RAG is computed using vector space models. When a query is converted into a vector, it's compared against the database of embeddings using similarity metrics like cosine similarity. The most similar vectors are retrieved as they represent the documents most relevant to the query.
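As a small, self-contained illustration of this matching step, the snippet below embeds a query and a few made-up candidate sentences with the same sentence-transformers model used later in this article and ranks them by cosine similarity (it assumes the sentence-transformers and numpy packages are installed):
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "How do I build GPU-accelerated applications?"
documents = [
    "The NVIDIA CUDA Toolkit provides a development environment for GPU-accelerated applications.",
    "TensorRT is an SDK for high-performance deep learning inference.",
    "GameWorks offers tools and samples for real-time graphics development.",
]

# Embed the query and the candidate documents
query_vec = model.encode(query)
doc_vecs = model.encode(documents)

# Cosine similarity = dot product of the L2-normalized vectors
def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
best = int(np.argmax(scores))
print(f"Most relevant: {documents[best]} (score={scores[best]:.3f})")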
RAG Python Implementation with ChromaDB, Ollama, Llama3, and LangChain
To practically implement RAG using Python, ChromaDB can serve as the vector database where embeddings are stored and retrieved. Llama3, a powerful LLM available on Ollama, can be the model of choice for response generation. Finally, LangChain, a library that aids in constructing RAG systems, can be used to manage the interaction between the retrieval and generation phases, providing a full-fledged RAG implementation.
Step 1: Load Documents
I will retrieve a few pages from the NVIDIA documentation site.
from pprint import pprint
from langchain_community.document_loaders import WebBaseLoader

# List of URLs you want to load (We will crawl the entire site later)
urls = [
    "https://docs.nvidia.com",
    "https://docs.nvidia.com/cuda",
    "https://docs.nvidia.com/deeplearning",
    "https://docs.nvidia.com/gameworks",
    "https://docs.nvidia.com/cudnn",
    "https://docs.nvidia.com/tensorrt",
]

data = []

# Loop through each URL and load the page content
for url in urls:
    loader = WebBaseLoader(url)
    page = loader.load()
    # Remove newlines so each page's content is easier to inspect
    page[0].page_content = page[0].page_content.replace('\n', '')
    data.extend(page)

pprint(data)
Step 2: Split Documents into Chunks
from langchain_text_splitters import RecursiveCharacterTextSplitter

def split_docs(data):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    return text_splitter.split_documents(data)
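The helper is then applied to the loaded pages to produce the all_splits list that the vector store consumes in Step 4:
# Split the crawled pages into 1000-character chunks for embedding
all_splits = split_docs(data)
print(f"Created {len(all_splits)} chunks")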
Step 3: Embedding
from langchain.embeddings import SentenceTransformerEmbeddings
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
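As a quick sanity check, you can embed a sample string; all-MiniLM-L6-v2 should produce a 384-dimensional vector:
# Embed a sample query and inspect the vector size (expected: 384 dimensions)
sample_vector = embeddings.embed_query("What is the NVIDIA CUDA Toolkit?")
print(len(sample_vector))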
Step 4: Setting up ChromaDB as the VectorDB
# Add to ChromaDB vector store
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(
    documents=all_splits,
    collection_name="rag-chroma",
    embedding=embeddings,
)
retriever = vectorstore.as_retriever()
Alternatively, Weaviate can serve as the vector database. The snippet below uses an embedded Weaviate instance with the same chunks and embeddings prepared above:
from langchain.vectorstores import Weaviate
import weaviate
from weaviate.embedded import EmbeddedOptions

# Setup an embedded (local) Weaviate vector database
client = weaviate.Client(
    embedded_options=EmbeddedOptions()
)

# Populate the vector database with the document chunks
vectorstore = Weaviate.from_documents(
    client=client,
    documents=all_splits,
    embedding=embeddings,
    by_text=False,
)

# Define vectorstore as retriever to enable semantic search
retriever = vectorstore.as_retriever()
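Before wiring the retriever into a chain, it is worth a quick check that semantic search returns sensible chunks. A minimal sketch against the retriever created above:
# Retrieve the chunks most similar to a sample question
docs = retriever.get_relevant_documents("What is the NVIDIA CUDA Toolkit?")
for doc in docs:
    print(doc.metadata.get("source"), "->", doc.page_content[:120])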
Step 5: Initializing the LLM Model
We will be using a local Ollama instance to run Llama3, as explained in the following article:
from langchain.llms import Ollama

llm = Ollama(
    model="llama3:latest",
    verbose=True,
    temperature=0,
)
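A quick way to confirm that the Ollama server is running and the model is available is to send a simple prompt before building the chain:
# Smoke test of the local Llama3 model (assumes the Ollama server is running)
print(llm.invoke("In one sentence, what is GPU acceleration?"))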
Step 6: RAG Prompt Template
To understand Prompt Templates and In-Context Learning, please review the following two articles:
from langchain.prompts import ChatPromptTemplate
# Prompt
template = """Answer the question only from the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
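To see what the model will actually receive, you can format the template with sample values (the strings below are purely illustrative):
# Preview the prompt produced for a sample context and question
print(prompt.format(
    context="CUDA is NVIDIA's parallel computing platform and programming model.",
    question="What is CUDA?",
))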
Step 7: RAG Pipeline
The following chain represents the RAG pipeline, where data is passed from one component to the next and transformed step by step: the retriever extracts the relevant context while the original question is passed through unchanged; the prompt template formats both into a well-structured prompt for the model; the language model generates a response; and the output parser turns that response into a plain string. Each component is linked with the pipe (|) operator, which here denotes the flow of data through successive transformations, keeping the process streamlined and maintainable.
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser

# Chain
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
The code above defines the RAG pipeline, where data is passed through a sequence of processing steps. Each component in the pipeline transforms the data in some way before passing it on to the next component. Let's break down each part; a short sketch for inspecting the intermediate stages follows the breakdown:
1. Chain Definition:
- `rag_chain = (...)`: This sets up a variable named `rag_chain` which is assigned the result of a series of operations connected by the pipe (`|`) operator. The pipe here serves as a way to pass the output of one component as the input to the next.
2. Components:
- `{"context": retriever, "question": RunnablePassthrough()}`: Here, I am creating a dictionary object with two keys. `context` this is the context we get from the retriever object that is responsible for retrieving the context from the Vector DB accoring to the similarity with the question asked, and `question` is set to an instance of `RunnablePassthrough()`. This `RunnablePassthrough` is a class from LangChain_Core designed to pass through data without changing it,serving as a placeholder for the actual processing of the question. It is used here to pass the question through the pipeline without modifying it. This is useful when the question does not need processing or transformation before being used.
- `prompt`: This refers to an instance of ChatPromptTemplate initialized with a predefined template. The purpose of this component in the chain is to take the dictionary (with context and question) and format it according to the template specified earlier. The formatted string will typically combine the retrieved context and the unmodified question into a structured prompt ready for the model.
- `llm`: This component is the Ollama instance configured earlier with the llama3 model. This step feeds the formatted prompt from the previous step to the local language model, which generates a response based on the input prompt, incorporating both the context and the question.
- `StrOutputParser()`: This final step in the chain uses the StrOutputParser class to parse the raw string output from the local language model. The parser is responsible for cleaning up the model's response, extracting specific parts of the output, or converting it into a more usable format.
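If you want to inspect the intermediate prompt rather than only the final answer, you can run just the first two stages on their own. This is a small debugging sketch, not part of the pipeline itself:
# Debugging sketch: run only the retrieval + prompt-formatting stages
partial_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt
print(partial_chain.invoke("What is the NVIDIA CUDA Toolkit?"))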
Step 8: Test
# Question
rag_chain.invoke("What is the NVIDIA CUDA Toolkit?")
'According to the context, the NVIDIA CUDA Toolkit is a comprehensive development environment for C and C++ developers building GPU-accelerated applications. It provides tools for developing, optimizing, and deploying applications on GPU-accelerated embedded systems, desktop workstations, enterprise data centers, cloud-based platforms and HPC supercomputers.'
Step 9: Creating a UI ChatPOD
from langchain_community.llms.ollama import Ollama
import ipywidgets as widgets
from IPython.display import display, clear_output, HTML

# Initialize the Ollama model
llm = Ollama(
    model="llama3:latest",
)

# Function to handle the input and display the response
def handle_query(sender):
    with output:
        clear_output(wait=True)  # Ensure the output is cleared only once ready to display new output
        print("Processing...")
        try:
            response = rag_chain.invoke(input_box.value)
            display(HTML(f"<div style='word-wrap: break-word; white-space: pre-wrap;'>Response: {response}</div>"))
        except Exception as e:
            print("An error occurred:", str(e))

# Create widgets for input and output
input_box = widgets.Text(description="Enter a query:")
button = widgets.Button(description="Submit Query")
output = widgets.Output()

# Set up the button's event to handle the query
button.on_click(handle_query)

# Display the widgets
display(input_box, button, output)
Conclusion
RAG stands at the frontier of AI’s conversational prowess, enabling systems to respond with accuracy and currency that was previously unattainable. By leveraging the latest information through external knowledge bases, RAG systems promise a future where chatbots and virtual assistants are not just helpers but knowledgeable consultants capable of providing verified and precise information. As the technology continues to mature, we can expect even more innovative applications across different domains, transforming how we interact with AI.