Advanced Retrieval Augmented Generation (RAG) with Reranking
Full tutorial - https://www.youtube.com/watch?v=9f5rLt3Kc4I&t=8s
Welcome to a tutorial on building a question-and-answer application using Advanced RAG! In this application, users can ask questions on a wide range of topics, and the system provides detailed, human-like responses by leveraging the Cohere model.
Application Overview
The core components of this application are:
- Indexing: splitting a source document into chunks and embedding them into a simple vector store.
- Retrieval: embedding the user query and fetching the most similar chunks by cosine similarity.
- Reranking: reordering the retrieved chunks with Cohere Rerank so the most relevant ones come first.
- Generation: passing the top-ranked chunks to Cohere's Command R model to produce the final answer.
The Importance of Reranking
Reranking plays a crucial role in the Retrieval Augmented Generation (RAG) process. In a naive RAG approach, a large number of contexts may be retrieved, but not all of them are necessarily relevant to the question. Reranking allows for the reordering and filtering of documents, placing the relevant ones at the forefront, thereby enhancing the effectiveness of RAG.
As shown in Figure 1, the task of reranking is like an intelligent filter. When the retriever retrieves multiple contexts from the indexed collection, these contexts may have different relevance to the user's query. Some contexts may be very relevant (highlighted in red boxes in Figure 1), while others may only be slightly related or even unrelated (highlighted in green and blue boxes in Figure 1).
The task of reranking is to evaluate the relevance of these contexts and prioritize the ones that are most likely to provide accurate and relevant answers. This allows the LLM to prioritize these top-ranked contexts when generating answers, thereby improving the accuracy and quality of the response.
In simpler terms, reranking is like helping you choose the most relevant references from a pile of study materials during an open-book exam, so that you can answer the questions more efficiently and accurately.
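To make this concrete, here is a minimal, self-contained sketch of what a reranker conceptually does: score every retrieved context against the query and keep only the highest-scoring ones. The contexts and the simple word-overlap scoring below are made up for illustration; the actual pipeline later in this tutorial uses Cohere's trained reranking model instead.

def toy_rerank(query, contexts, top_n=3):
    # Hypothetical relevance score: count how many query words appear in each context.
    # A real reranker (such as Cohere Rerank, used below) relies on a trained model instead.
    query_words = set(query.lower().split())
    scored = [(sum(word in context.lower() for word in query_words), context) for context in contexts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [context for _, context in scored[:top_n]]

contexts = [
    "Decision trees are a supervised machine learning method.",
    "The weather in Paris is mild in spring.",
    "Neural networks are machine learning models inspired by the brain.",
]
print(toy_rerank("machine learning models", contexts, top_n=2))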
import cohere, os
import wikipedia
from langchain_text_splitters import RecursiveCharacterTextSplitter
co = cohere.Client(os.environ['COHERE_API_KEY'])
# let's get the wikipedia article about Machine learning
article = wikipedia.page('Machine learning')
text = article.content
print(f"The text has roughly {len(text.split())} words.")
# 1. Index the document and, given a user query, retrieve the relevant chunks from the index
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=50,
    length_function=len,
    is_separator_regex=False,
)
chunks_ = text_splitter.create_documents([text])
chunks = [c.page_content for c in chunks_]
print(f"The text has been broken down in {len(chunks)} chunks.")
# Because the texts being embedded are the chunks we are searching over, we set the input type as search_doc
model="embed-english-v3.0"
response = co.embed(
texts= chunks,
model=model,
input_type="search_document",
embedding_types=['float']
)
embeddings = response.embeddings.float
print(f"We just computed {len(embeddings)} embeddings.")
# Store the embeddings in a vector database
# We use the simplest vector database ever: a python dictionary using np.array()
import numpy as np
vector_database = {i: np.array(embedding) for i, embedding in enumerate(embeddings)}
query = "Give me a list of machine learning models mentioned in the page. Also give a brief description of each model"
# Embed the user question
# Because the text being embedded is the search query, we set the input type to search_query
response = co.embed(
    texts=[query],
    model=model,
    input_type="search_query",
    embedding_types=['float']
)
query_embedding = response.embeddings.float[0]
print("query_embedding: ", query_embedding)
# Retrieve the most relevant chunks from the vector database (cosine similarity)
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Calculate similarity between the user question & each chunk
similarities = [cosine_similarity(query_embedding, emb) for emb in embeddings]
print("similarity scores: ", similarities)
# Get indices of the top 10 most similar chunks
sorted_indices = np.argsort(similarities)[::-1]
# Keep only the top 10 indices
top_indices = sorted_indices[:10]
print("Here are the indices of the top 10 chunks after retrieval: ", top_indices)
# Retrieve the top 10 most similar chunks
top_chunks_after_retrieval = [chunks[i] for i in top_indices]
print("Here are the top 10 chunks after retrieval: ")
for t in top_chunks_after_retrieval:
    print("== " + t)
# 2. Rerank the chunks retrieved from the vector database
# We rerank the 10 chunks retrieved from the vector database.
# Reranking boosts retrieval accuracy.
# Reranking lets us go from 10 chunks retrieved from the vector database, to the 3 most relevant chunks.
response = co.rerank(
    query=query,
    documents=top_chunks_after_retrieval,
    top_n=3,
    model="rerank-english-v2.0",
)
top_chunks_after_rerank = [result.document['text'] for result in response]
print("Here are the top 3 chunks after rerank: ")
for t in top_chunks_after_rerank:
    print("== " + t)
# 3. Generate the model's final answer, given the retrieved and reranked chunks
# preamble containing instructions about the task and the desired style for the output.
preamble = """
## Task & Context
You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.
## Style Guide
Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.
"""
# retrieved documents
documents = [
    {"title": "chunk 0", "snippet": top_chunks_after_rerank[0]},
    {"title": "chunk 1", "snippet": top_chunks_after_rerank[1]},
    {"title": "chunk 2", "snippet": top_chunks_after_rerank[2]},
]
# get model response
response = co.chat(
    message=query,
    documents=documents,
    preamble=preamble,
    model="command-r",
    temperature=0.3
)
print("Final answer:")
print(response.text)
Code Walkthrough
1. Import Required Libraries:
- import cohere, os: Import the Cohere API client and the OS module to access environment variables.
- import wikipedia: Import the Wikipedia library to retrieve the content of the "Machine Learning" Wikipedia article.
- from langchain_text_splitters import RecursiveCharacterTextSplitter: Import the RecursiveCharacterTextSplitter from the langchain_text_splitters library to split the text into smaller chunks.
2. Retrieve the Wikipedia Article:
- Create a Cohere API client using the cohere.Client() function and the COHERE_API_KEY environment variable.
- Retrieve the Wikipedia article on "Machine Learning" using the wikipedia.page() function and store the article content in the text variable.
- Print the approximate number of words in the article.
3. Split the Text into Chunks:
- Create a RecursiveCharacterTextSplitter instance with the following parameters:
- chunk_size=512: Set the maximum chunk size to 512 characters.
- chunk_overlap=50: Set the overlap between chunks to 50 characters.
- length_function=len: Use the len() function to determine the length of the text.
- is_separator_regex=False: Disable the use of a regular expression for splitting the text.
- Split the article text into chunks using the create_documents() method and store the chunk contents in the chunks list.
- Print the number of chunks created.
4. Compute Embeddings for the Chunks:
- Set the Cohere model to "embed-english-v3.0" and the input type to "search_document".
- Use the co.embed() function to compute the embeddings for the chunks and store them in the embeddings variable.
- Print the number of computed embeddings.
5. Store the Embeddings in a Vector Database:
- Create a Python dictionary vector_database to store the embeddings, where the keys are the indices of the chunks, and the values are the corresponding embeddings.
6. Embed the User Query:
- Set the user's query to "Give me a list of machine learning models mentioned in the page. Also give a brief description of each model".
- Use the co.embed() function to compute the embedding for the user's query, setting the input type to "search_query".
- Print the computed query embedding.
7. Retrieve the Most Relevant Chunks:
- Implement a cosine_similarity() function to calculate the cosine similarity between the query embedding and each chunk embedding.
- Calculate the similarity scores between the query and each chunk, and store the scores in the similarities list.
- Sort the similarities list in descending order and get the indices of the top 10 most similar chunks.
- Retrieve the top 10 chunks based on the sorted indices and store them in the top_chunks_after_retrieval list.
- Print the indices and the contents of the top 10 chunks.
8. Rerank the Retrieved Chunks:
- Use the Cohere co.rerank() function to rerank the top 10 chunks, setting the top_n parameter to 3.
- Store the top 3 reranked chunks in the top_chunks_after_rerank list.
- Print the contents of the top 3 reranked chunks.
9. Generate the Final Answer:
- Prepare a preamble containing instructions about the task and the desired style for the output.
- Create a list documents containing the titles and snippets of the top 3 reranked chunks.
- Use the Cohere co.chat() function to generate the final answer, passing the user's query, the documents list, and the preamble as input.
- Print the generated final answer.
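As an optional extra, you can usually inspect which retrieved chunks ground each part of the generated answer. The snippet below is a minimal sketch, assuming the chat response object exposes a citations attribute (as recent versions of the Cohere Python client do); the exact field names may differ between SDK versions.

# Inspect which documents support each span of the final answer.
# Assumes `response` is the co.chat() result from above and exposes a `citations` attribute (an assumption about the SDK version).
if getattr(response, "citations", None):
    print("Citations that support the final answer:")
    for citation in response.citations:
        print(citation)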