What is RAG and Why Should You Care?

RAG, or Retrieval-Augmented Generation, merges pre-trained language models with a retrieval system, giving them access to external knowledge sources like the internet or your own documents so they can produce more accurate and reliable responses.

Imagine giving a super-smart LLM a buddy who can fetch helpful information from the internet and other sources. This buddy, combined with the LLM's language skills, makes sure it gives you really accurate and reliable answers. And the cool part is, the pair can keep improving without starting from scratch each time. It's like having a clever friend who always has the right facts at hand.

The operational framework of RAG involves two primary components: a retrieval module, which looks up relevant information, and a generation module, which turns that information into the final answer.
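
Put differently, every RAG answer is produced in two stages. Here is a minimal sketch of that flow; retrieve_context and generate_answer are hypothetical placeholders that the steps below fill in with Chroma and an LLM:

def retrieve_context(query):
    # Retrieval module: in the steps below this becomes a similarity search over Chroma.
    return "...relevant document chunks..."

def generate_answer(query, context):
    # Generation module: in the steps below this becomes a call to an LLM such as Gemini.
    return f"Answer to '{query}', grounded in: {context}"

def answer_with_rag(query):
    context = retrieve_context(query)       # 1. fetch supporting knowledge
    return generate_answer(query, context)  # 2. generate a response using that knowledge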

Steps in Crafting a Simple RAG with Chroma:

Let's break down the five essential steps in creating an effective Retrieval-Augmented Generation system.

Data preparation:

Load the relevant data and break it into smaller chunks.

Adjust chunk_size to strike a balance between granularity and noise.

from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load every .txt file from ./articles/ and split it into overlapping chunks.
loader = DirectoryLoader('./articles/', glob="./*.txt", loader_cls=TextLoader)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

chunk_overlap is a parameter that determines how much adjacent chunks of text share in common when using fixed-size chunking.
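
As a toy illustration (small numbers chosen only to make the overlap visible, and the import path may differ by LangChain version):

from langchain.text_splitter import RecursiveCharacterTextSplitter

text = "RAG combines a retrieval module with a generation module to ground LLM answers in real data."
splitter = RecursiveCharacterTextSplitter(chunk_size=40, chunk_overlap=10)
for chunk in splitter.split_text(text):
    print(repr(chunk))
# Adjacent chunks repeat roughly chunk_overlap characters at their boundary,
# so text cut at a chunk edge still appears with some surrounding context.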

Embedding Functions: Embedding functions play a crucial role in transforming data chunks into numerical vectors. Chroma ships ready-made embedding functions in its embedding_functions utility module, and for a more tailored approach you can implement a custom embedding function (a sketch appears after the code below).

Embedding functions serve a dual purpose in the retrieval augmented generation process:

  • To convert data chunks into embeddings before storing them in the vector database.
  • To convert the query into an embedding for similarity search.

import os

from chromadb.utils import embedding_functions

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
embedding_function = embedding_functions.OpenAIEmbeddingFunction(
    api_key=OPENAI_API_KEY,
    model_name="text-embedding-ada-002"
)
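
For the tailored approach mentioned above, a custom embedding function is just a class implementing Chroma's EmbeddingFunction interface. A minimal sketch, assuming a local sentence-transformers model (the model name is only an example):

from chromadb import Documents, EmbeddingFunction, Embeddings
from sentence_transformers import SentenceTransformer

class LocalEmbeddingFunction(EmbeddingFunction):
    def __init__(self, model_name="all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)

    def __call__(self, input: Documents) -> Embeddings:
        # Chroma passes in a list of texts and expects one vector per text back.
        return self.model.encode(list(input)).tolist()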

Vector Database: Stores pre-computed embeddings of all documents for quick retrieval. The embedding function, along with the associated data, is supplied when creating the Chroma collection.

import chromadb

chroma_client = chromadb.Client()
vector_store = chroma_client.get_or_create_collection(name="Articles",
                                                      embedding_function=embedding_function)
vector_store.add(ids=id_list, documents=text_content_list, metadatas=metadata_list)
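
The add() call above expects three parallel lists. The article doesn't show how they are built, but one plausible way, assuming the chunks produced in the data-preparation step, is:

# Derive ids, texts, and metadata from the LangChain chunks created earlier.
id_list = [f"chunk-{i}" for i in range(len(chunks))]
text_content_list = [chunk.page_content for chunk in chunks]
metadata_list = [chunk.metadata for chunk in chunks]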


Context retrieval: Embed the query and use the query vector to search the vector database, identifying the documents most relevant to the query by vector similarity.

# Retrieve the two chunks most similar to the query.
results = vector_store.query(
    query_texts=query,
    n_results=2
)
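
query() returns a dictionary of parallel lists (ids, documents, metadatas, distances), with one inner list per query text. If you'd rather hand the LLM a clean context string than the raw dictionary, you can flatten it like this:

# results["documents"][0] holds the retrieved chunks for the first (and only) query.
retrieved_docs = results["documents"][0]
context = "\n\n".join(retrieved_docs)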

Query to LLM: The retrieved documents shape the generation module's context and influence the final output. This step feeds the enriched prompt, informed by the retrieved context, into the LLM to generate the answer.

import google.generativeai as genai

# Assumes the Gemini API key is available in the GOOGLE_API_KEY environment variable.
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Combine the retrieved context with the user's question into a single prompt.
message = [{"role": "user",
            "parts": [
                f"We have provided context information below. \n"
                f"---------------------\n"
                f"{results}"
                f"\n---------------------\n"
                f"Given this information, please answer the question: {query}"
            ]
            }]
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content(message)
response_txt = response.text

LLM Challenges and how RAG solves them:

Large Language Models (LLMs) are like super-smart computers that write human-like text, but they have some problems. Here's how Retrieval-Augmented Generation (RAG) helps:

  • Hallucination:

  • LLMs: Sometimes they imagine things that aren't real facts.
  • RAG: Stops this by bringing in real information from outside sources to keep answers honest.

  • Costly and time-consuming:

  • LLMs: It's expensive and takes a long time to adapt them for specific jobs.
  • RAG: Saves time and money by using a retrieval helper to find the right info without a ton of extra training.

  • Text repetition:

  • LLMs: They often write text that's a bit dull and repeats a lot.
  • RAG: Keeps things interesting by drawing on different sources, so the text is more varied.

  • Info can be outdated:

  • LLMs: Sometimes they're not up to date with the latest news.
  • RAG: Solves this by grabbing the latest info from external sources, making sure answers are current and correct.
