RAG (Retrieval Augmented Generation) 101


Large Language Models (LLMs) have recently been a huge trend in building AI-based solutions. This is no surprise: they have been trained on large text corpora covering almost every area a human can possibly cover, so they have a very wide knowledge base. But sometimes they tend to provide made-up facts, outright falsehoods or complete nonsense when they encounter something they don't know. This is due to many factors, some of which are:

  • Source-reference divergence
  • Exploitation through jailbreak prompts
  • Reliance on incomplete or contradictory datasets
  • Overfitting and lack of novelty
  • Guesswork from vague or insufficiently detailed prompts

This phenomenon is often called "hallucination", and it has been the most common issue with almost all LLMs when they are used for a downstream task. For example, ChatGPT and Google Bard (now Gemini) recorded hallucination rates of 10% and 29% respectively in the PHM knowledge examination.

There are several ways to counter hallucination, and the following are some common methods.

  1. Contextual prompt engineering
  2. Domain Adaptation and Augmentation
  3. Adjusting model parameters or incorporating additional parameters

RAG, or Retrieval Augmented Generation, falls under domain adaptation and augmentation, and it is a proven and sustainable way to counter the hallucination issue when using an LLM for a downstream task.

The concept of RAG is pretty simple: the LLM is connected to your specific dataset. The LLM's parameters are kept intact, but the connection to the knowledge base allows the model to refer to that data and provide better results, reducing its vulnerability to hallucination.

How does this happen? In a RAG pipeline, the LLM dynamically incorporates the knowledge base (KB) data during the generation process. The model is allowed to access and use the data in the KB in real time, without altering the data itself, so its responses become more contextually relevant to the request.
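To make that concrete, here is a tiny, self-contained sketch of the idea. The knowledge base, the naive keyword retriever and the prompt template are all illustrative stand-ins; a real pipeline would use an embedding-based retriever and then send the assembled prompt to an actual LLM.

```python
# Minimal illustration of "augmenting the prompt with KB data at generation time".
# The KB entries, retriever and prompt template below are toy stand-ins.
knowledge_base = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm GMT.",
    "Premium plans include priority email support.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank KB entries by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str) -> str:
    """System prompt + retrieved context + user query: this string goes to the LLM."""
    context = "\n".join(retrieve(query))
    return (
        "Answer using ONLY the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

print(build_prompt("When can I get a refund?"))
```

Notice that the LLM itself is untouched; only the prompt it sees changes from request to request.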


Overview of RAG (Simplified)

In a RAG pipeline, the reference/KB data first has to be "indexed". Indexing here means preparing the data so that it can be queried. When a user queries for a specific piece of information, the index should be able to filter down to the most contextually relevant data. The LLM then uses the filtered data, the user query and an instructive prompt (often called the system prompt) to produce the response.

Creating a RAG pipeline involves a few steps; a minimal end-to-end sketch follows the list.

  1. Loading the data : The data that will serve as the knowledge base is loaded and stored. The data is not restricted to a single type or format; it could be text, a PDF, a website, an image, audio, etc.
  2. Indexing : Most of that data is unstructured, so it has to be structured in a way that makes it queryable. Almost always the data is vectorized, i.e. converted into vectors, and in most cases embedding models are used for this. Embedding models have the remarkable ability to extract contextual information from the data and represent it in vector form. How well that contextual information is captured, however, depends largely on the model.
  3. Storing : Once the data is indexed, it should be stored; otherwise it would have to be re-indexed again and again. The well-known vector databases (or indexes) are used for this. These VDBs are similar to NoSQL databases but tailored for vector operations, so the embedding vectors are saved here along with some metadata.
  4. Querying : For a given user input, the most contextually similar/relevant reference data point(s) are searched for in the index. The user input is first embedded using the same embedding model, and the index then finds the most relevant vectors for the vectorized query in its vector space using a similarity metric. Some of the most used similarity metrics are cosine, Euclidean and Hamming (yeah, they are not new; just some old-school vector distance metrics).
  5. Evaluation : The LLM's response is evaluated for its effectiveness and relevance to the user input. This step gives a good intuition about the pipeline's overall performance. Two aspects of the RAG pipeline are evaluated here, the retriever and the LLM itself, via retrieval evaluation and response evaluation respectively.
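Below is a minimal sketch of steps 1-4 using the sentence-transformers library for embeddings and a plain NumPy array as a stand-in for a vector database. The model name and the toy documents are illustrative choices, not requirements.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# 1. Load: a toy knowledge base (in practice: chunks of PDFs, web pages, transcripts, ...)
documents = [
    "RAG connects an LLM to an external knowledge base.",
    "Vector databases store embedding vectors together with metadata.",
    "Cosine similarity measures the angle between two vectors.",
]

# 2. Index: convert every chunk into an embedding vector
model = SentenceTransformer("all-MiniLM-L6-v2")   # any text embedding model works here
doc_vectors = model.encode(documents, normalize_embeddings=True)

# 3. Store: an in-memory array stands in for a real vector database (Qdrant, Chroma, FAISS, ...)
index = np.asarray(doc_vectors)

# 4. Query: embed the user input with the SAME model and rank by cosine similarity
query = "Where are the embeddings kept?"
query_vec = model.encode([query], normalize_embeddings=True)[0]
scores = index @ query_vec        # dot product of unit vectors == cosine similarity
for i in np.argsort(-scores)[:2]:
    print(f"{scores[i]:.3f}  {documents[i]}")
```

The top-scoring chunks plus the user query are then assembled into the augmented prompt shown earlier.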
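Retrieval evaluation (step 5) can start as simply as checking whether the expected chunk appears in the top-k results for a set of hand-labelled queries. The snippet below reuses model, index and documents from the sketch above; the query/label pairs are made up purely for illustration.

```python
# Toy retrieval evaluation: hit rate @ k over hand-labelled (query, expected chunk index) pairs.
eval_set = [
    ("What does RAG connect an LLM to?", 0),
    ("Where are embeddings kept?", 1),
]

def hit_rate_at_k(pairs, k=2):
    hits = 0
    for question, expected_idx in pairs:
        q_vec = model.encode([question], normalize_embeddings=True)[0]
        top_k = np.argsort(-(index @ q_vec))[:k]
        hits += int(expected_idx in top_k)
    return hits / len(pairs)

print(f"hit rate @ 2: {hit_rate_at_k(eval_set):.2f}")
```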


Myth-buster : RAG is not just for text data; it can also be implemented for other data types such as audio and images. All you need is an embedding model that supports those formats and an LLM with multi-modal capabilities.
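For example, CLIP-style models embed images and text into a shared vector space, so a text query can retrieve images with the same cosine-similarity machinery. Here is a small sketch using the CLIP checkpoint shipped with sentence-transformers; the image paths are placeholders.

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

clip = SentenceTransformer("clip-ViT-B-32")   # embeds both images and text

# Index: embed a couple of images from the knowledge base (placeholder paths)
image_paths = ["diagrams/rag_pipeline.png", "diagrams/vector_db.png"]
image_vectors = clip.encode([Image.open(p) for p in image_paths])

# Query: a text query lands in the same vector space, so cross-modal search just works
query_vector = clip.encode("a diagram of a vector database")
print(util.cos_sim(query_vector, image_vectors))   # cosine similarity per image
```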


Following are some tools/tech stacks that can be used to implement a RAG pipeline for your needs.

  1. Frameworks : LlamaIndex and LangChain
  2. Indexes/Vector databases : Qdrant, Weaviate, Pinecone, Chroma, FAISS, etc.
  3. LLMs : OpenAI GPT models, Google Gemini and PaLM, Anthropic Claude, Mistral AI's Mistral models, Meta Llama
  4. Embeddings : OpenAI text embedding models, CLIP (image embeddings), Hugging Face models (e.g. BGE models), Meta ImageBind (multi-modal capabilities)
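As a reference point, here is roughly what the quick-start with LlamaIndex looks like. The import paths follow recent llama-index releases and may differ in older versions, and a "data/" folder plus a configured OpenAI API key (the default LLM and embedding backend) are assumptions.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # 1. load the KB files
index = VectorStoreIndex.from_documents(documents)     # 2-3. embed and store (in-memory by default)
query_engine = index.as_query_engine()                 # retrieval + LLM generation in one object
print(query_engine.query("Summarise the refund policy."))
```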


More advanced multi-modal RAG pipelines allow users to query both text and images within a single system.

