Evaluating Retrieval-Augmented Generation (RAG) Applications with RAGAS and LangChain
Cover image from: medium.aiplanet.com/evaluating-naive-rag-and-advanced-rag-pipeline-using-langchain-v-0-1-0-and-ragas-17d24e74e5cf

Retrieval-Augmented Generation (RAG) is a powerful approach for enhancing Large Language Models (LLMs) with external knowledge, leading to more accurate and informative responses. However, evaluating the performance of these RAG pipelines is a multifaceted challenge. The ragas framework, in combination with LangChain, provides a robust solution to this challenge, offering a structured and comprehensive approach to evaluating both retrieval and generation components of your RAG application.


The Need for Rigorous Evaluation:

Developing a proof-of-concept RAG application might be relatively straightforward, but ensuring its production readiness is a different ballgame. This is where evaluation becomes paramount. By meticulously assessing your RAG pipeline, you gain valuable insights into its strengths and weaknesses, enabling targeted improvements for optimal performance.


Ragas: Your RAG Evaluation Toolkit

Ragas is an evaluation framework designed to assess the performance of RAG pipelines on a component level. It offers a variety of metrics that gauge the quality of both the retrieval and generation processes, providing a holistic view of your application's capabilities.


Key RAGAS Metrics:

  1. Context Precision: This metric measures the signal-to-noise ratio of the retrieved context, ensuring that the information fetched is relevant to the query.
  2. Context Recall: This metric verifies whether all necessary information required to answer the query has been retrieved, ensuring the completeness of the retrieved context.
  3. Faithfulness: This metric evaluates the factual accuracy of the generated answer against the retrieved context, ensuring that the generated response aligns with the factual information provided.
  4. Answer Relevancy: This metric determines the relevance of the generated answer to the question, guaranteeing that the model's response directly addresses the user's query.
  5. Answer Correctness: This metric, often relying on human-annotated ground truth labels, measures the factual accuracy of the generated answer against the ideal response, providing a direct measure of the model's accuracy.
  6. Answer Similarity: This metric compares the semantic similarity between the generated answer and the ground truth, assessing how closely the model's response aligns with the expected answer.


Synthetic Data Generation with RAGAS

One of the most powerful features of RAGAS is its ability to generate synthetic evaluation datasets. This streamlines the evaluation process by automatically creating diverse question-answer pairs along with relevant context snippets and corresponding ground truths. This not only saves time and resources but also ensures a broader range of test cases for more robust evaluation.


I created an example of how to generate a dataset, build a RAG app with your dataset, and evaluate the RAG app using RAGAS:

1) Install your framework and libraries
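The exact packages depend on your LangChain and RAGAS versions; a minimal sketch of the installs I would expect for this stack looks like this:

```
pip install langchain langchain-openai langchain-community ragas faiss-cpu datasets
```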

2) Import your OpenAI API key
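A minimal sketch for loading the key; OPENAI_API_KEY is the variable the OpenAI-backed LangChain components read by default:

```python
import os
from getpass import getpass

# Read the key from the environment, or prompt for it if it is not set.
if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
```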

3) Load your data; in my case, I am using a markdown file
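Loading a markdown file might look like the sketch below; the file path is a placeholder, not the actual file from my repo:

```python
from langchain_community.document_loaders import TextLoader

# Load the markdown file as a LangChain Document.
loader = TextLoader("data/insurance_docs.md", encoding="utf-8")
docs = loader.load()
```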

4) Split your data into chunks, setting a chunk size and a chunk overlap
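A sketch using LangChain's recursive character splitter; the chunk_size and chunk_overlap values below are illustrative, not the exact settings I used:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the documents into overlapping chunks for indexing.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)
```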

5) Select an embedding model of your choice
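This sketch assumes OpenAI embeddings via the langchain-openai package, but any LangChain-compatible embedding model can be swapped in:

```python
from langchain_openai import OpenAIEmbeddings

# Embedding model used to vectorize the chunks.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
```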

6) Choose any vector store you want; I will be using FAISS from Meta
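Indexing the chunks in FAISS is a single call (this assumes the faiss-cpu package is installed):

```python
from langchain_community.vectorstores import FAISS

# Build an in-memory FAISS index over the chunk embeddings.
vectorstore = FAISS.from_documents(chunks, embeddings)
```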

7) Create a retriever
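Turning the vector store into a retriever is one line; k=4 is an illustrative setting, not necessarily what I used:

```python
# Expose the vector store as a retriever that returns the top-k chunks.
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```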

8) Create a prompt template
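A minimal grounded-answer prompt as a sketch; the wording is illustrative rather than the exact template from my repo:

```python
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    """Answer the question using only the context provided.
If the context does not contain the answer, say you don't know.

Context:
{context}

Question: {question}"""
)
```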



9) Set up the basic QA chain; now we can instantiate the basic RAG chain!
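A sketch of the basic chain wired together with LCEL; the model name, temperature, and sample question are assumptions:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

def format_docs(docs):
    # Join the retrieved chunks into a single context string.
    return "\n\n".join(d.page_content for d in docs)

# Retrieve context, fill the prompt, call the LLM, parse the output to a string.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What does the policy cover?"))
```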


10) Time to create our dataset from our document. I will use GPT-3.5 to generate the dataset and GPT-4o to critique and review it. RAGAS generates a dataset that includes the ground truth, context, question, and evolution type
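A sketch of synthetic test set generation following the ragas 0.1.x API; the test_size and evolution distribution are illustrative choices:

```python
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

generator_llm = ChatOpenAI(model="gpt-3.5-turbo")   # writes the questions
critic_llm = ChatOpenAI(model="gpt-4o")             # critiques and filters them

generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, OpenAIEmbeddings())

# Produces question, contexts, ground_truth, and evolution_type columns.
testset = generator.generate_with_langchain_docs(
    docs,
    test_size=20,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
test_df = testset.to_pandas()
```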


11) We will use a more powerful retriever, the MultiQueryRetriever from LangChain
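Wrapping the base retriever in LangChain's MultiQueryRetriever might look like this:

```python
from langchain.retrievers.multi_query import MultiQueryRetriever

# An LLM rewrites each question into several variants and merges the retrieved chunks.
multiquery_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=llm)
```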

12) Next, I will create a chain that stuffs the retrieved documents into the context, build the retrieval chain, and test it
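A sketch using LangChain's create_stuff_documents_chain and create_retrieval_chain helpers; these expect an "input" key and a {context} placeholder, and the test question is a placeholder of my own:

```python
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

qa_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {input}"
)

# Stuff the retrieved documents into the prompt, then wire in the multi-query retriever.
document_chain = create_stuff_documents_chain(llm, qa_prompt)
retrieval_chain = create_retrieval_chain(multiquery_retriever, document_chain)

result = retrieval_chain.invoke({"input": "What does the policy cover?"})
print(result["answer"])
```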


13) I will now collect the pipeline's contexts and answers for each question and convert them into a dataset
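One way to run the synthetic questions through the chain and assemble a RAGAS-ready dataset; the column names follow the ragas 0.1.x conventions:

```python
from datasets import Dataset

questions = test_df["question"].tolist()
ground_truths = test_df["ground_truth"].tolist()

answers, contexts = [], []
for q in questions:
    result = retrieval_chain.invoke({"input": q})
    answers.append(result["answer"])
    contexts.append([doc.page_content for doc in result["context"]])

# RAGAS expects these exact column names.
eval_dataset = Dataset.from_dict({
    "question": questions,
    "answer": answers,
    "contexts": contexts,
    "ground_truth": ground_truths,
})
```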

14) I will now evaluate it on the metrics available in RAGAS. I chose the faithfulness, answer relevancy, context recall, context precision, and answer correctness metrics.
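The evaluation call itself is short; this sketch uses the metric objects exported by ragas and the dataset built above:

```python
from ragas import evaluate
from ragas.metrics import (
    answer_correctness,
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

results = evaluate(
    eval_dataset,
    metrics=[faithfulness, answer_relevancy, context_recall, context_precision, answer_correctness],
)
print(results)
```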


15) These are the results I got from my RAGAS metrics:

- faithfulness: 0.8053
- answer_relevancy: 0.8226
- context_recall: 0.9388
- context_precision: 0.8830
- answer_correctness: 0.8726

You can improve on this by using other retriever methods from LangChain, such as the Ensemble Retriever and the Parent Document Retriever, and comparing their results to see which performs best.


You can also check out my GitHub repo to see the full code above and how I created a UI for the RAG app I evaluated using RAGAS: https://github.com/Emarhnuel/Insurance_Chatbot_evaluation/tree/main


If you are interested in topics relating to:

- Python

- AI agents

- LLMs/AI Engineering

Connect and follow me for more content


