DREAM: Distributed RAG Experimentation Framework
A blueprint for distributed RAG experimentation using Ray, LlamaIndex, Ragas, MLflow & MinIO on Kubernetes
Contents
1. What is DREAM?
2. Code Walkthrough
3. Conclusion
1. What is DREAM?
a. What is it, really?
Given the myriad options for LLMs, embedding models, retrieval methods, re-ranking methods and so on, it can be challenging to determine which combination will work best for your use case. Who has the time to explore each combination one by one?
So, the Distributed RAG Experimentation Framework (DREAM) is a blueprint, comprising a Kubernetes-native architecture and sample code, that demonstrates how Retrieval Augmented Generation (RAG) experiments, evaluation and tracking can be conducted in a distributed manner using Ray, LlamaIndex, Ragas, MLflow and MinIO on Kubernetes.
By setting up the necessary K8s tooling and running the experimentation, evaluation and tracking in a distributed manner, we ultimately want to be able to compare the different combinations of RAG parameters and pick the one that works best for our use case.
b. Architecture
As shown in the architecture diagram, DREAM uses the following technologies: Ray, LlamaIndex, Ragas, MLflow and MinIO, all running on Kubernetes.
For installing all these components, you can follow the steps outlined in the installation guide. You might notice that DREAM is part of a larger project I'm calling GOKU (GenAIOps on Kubernetes), which is coming soon!
c. Show me the code!
Here you go: DREAM GitHub :)
2. Code Walkthrough
a. Preparing Unstructured Data
The steps in this notebook are quite straightforward:
b. Distributed Generation of Golden Dataset
This is where the fun begins!
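As a rough illustration of the idea (not necessarily how the DREAM notebook does it), golden-dataset generation can be fanned out by splitting the documents into batches and handing each batch to a Ray task. In this sketch, `make_batches`, `generate_distributed` and the `generate_batch` callback are hypothetical names; the callback stands in for whatever turns a batch of documents into question/answer rows.

```python
from typing import Callable, List, Sequence


def make_batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split documents into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def generate_distributed(
    docs: Sequence[str],
    generate_batch: Callable[[List[str]], List[dict]],  # hypothetical QA generator
    batch_size: int = 10,
) -> List[dict]:
    """Run generate_batch on every batch as a Ray task and merge the rows."""
    import ray  # imported lazily so the batching helper works without Ray

    remote_fn = ray.remote(generate_batch)  # wrap the plain function as a Ray task
    futures = [remote_fn.remote(batch) for batch in make_batches(docs, batch_size)]

    rows: List[dict] = []
    for batch_rows in ray.get(futures):  # blocks until all tasks have finished
        rows.extend(batch_rows)
    return rows
```

The batch size is a tuning knob: smaller batches spread work more evenly across the cluster, while larger ones reduce scheduling overhead per task.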
c. Distributed Experimentation & Evaluation
This is about to get a little complicated, so here's the overall workflow visualised:
Before we get to the juicy bits, let me describe the search space and evaluation metrics. In the sample code, our search space spans 3 RAG methods, 2 LLMs and 2 embedding models, i.e. 12 combinations in total. We use 3 RAG methods native to LlamaIndex: chunks with overlap, sentence window retrieval and hierarchical auto-merging retrieval. We use OpenAI's gpt-3.5-turbo and gpt-4 as our LLMs, with text-embedding-3-small and text-embedding-3-large as our embedding models. For evaluation, we use the Ragas framework's faithfulness, answer_relevancy, context_precision, context_recall, answer_correctness and answer_similarity as metrics. To understand the RAG methods and Ragas metrics in depth, you can check out my previous article on Advanced RAG.
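To make the size of this search space concrete, here is a small sketch that enumerates every combination. The model and metric names come from the text above; the string labels for the three RAG methods are hypothetical identifiers chosen for illustration, not names taken from the DREAM code.

```python
from itertools import product

# Hypothetical labels for the three LlamaIndex RAG methods described above.
RAG_METHODS = ["chunks_with_overlap", "sentence_window", "automerging"]
LLMS = ["gpt-3.5-turbo", "gpt-4"]
EMBED_MODELS = ["text-embedding-3-small", "text-embedding-3-large"]

# Ragas metrics used to score every combination.
RAGAS_METRICS = [
    "faithfulness",
    "answer_relevancy",
    "context_precision",
    "context_recall",
    "answer_correctness",
    "answer_similarity",
]


def enumerate_trials():
    """Return one config dict per (RAG method, LLM, embedding model) combination."""
    return [
        {"rag_method": rag, "llm": llm, "embed_model": emb}
        for rag, llm, emb in product(RAG_METHODS, LLMS, EMBED_MODELS)
    ]


trials = enumerate_trials()
print(len(trials))  # 3 * 2 * 2 = 12
```

Each of the 12 configurations is scored on all six metrics, which is why running the trials one after another quickly becomes slow.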
Cue Ray Tune!
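A minimal sketch of how such a grid could be handed to Ray Tune is shown below. It assumes Ray 2.x with the `tune.Tuner` API; `evaluate_rag`, `trainable` and `run_experiments` are hypothetical names, the RAG method labels are illustrative, and the actual index building and Ragas scoring are left as a stub rather than guessed at.

```python
from typing import Any, Dict, List

# Search space from the article; the RAG method labels are illustrative.
SEARCH_SPACE: Dict[str, List[str]] = {
    "rag_method": ["chunks_with_overlap", "sentence_window", "automerging"],
    "llm": ["gpt-3.5-turbo", "gpt-4"],
    "embed_model": ["text-embedding-3-small", "text-embedding-3-large"],
}


def evaluate_rag(config: Dict[str, Any]) -> Dict[str, float]:
    """Hypothetical stub: build the query engine for this config, answer the
    golden dataset questions and return the Ragas scores as a dict."""
    raise NotImplementedError("plug in index building, querying and Ragas scoring")


def trainable(config: Dict[str, Any]) -> Dict[str, float]:
    """Ray Tune function trainable: the returned dict is the trial's final result."""
    return evaluate_rag(config)


def run_experiments():
    """Launch one trial per combination in SEARCH_SPACE and return the results."""
    from ray import tune  # imported lazily so the module loads without Ray

    # grid_search makes Tune try every listed value exhaustively.
    param_space = {name: tune.grid_search(values) for name, values in SEARCH_SPACE.items()}
    tuner = tune.Tuner(trainable, param_space=param_space)
    return tuner.fit()
```

Calling `run_experiments()` on a machine connected to a Ray cluster would schedule the 12 trials across the available workers. Grid search fits here because the space is small and fully enumerable; for larger spaces, Tune's sampling-based search would be the more natural choice.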
d. Experiment Tracking
Finally, we leverage the amazing experiment tracking capability of MLflow to record experiment results, establish lineage with the golden dataset and visualise experiment results. Here's a flurry of screenshots that speak for themselves!
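For readers who want to see what the logging step might look like, here is a hedged sketch using MLflow's standard tracking calls. It assumes each trial result is a flat dict mixing string parameters with numeric Ragas scores; `split_result`, `log_trial` and the experiment name are hypothetical and not taken from the DREAM code.

```python
from typing import Any, Dict, Tuple


def split_result(result: Dict[str, Any]) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Separate string-valued parameters from numeric metric scores."""
    params = {k: v for k, v in result.items() if isinstance(v, str)}
    metrics = {
        k: float(v)
        for k, v in result.items()
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    }
    return params, metrics


def log_trial(result: Dict[str, Any], experiment_name: str = "dream-rag-experiments") -> None:
    """Record one trial as an MLflow run (the experiment name is an assumption)."""
    import mlflow  # imported lazily so split_result works without MLflow

    params, metrics = split_result(result)
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run():
        mlflow.log_params(params)    # e.g. rag_method, llm, embed_model
        mlflow.log_metrics(metrics)  # e.g. faithfulness, answer_relevancy
```

Lineage to the golden dataset could be recorded with MLflow's dataset tracking (`mlflow.log_input`), though the original notebooks may handle this differently.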
3. Conclusion
a. In a nutshell
In this article, we took a look at DREAM, a blueprint of tooling and code that demonstrates how distributed RAG experimentation, evaluation and tracking can be done using open-source technologies including Ray, LlamaIndex, Ragas, MLflow & MinIO on Kubernetes.
b. What's next?
This is only a rough first draft: there is plenty of room to optimise and exploit the distributed nature of the experimentation further. For instance, it might make sense to use Ray Data for reading and writing the CSV files. We can take things a step further and use distributed calls to the embedding model to create the VectorStoreIndex! I hope you use this as a building block and go nuts with optimization in your own projects :)
Another interesting idea to consider is how to turn this into a reusable no-code/low-code workflow. Notice how the steps running in the Jupyter notebooks can be neatly organised into a linear DAG. If we fix the parameters of the RAG search space, we could package up the steps in an Argo Workflow and trigger the distributed experiment, evaluation and tracking as a low-code/no-code pipeline, on any arbitrary unstructured data in S3!