High Fidelity RAG - Understanding Precision
Complicated AI Subjects in Simple Terms Series
The Basics
(I'll keep this brief, as I assume most of you are familiar with the concept): Retrieval Augmented Generation (RAG) is a process that uses stored data to help an AI answer domain-specific questions, whether that data is proprietary or targeted at a business case. The data is stored as vectors in a database to support similarity search, and the matching results are passed to the AI as "context" designed to answer the user's question.
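To make that flow concrete, here is a minimal sketch of the retrieve-then-answer loop in Python. It uses a toy in-memory store with random vectors purely to show the shape of the process; a real system would use an embeddings model and a vector database instead.

```python
import numpy as np

# Toy in-memory "vector database": (chunk_text, embedding) pairs.
# Random vectors stand in for real embeddings here, purely to show the loop.
DIM = 1536
rng = np.random.default_rng(0)
store = [(f"document chunk {i}", rng.normal(size=DIM)) for i in range(10)]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: the usual scoring function for vector search.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vector: np.ndarray, top_k: int = 3) -> list[str]:
    # Rank stored chunks by similarity to the query and keep the best few.
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

# The retrieved chunks become the "context" block in the prompt sent to the LLM.
query_vector = rng.normal(size=DIM)  # stands in for embedding the user's question
context = "\n\n".join(retrieve(query_vector))
prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: ..."
```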
Understanding Fidelity
So, there are a few pieces here that contribute to what I call RAG Data Fidelity:
1. The documents themselves
2. The embeddings model
3. The chunk size
To understand fidelity, you have to understand how this data gets stored in the vector database. To ingest documents into the database, you use an embeddings model, and that model has its own fidelity. When I built our internal Hyland Software AI bot, Romanzo, I was on the Ada V2 embeddings model from OpenAI. That wasn't naivety; it was the benchmark back in November of last year. The embeddings model encodes whatever text you pass it into a large vector of 1,536 32-bit floating-point numbers. That is the fidelity of the embeddings model: the precision with which it "encodes" the data it is sent. Cool, so now you understand fidelity, right? Nope. Next concept.
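For illustration, here is roughly what that call looks like with the current OpenAI Python SDK (the input string is just an example; you would pass your own chunk text):

```python
# Requires: pip install openai, with OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-ada-002",  # Ada V2, the model mentioned above
    input="A chunk of a support document goes here.",
)

vector = response.data[0].embedding
print(len(vector))   # 1536 dimensions for Ada V2
print(vector[:5])    # floating-point values, e.g. [-0.012, 0.034, ...]
```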
Side note: I am moving my entire 1.5 million row vector database to the latest text-embedding-3-large embeddings model over the coming weeks... I know a fidelity gain when I see one!
We still have items 1 and 3 from above to discuss: the documents and the chunk size. To embed your documents into these arrays of numbers, you have to break them up first. There are myriad "chunking strategies" that, I think, sometimes get in the way of an effective system. I am not going to dive into those; I'll focus on some core concepts instead. What I have done with chunking is use a predefined and naive (there's that word again) approach: I break my documents up into 400-"word" chunks with overlap. The overlap ensures that if a paragraph gets split at the 400-word boundary, I still capture its entire context in most cases.
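Here is a minimal sketch of that naive approach. The 50-word overlap is my own illustrative choice; the point is simply that consecutive chunks share their boundary words.

```python
def chunk_words(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking: split on whitespace and slide a
    chunk_size-word window forward by (chunk_size - overlap) words,
    so the tail of one chunk repeats at the head of the next."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks

# Example: a 1,000-word document becomes three overlapping chunks.
doc = " ".join(f"w{i}" for i in range(1000))
print([len(c.split()) for c in chunk_words(doc)])  # [400, 400, 300]
```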
Now we are getting somewhere. We know how many numbers are in each vector, and we know how much data goes into each "embedded chunk". This is where I think fidelity shows up best. When you embed a chunk of your document, it is represented in a fixed-size vector. So roughly 400 words get represented in 1,536 numbers. Are you seeing it now? If I raise the chunk size, each chunk loses fidelity! Precision is key here. Some people chunk their data into varying lengths. That is debatable and sometimes a good approach, but it makes the fidelity of each individual chunk vary. I wouldn't want to shove the concept of a 1,000-word page into the same size vector space as 400 words; it will obviously lose precision.
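A rough back-of-the-envelope way to see this (not a formal measure, just the intuition): compare how many words each of the 1,536 dimensions has to account for at different chunk sizes.

```python
VECTOR_DIM = 1536

for words_per_chunk in (400, 1000):
    ratio = words_per_chunk / VECTOR_DIM
    print(f"{words_per_chunk} words -> {ratio:.2f} words per dimension")

# 400 words  -> 0.26 words per dimension
# 1000 words -> 0.65 words per dimension
# Same-size vector, more text: each dimension has to summarize more,
# so any individual detail is represented more coarsely.
```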
Say you chunk by page or by paragraph; those vary constantly in length. A long paragraph is represented less precisely because you HAVE to use the same vector size that your database column and your embeddings model allow. Phew, I finally mentioned the DB. Vector search in the database only works on vectors with a rigid, fixed number of dimensions; that is what makes similarity search possible. Not only does the embeddings model produce a fixed-length array of numbers, the database column has that limitation as well.
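To show what that rigidity looks like in practice, here is a sketch assuming a pgvector-style column (the table and column names are my own illustration, not from any specific system): the dimension is declared once, and anything that doesn't match gets rejected.

```python
EXPECTED_DIM = 1536

# pgvector-style DDL: the vector column's dimension is fixed at table creation,
# so every embedding stored in it must be exactly this length.
CREATE_TABLE_SQL = """
CREATE TABLE chunks (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(1536)
);
"""

def validate_embedding(embedding: list[float]) -> list[float]:
    # The database enforces this at insert time; checking up front gives a
    # clearer error when a chunk was embedded with the wrong model.
    if len(embedding) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} dimensions, got {len(embedding)}")
    return embedding
```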
To Summarize My Position
Obligatory Counter Point
A podcast about my research: https://soundcloud.com/gabriel-keith-870911066/high-fidelity-rag-unlocking-precision?si=48ef95ad8e5140f38ae903a1c2bc1a1b&utm_source=clipboard&utm_medium=text&utm_campaign=social_sharing