Optimizing retrievers for AI
Language models are available in different weights and forms, but they are still incapable of understanding your private and continuously growing data. Thus, despite OpenAI's GPT and now DeepSeek coming close to human-level models, it is difficult to solve your business use cases without retrievers that pull meaningful data for the generation model to work with. Yes, you can fine-tune, but the result will again be limited to the knowledge available at training time. That is the sole reason RAG (Retrieval-Augmented Generation) is among the most searched terms on the internet.
[Chart: Google Trends search interest in 'RAG']
[Chart: Google Trends search interest in 'LLM fine tuning' / 'fine tune']
Are you experiencing the struggle below?
I am using a SOTA model that I have thoroughly tested, and it delivers the expected results when combined with the right prompts and knowledge. However, at runtime, the retriever is not selecting the most relevant content, leading to suboptimal inputs for the generation model. As a result, the model struggles to generate accurate and contextually appropriate outputs. My retriever is not providing the expected quality!
If you are semantically aligned with the above statement, let's dig in further.
Disclaimer: don't reach for your embedding models (retrievers) just yet to check your semantic alignment with the above statement :) Let's first understand them.
Understanding retrievers
Retrieval-Augmented Generation (RAG) is a technique that combines retrieval-based search with generative AI to produce more accurate, contextual, and updated responses. Unlike standard LLMs, which rely only on pre-trained knowledge, RAG dynamically retrieves external documents to improve its answers.
If you run a 1-gram analysis on the two paragraphs above, you will see that 'retrieval' occurs twice, and both TF (term frequency) and KL divergence (a term's relative importance to the topic) add weight to it. Hence, by this mathematical evaluation alone, retrievers are important to understand, besides the obvious reasons :)
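For the curious, here is a quick sketch of that 1-gram (unigram) count; the snippet is purely illustrative, with the paragraph text abridged:

```python
# A minimal sketch of a unigram (1-gram) term-frequency count,
# using an abridged version of the paragraphs above as input.
from collections import Counter

text = (
    "Retrieval-Augmented Generation (RAG) is a technique that combines "
    "retrieval-based search with generative AI. RAG dynamically "
    "retrieves external documents to improve its answers."
)

# Lowercase, strip parentheses, and split on whitespace to get unigrams.
tokens = text.lower().replace("(", " ").replace(")", " ").split()
tf = Counter(tokens)

print(tf.most_common(5))  # the highest-frequency unigrams (TF)
```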
Please note that, as mentioned, retrieval can be performed from any external source that provides context for the topic in question. However, the most widely used and efficient method is semantic search, or vector search. The diagram below depicts a vector-search-based retrieval flow.
The diagram above separates the retriever-based generation flow from the offline knowledge-base creation within vector stores. Offline knowledge embedding is a pre-processing step that requires crawlers and scrapers, document loaders, embedding models, and vector stores.
Retrieval, on the other hand, is a post-processing knowledge-selection step on top of the vector store: a retriever embeds the query, runs a similarity search, and ranks the returned results.
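To make that concrete, here is a minimal sketch of the retrieval step; the vector store is faked as an in-memory matrix, and the model name and documents are assumptions for illustration:

```python
# A minimal retrieval sketch over a pre-built "vector store",
# here just an in-memory numpy matrix of document embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model choice

docs = ["refund policy for damaged goods",
        "how to pair bluetooth headphones",
        "store opening hours on holidays"]
doc_vecs = model.encode(docs, normalize_embeddings=True)  # offline embedding step

def retrieve(query, top_k=2):
    q = model.encode(query, normalize_embeddings=True)
    scores = doc_vecs @ q                  # cosine similarity (vectors are normalized)
    best = np.argsort(-scores)[:top_k]     # highest-scoring documents first
    return [(docs[i], float(scores[i])) for i in best]

print(retrieve("can I return a broken item?"))
```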
Optimizations to achieve quality
It entirely depends on the use case whether optimizations are necessary. A simple use case can often be built using the basic options available through standard integrations. However, for more complex scenarios, you can explore the following options iteratively to enhance the quality of your RAG solution:
Best-suited embedding model
The embedding model you use can make or break your RAG solution. Select the model that best aligns with your use case, based on the nature of your data and your quality, latency, and cost requirements.
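As a starting point, a small comparison harness like the sketch below can help you sanity-check candidates on your own queries; the model names and sample data here are hypothetical:

```python
# A minimal sketch for comparing candidate embedding models on a
# known query/document pair (models and data are illustrative).
from sentence_transformers import SentenceTransformer, util

candidates = ["all-MiniLM-L6-v2", "multi-qa-mpnet-base-dot-v1"]
query = "How do I reset my router?"
docs = [
    "Hold the reset button for 10 seconds to restore factory settings.",  # relevant
    "Our routers ship in recyclable packaging.",                          # distractor
]

for name in candidates:
    model = SentenceTransformer(name)
    q_emb = model.encode(query, convert_to_tensor=True)
    d_emb = model.encode(docs, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, d_emb)[0]  # cosine similarity per document
    print(name, [round(float(s), 3) for s in scores])
```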
Hybrid search with learned sparse retrieval
This technique enhances the final output by refining the results obtained from dense vector search. Keyword-based methods such as BM25 and TF-IDF are employed to retrieve documents containing the relevant tokens, and advanced techniques like term expansion can further improve retrieval effectiveness on top of these algorithms. I would love to write about these techniques in future articles.
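As one possible shape of such a hybrid, here is a sketch that fuses BM25 and dense rankings with reciprocal rank fusion; the rank_bm25 package, model name, corpus, and query are assumptions for illustration:

```python
# A minimal hybrid-search sketch: BM25 (sparse) + embeddings (dense),
# combined with reciprocal rank fusion (RRF).
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = ["invoice payment terms", "router factory reset steps", "warranty policy"]
query = "how to reset the router"

# Sparse side: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([doc.split() for doc in corpus])
sparse_scores = bm25.get_scores(query.split())

# Dense side: cosine similarity over sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
dense_scores = util.cos_sim(model.encode(query), model.encode(corpus))[0]

def rrf(rankings, k=60):
    """Fuse multiple rankings without needing calibrated scores."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] = fused.get(doc_id, 0) + 1 / (k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)

sparse_rank = sorted(range(len(corpus)), key=lambda i: -sparse_scores[i])
dense_rank = sorted(range(len(corpus)), key=lambda i: -float(dense_scores[i]))
for i in rrf([sparse_rank, dense_rank]):
    print(corpus[i])
```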
Train Embedding Models
If pre-trained models don’t provide the accuracy you need, consider training your own embeddings. Training custom embeddings on domain-specific data allows the system to better understand your specific context and deliver more relevant results, improving the overall quality of your solution. By using frameworks like sentence-transformers and training on triplet datasets (anchor, positive, negative), you can make the model understand the nature of the data in your domain. The picture below illustrates that embedding models can be tuned to reshape the embedding space for your data.
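A minimal fine-tuning sketch using the classic sentence-transformers fit API follows; the triplets shown are hypothetical stand-ins for your domain data:

```python
# A minimal triplet fine-tuning sketch with sentence-transformers;
# the example triplet is a hypothetical placeholder.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")

# Each example: (anchor, positive, negative) drawn from your domain.
train_examples = [
    InputExample(texts=[
        "error code E42 on pump",                   # anchor
        "pump displays E42 when the intake clogs",  # positive
        "annual maintenance pricing overview",      # negative
    ]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.TripletLoss(model=model)  # pulls positives closer, pushes negatives away

model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1, warmup_steps=10)
model.save("domain-tuned-embeddings")
```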
Multi Modal Approach
A multi-modal approach leverages various data types (text, images, audio, etc.) for retrieval, enriching the process by incorporating more contextual information and improving result relevance. Even when dealing solely with text, a multi-model approach can be used to ensemble results for better accuracy. However, this comes with a trade-off in terms of increased latency and cost.
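For the image-plus-text case, one common pattern is a CLIP-style model that embeds both modalities into a shared vector space; the sketch below assumes the sentence-transformers package and a hypothetical local image file:

```python
# A minimal multi-modal sketch: CLIP embeds text and images into
# the same space, so one query can retrieve across both modalities.
from sentence_transformers import SentenceTransformer, util
from PIL import Image

model = SentenceTransformer("clip-ViT-B-32")

img_emb = model.encode(Image.open("product_photo.jpg"))  # hypothetical image file
txt_emb = model.encode(["a red electric kettle", "a mountain landscape"])

print(util.cos_sim(img_emb, txt_emb))  # similarity of the image to each caption
```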
Evaluation
This is not an optimization but a key step in ensuring effectiveness. Always test and evaluate! Continuously assess the quality of your retrieval results and tweak your model, vectors, or approach as needed. Implement real-time feedback loops from users and use precision/recall metrics to fine-tune your RAG setup, ensuring it stays relevant and accurate.
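The precision/recall metrics mentioned above are straightforward to compute per query; here is a small sketch with hypothetical document IDs:

```python
# A minimal sketch of retrieval metrics; the retrieved/relevant IDs
# are hypothetical placeholders for your evaluation data.
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents found in the top-k."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)

retrieved = ["d3", "d1", "d7", "d2"]   # ranked retriever output
relevant = {"d1", "d2"}                # ground-truth relevant docs

print(precision_at_k(retrieved, relevant, k=3))  # 0.333...
print(recall_at_k(retrieved, relevant, k=3))     # 0.5
```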
Conclusion
Optimizing retrieval is not a one-size-fits-all approach—it depends on the complexity of the use case, data modality, and performance requirements. By iteratively refining your embedding models, retrieval techniques (dense & sparse), multi-modal approaches, and evaluation strategies, you can significantly enhance the quality of your RAG (Retrieval-Augmented Generation) solutions.
A hybrid retrieval approach—combining dense vector search for semantic understanding and sparse vector search for precise keyword matching—offers the best of both worlds. Additionally, custom-trained embeddings and multi-modal methods can further boost contextual relevance.
Ultimately, balancing accuracy, latency, and cost is key. The right retrieval optimizations ensure that AI-driven systems produce reliable, context-aware, and efficient responses, driving better decision-making and user experiences.