Lamini Memory Tuning: Towards Higher Accuracy and Less Hallucination
Anindita Desarkar, PhD
1. Problem: Low Accuracy and High Hallucination in LLM Responses
Limited accuracy and hallucination are two major barriers to adopting Large Language Models in the user community. Some improvement is achieved through advanced prompt engineering techniques and RAG-based architectures; however, this is not enough for critical applications where hallucination and accuracy are the key metrics.
2. Proposed Solution: Lamini Memory Tuning
Lamini Memory Tuning is a research breakthrough that overcomes a seeming paradox in the AI world: achieving precise factual accuracy (i.e. no hallucinations) while upholding the generalization capabilities that make LLMs valuable in the first place.
Lamini Memory Tuning is a completely new way to fine-tune any existing LLM: it tunes millions of LoRA adapters and selects across them in a wide Mixture of Experts at inference time. Figure 1 presents its advantages compared to other techniques.
3. Methodology:
The method entails tuning millions of expert adapters (e.g. LoRAs) with precise facts on top of any open-source LLM, like Llama 3 or Mistral 3.
Lamini Memory Tuning is a fundamentally different fine-tuning approach that effectively teaches any open-source LLM to be near-perfect on facts, while still maintaining its ability to be pretty good at everything else. When the model is supposed to recall a fact, Lamini Memory Tuning shifts the entire probability mass to that particular fact (i.e. specific tokens within a particular context), such as the exact SQL schema for your database. This results in output probabilities that are not just closer to the right result, but exactly there. Figure 2 illustrates this.
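To make the probability-mass idea concrete, here is a toy sketch in Python (illustrative only; the vocabulary, logits, and token id are made up, and this is not Lamini's code). Driving the cross-entropy loss on a fact's token toward zero is the same as pushing the probability of that exact token toward 1:

```python
# Toy illustration of "shifting the entire probability mass" to a fact.
# All values are hypothetical; requires PyTorch.
import torch
import torch.nn.functional as F

fact_token = 3  # hypothetical id of the fact's correct next token

# Before tuning: probability is spread across several plausible tokens.
before = torch.tensor([1.0, 0.5, 0.2, 1.2, 0.9, 0.1, 0.3, 0.4])
# After memory tuning: one logit dominates, so the mass collapses onto the fact.
after = torch.tensor([1.0, 0.5, 0.2, 9.0, 0.9, 0.1, 0.3, 0.4])

for name, logits in [("before", before), ("after", after)]:
    p = F.softmax(logits, dim=-1)
    loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([fact_token]))
    print(f"{name}: p(fact) = {p[fact_token]:.3f}, loss = {loss:.3f}")
```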
To do this, Lamini Memory Tuning tunes a massive mixture of memory experts on any open-source LLM. Each memory expert acts like a LoRA adapter that functionally operates as memory for the model. Together, the memory experts specialize in a million different ways to ensure faithfulness and factual accuracy to the data they were tuned on. Inspired by information retrieval, these million memory experts are equivalent to indices from which the model intelligently retrieves and routes. At inference time, the model retrieves the most relevant experts at each layer and merges them back into the base model to respond to the user query.
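The sketch below shows what such per-layer retrieval and merging could look like. It is an assumption-laden illustration, not Lamini's implementation: the routing rule (a dot product against learned expert keys), the LoRA shapes, and all sizes are invented for clarity.

```python
# A minimal, hypothetical Mixture of Memory Experts layer:
# score a bank of LoRA-style experts, retrieve the top-k, and
# apply their low-rank deltas on top of the frozen base projection.
import torch
import torch.nn as nn

class MoMELayer(nn.Module):
    def __init__(self, d_model=64, rank=4, n_experts=1000, k=32):
        super().__init__()
        self.base = nn.Linear(d_model, d_model)                    # frozen backbone projection
        self.keys = nn.Parameter(torch.randn(n_experts, d_model))  # routing index over experts
        self.A = nn.Parameter(torch.randn(n_experts, rank, d_model) * 0.01)  # LoRA down-proj
        self.B = nn.Parameter(torch.zeros(n_experts, d_model, rank))         # LoRA up-proj
        self.k = k

    def forward(self, h):                          # h: (batch, d_model)
        scores = h @ self.keys.T                   # score every memory expert
        top = scores.topk(self.k, dim=-1).indices  # retrieve top-k expert ids per input
        outs = []
        for b in range(h.size(0)):
            # Merge the selected experts' low-rank deltas into the base output.
            delta = sum(self.B[e] @ (self.A[e] @ h[b]) for e in top[b])
            outs.append(self.base(h[b]) + delta)
        return torch.stack(outs)

layer = MoMELayer()
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```

Only k of the experts are active per input, which is what keeps inference cost fixed as the expert count grows.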
The result is a sparsely activated model, called a Mixture of Memory Experts (MoME), that can scale to an enormous number of parameters at a fixed computational inference cost. This means MoMEs have extremely high capacity for the number of facts that can be learned, bounded only by the total size of the training data set.
The massive MoME is designed to cut down on the amount of computation required to memorize facts. This is accomplished by the following training algorithm:
1. For a given question, select a subset of experts, e.g. 32 out of the array of one million.
2. Freeze the weights of the backbone network and the cross attention used to select the expert.
3. Take gradient descent steps until the loss is reduced sufficiently to memorize the fact.
The computation cost of memorizing each fact now scales with the number of training examples, not with the total number of parameters in the network.
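Here is a toy sketch of that loop (hypothetical throughout: a frozen random matrix stands in for the backbone, and a handful of zero-initialized deltas stand in for the selected LoRA experts):

```python
# Toy version of the memorization loop; requires PyTorch.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, vocab = 32, 100
backbone = torch.randn(vocab, d)    # step 2: the backbone stays frozen (no grad)

# Step 1: a small subset of experts is selected for this fact
# (e.g. 32 out of one million); here, 4 zero-initialized toy deltas.
experts = [torch.zeros(vocab, d, requires_grad=True) for _ in range(4)]

h = torch.randn(d)                  # hidden state for the question (toy)
fact_token = 7                      # the token the model must recall exactly

opt = torch.optim.SGD(experts, lr=0.5)
loss = float("inf")
while loss > 1e-3:                  # step 3: gradient steps until memorized
    logits = (backbone + sum(experts)) @ h
    ce = F.cross_entropy(logits.unsqueeze(0), torch.tensor([fact_token]))
    opt.zero_grad()
    ce.backward()
    opt.step()
    loss = ce.item()

p = F.softmax((backbone + sum(experts)) @ h, dim=-1)[fact_token].item()
print(f"memorized: p(fact_token) = {p:.4f}")
```

Because only the few selected experts receive gradients, the cost of memorizing each fact does not grow with the million experts that were not selected.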
4. Lamini LLM Photographic Memory Evaluation Suite:
Lamini is introducing a new evaluation benchmark suite that quantifies LLM performance on tasks requiring photographic memory, for dependable and precise model evaluation. The suite includes benchmarks that test a model's precision and recall on specific domain data, such as finance, e-commerce, and medicine. We call this "photographic memory" because the tasks require an exact match. These are usually the kinds of tasks that enterprises work on. The benchmarks can easily be adapted to a specific enterprise use case on private data.
The suite also incorporates well-known open-source benchmarks such as MMLU, TruthfulQA, and others to compare the model's performance against the base model. This helps assess whether the knowledge acquired during pre-training is retained after fine-tuning.
There are many standard benchmarks for evaluating LLM outputs. Each serves a different purpose and targets different abilities of LLMs for evaluation.
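As a rough illustration of an exact-match ("photographic memory") metric, here is a hypothetical harness; the predictor and test case are stand-ins, not part of Lamini's suite:

```python
# Minimal exact-match scorer: any deviation from the gold answer is a miss.
def exact_match_score(predict, cases):
    """predict: fn(question) -> answer string; cases: list of (question, gold)."""
    hits = sum(predict(q).strip() == gold.strip() for q, gold in cases)
    return hits / len(cases)

# Usage with a stand-in predictor (swap in a real model call):
cases = [("Which column stores the order date?", "orders.order_date")]
print(exact_match_score(lambda q: "orders.order_date", cases))  # 1.0
```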
5. Use Cases:
Following are the recommended use cases for which this technique provides the best results.
A. High precision text-to-SQL:
Text-to-SQL is a natural language processing (NLP) task that turns plain text into SQL queries. The goal is to empower non-technical users to access their business data without having to be SQL or database wizards.
Translating natural language text into syntactically and semantically correct SQL queries is challenging for many reasons, including the inherent ambiguity of natural language, complex database schemas, and advanced SQL operations. LLMs are getting better at code generation, but accuracy is still an issue because LLMs are prone to hallucinating without adequate domain context.
Prompting combined with Retrieval Augmented Generation (RAG) is a common approach to text-to-SQL because it is relatively easy, cost-effective, and offers fast feedback loops. While prompting and RAG may be fine for very simple schemas and user questions, they don't work well with the more complex schemas and data environments of most real-world applications.
Lamini Memory Tuning, however, works much better here than RAG and prompting approaches.
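The toy example below (hypothetical schema and queries, nothing Lamini-specific) shows why exact correctness matters in this setting: a near-miss query can still execute without error, yet silently return the wrong answer:

```python
# Two SQL queries that both run, but only one answers the question
# "total order amount since February". Uses Python's built-in sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, amount REAL, created_at TEXT);
INSERT INTO orders VALUES (1, 10.0, '2024-01-05'), (2, 25.0, '2024-02-01');
""")

gold = "SELECT SUM(amount) FROM orders WHERE created_at >= '2024-02-01'"
near_miss = "SELECT SUM(amount) FROM orders"  # plausible, but drops the filter

for sql in (gold, near_miss):
    print(sql, "->", conn.execute(sql).fetchone())
```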
B. High precision Classification:
Save thousands of hours by automatically labeling data accurately.
C. High precision Recommendations:
Increase cart size and revenue with AI-powered product suggestions.
6. Advantages:
References:
1. Introducing Lamini Memory Tuning: 95% LLM Accuracy, 10x Fewer Hallucinations
2. Mixture of Memory Experts: Lamini Memory Tuning
3. [R] What's Memory Tuning and how does it give higher accuracy + speed than RAG and prompting?
4. [AARR] Lamini - Memory Tuning
5. Lamini LLM Photographic Memory Evaluation Suite