Lamini Memory Tuning: Towards Better Accuracy and Less Hallucination


1. Problem: Low Accuracy and High Hallucination in LLM Responses

Limited accuracy and hallucination in LLM responses are two major barriers to adoption of Large Language Models in the user community. Some improvement is achieved through advanced prompt engineering techniques and RAG-based architectures; however, it is not enough for critical applications where hallucination and accuracy are the key metrics.

2. Proposed Solution: Lamini Memory Tuning

Lamini Memory Tuning is a research breakthrough that overcomes a seeming paradox in the AI world: achieving precise factual accuracy (i.e. no hallucinations) while upholding the generalization capabilities that make LLMs valuable in the first place.

Lamini Memory Tuning is a completely new way to fine-tune any existing LLM: it tunes millions of LoRA adapters and selects across them in a wide Mixture of Experts at inference time. Figure 1 presents its advantages compared to other techniques.

Figure 1: Advantages of Lamini Memory Tuning [1]

3. Methodology:

The method entails tuning millions of expert adapters (e.g. LoRAs) with precise facts on top of any open-source LLM, like Llama 3 or Mistral 3.

Lamini Memory Tuning is a fundamentally different fine-tuning approach that effectively teaches any open-source LLM to be near-perfect on facts, while still maintaining its ability to be pretty good at everything else. When the model is supposed to recall a fact, Lamini Memory Tuning shifts the entire probability mass to that particular fact (i.e. specific tokens within a particular context), such as the exact SQL schema for your database. This results in output probabilities that are not just closer to the right result, but exactly there. Figure 2 illustrates this.

Figure 2: Lamini Memory Tuning [1]
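The "shift the entire probability mass" idea can be seen in a toy next-token distribution: driving the cross-entropy loss on the fact token toward zero forces its softmax probability toward 1. The vocabulary, logits, and learning rate below are invented for illustration; this is a sketch of the principle, not Lamini's implementation.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Toy 4-token vocabulary; index 2 plays the role of the "fact" token
# (e.g. the one correct table name from a SQL schema).
logits = np.array([1.0, 0.5, 1.2, 0.3])
fact = 2

# Gradient descent on the cross-entropy loss of the fact token.
for _ in range(500):
    p = softmax(logits)
    grad = p.copy()
    grad[fact] -= 1.0        # d(loss)/d(logits) for cross-entropy
    logits -= 0.5 * grad     # learning rate 0.5

p = softmax(logits)          # nearly all probability mass sits on the fact
loss = -np.log(p[fact])      # nearly zero
```

With a generic instruction-following objective the loss plateaus at some average over plausible continuations; memorization instead pushes this per-fact loss essentially to zero.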

To do this, Lamini Memory Tuning tunes a massive mixture of memory experts on any open-source LLM. Each memory expert acts like a LoRA adapter that functionally operates as memory for the model. Together, the memory experts specialize in a million different ways to ensure faithful, factual accuracy to the data they were tuned on. Inspired by information retrieval, these million memory experts are equivalent to indices from which the model intelligently retrieves and routes. At inference time, the model retrieves the most relevant experts at each layer and merges them back into the base model to respond to the user query.

The result is a sparsely activated model, called a Mixture of Memory Experts (MoME), that can scale to an enormous number of parameters at a fixed computational inference cost. This means MoMEs have an extremely high capacity for the number of facts that can be learned, bounded only by the total size of the training data set.
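The retrieve-and-merge step described above can be sketched as top-k selection over an array of LoRA-style low-rank deltas, with only the selected deltas added to the base weight at inference time. All sizes, the dot-product routing score, and the merge rule here are illustrative assumptions, not Lamini's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)
d, rank, n_experts, k = 16, 2, 1000, 4   # tiny stand-ins for millions of experts

W_base = rng.normal(size=(d, d))          # frozen base-layer weight
# Each memory expert is a LoRA-style pair (A, B) contributing a low-rank
# delta B @ A to the layer it is attached to.
A = rng.normal(size=(n_experts, rank, d))
B = rng.normal(size=(n_experts, d, rank))
keys = rng.normal(size=(n_experts, d))    # one routing key per expert

def route_and_merge(h):
    """Pick the k experts whose keys best match the hidden state h,
    then apply only their deltas on top of the frozen base weight."""
    scores = keys @ h                      # similarity of h to every key
    top = np.argsort(scores)[-k:]          # indices of the k best-matching experts
    delta = sum(B[i] @ A[i] for i in top)  # sparse: only k of n_experts are used
    return (W_base + delta) @ h

h = rng.normal(size=d)
out = route_and_merge(h)
```

Because only k deltas are applied per query, inference cost scales with k rather than with the total number of experts, which is what makes a million memory experts affordable at a fixed compute budget.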

The massive MoME is designed to cut down on the amount of computation required to memorize facts. This is accomplished by the following training algorithm:

1. For a given question, select a subset of experts, e.g. 32 out of the array of one million.

2. Freeze the weights of the backbone network and the cross attention used to select the expert.

3. Take gradient descent steps until the loss is reduced sufficiently to memorize the fact.

The computation cost of memorizing each fact now scales with the number of training examples, not with the total number of parameters in the network.
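The three training steps can be sketched in a toy setting: pick a fixed subset of experts (step 1), keep the backbone frozen (step 2), and run gradient descent only on the selected experts until the fact's loss falls below a memorization threshold (step 3). The sizes, the additive expert parameterization, and the threshold are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, n_experts, subset_size = 32, 1_000, 8   # stand-ins for 1M experts / 32 picked

backbone_logits = rng.normal(size=vocab)       # step 2: the backbone stays frozen
experts = np.zeros((n_experts, vocab))         # trainable memory-expert parameters
chosen = rng.choice(n_experts, size=subset_size, replace=False)  # step 1
fact_token = 7                                 # the token the model must memorize

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fact_loss():
    logits = backbone_logits + experts[chosen].sum(axis=0)
    return -np.log(softmax(logits)[fact_token])

# Step 3: gradient descent on the selected experts only, until the fact
# is memorized (loss below threshold).
steps = 0
while fact_loss() > 1e-2 and steps < 10_000:
    logits = backbone_logits + experts[chosen].sum(axis=0)
    grad = softmax(logits)
    grad[fact_token] -= 1.0        # cross-entropy gradient w.r.t. the logits
    experts[chosen] -= 0.1 * grad  # backbone_logits are never updated
    steps += 1
```

Note that the work done per fact depends only on the subset of experts touched and the number of gradient steps, matching the scaling claim above: cost grows with training examples, not with total parameters.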

4. Lamini LLM Photographic Memory Evaluation Suite:

Lamini is introducing a new evaluation benchmark suite that quantifies LLM performance on tasks requiring photographic memory, for dependable and precise model evaluation. The suite includes benchmarks that test a model's precision and recall on specific domain data, such as finance, e-commerce, and medicine. We call this "photographic memory" because the tasks require an exact match. These are usually the kinds of tasks that enterprises work on. The benchmarks can easily be adapted to a specific enterprise use case with private data.
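Exact-match ("photographic memory") scoring is simple to express; the scoring function and the finance-style examples below are illustrative, not the suite's actual harness:

```python
def exact_match_score(predictions, references):
    """Photographic-memory-style scoring: a prediction counts only if it
    matches the reference exactly (after trimming surrounding whitespace)."""
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)

# Hypothetical model outputs vs. gold answers for a finance-style task.
preds = ["Q3 revenue was $4.2B", "ticker: LMNI", "2019-07-01"]
golds = ["Q3 revenue was $4.2B", "ticker: LMN", "2019-07-01"]
score = exact_match_score(preds, golds)  # 2 of 3 answers match exactly
```

Unlike fuzzy or embedding-based similarity metrics, this kind of scoring gives no partial credit, which is exactly the bar enterprise tasks such as schema recall or ID lookup demand.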

The suite also incorporates well-known open-source benchmarks such as MMLU, TruthfulQA, and others to compare the model's performance against the base model. This helps assess whether the knowledge acquired during pre-training is retained after fine-tuning.

There are many standard benchmarks for evaluating LLM outputs. Each serves a different purpose and targets different abilities of LLMs for evaluation.

  • MMLU (Massive Multitask Language Understanding): A knowledge-intensive question-answering benchmark that measures a model's multitask accuracy across 57 domains.
  • TruthfulQA: A benchmark that measures whether a language model is truthful in generating answers to questions; it comprises 817 questions across 38 categories, such as health, law, finance, and politics.
  • WinoGrande: A large-scale benchmark for commonsense reasoning through pronoun resolution, inspired by the 273 expert-crafted problems of the original Winograd Schema Challenge.
  • HellaSwag: A benchmark for commonsense natural language inference.

Beyond these, many more evaluation datasets exist, which Lamini's Evaluation Suite uses for a holistic evaluation.

5. Use Cases:

Following are the recommended use cases for which this technique provides the best results.

A. High precision text-to-SQL:

Text-to-SQL is a natural language processing (NLP) task that turns plain text into SQL queries. The goal is to empower non-technical users to access their business data without having to be SQL or database wizards.

Translating natural language text into syntactically and semantically correct SQL queries is challenging for many reasons, including the inherent ambiguity of natural language, complex database schemas, and advanced SQL operations. LLMs are getting better at code generation, but accuracy is still an issue because LLMs are prone to hallucinating without adequate domain context.

Prompting combined with Retrieval Augmented Generation (RAG) is a common approach to text-to-SQL because it is relatively easy and cost-effective and offers fast feedback loops. While prompting and RAG may be fine for very simple schemas and user questions, they don't work well with the more complex schemas and data environments of most real-world applications.

Lamini Memory Tuning works much better here than RAG and prompting approaches.
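One way to see why exactness matters in text-to-SQL: a single hallucinated column name produces a failing query, so outputs are typically scored by exact (or lightly normalized) match against the gold query. The schema and queries below are invented for illustration:

```python
import re

def normalize_sql(q):
    """Collapse whitespace and lowercase so cosmetic differences
    (casing, line breaks) don't count as errors."""
    return re.sub(r"\s+", " ", q.strip()).lower()

gold = "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"

# A hallucinated column name ('cust_id') fails even though the query is
# otherwise well-formed; only the exact schema vocabulary passes.
rag_output = "SELECT cust_id, SUM(total) FROM orders GROUP BY cust_id"
tuned_output = "select customer_id, sum(total)\nfrom orders group by customer_id"

rag_ok = normalize_sql(rag_output) == normalize_sql(gold)      # False
tuned_ok = normalize_sql(tuned_output) == normalize_sql(gold)  # True
```

Shifting probability mass onto the exact schema tokens, as memory tuning does, is what closes the gap between "plausible SQL" and "correct SQL".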

B. High precision Classification:

Save thousands of hours by automatically labeling data accurately.

C. High precision Recommendations:

Increase cart size and revenue with AI-powered product suggestions.

6. Advantages:

  • Higher Accuracy: Higher accuracy enables full automation as opposed to copiloting.
  • Cost-Effectiveness: Companies can leverage their existing GPU investments, regardless of the provider, to implement Lamini Memory Tuning without requiring a complete hardware overhaul.
  • Lower Latency: Lower latency enables seamless user experiences.
  • Faster Deployment: Smaller models mean faster development and improvement cycles.
  • Future-Proofing: As the GPU landscape evolves, the technique can be easily adapted to new hardware innovations from any provider.
  • Infrastructure Flexibility: Organizations can deploy the technique across heterogeneous computing environments without being locked into a single compute provider’s ecosystem.


References:

1. Introducing Lamini Memory Tuning: 95% LLM Accuracy, 10x Fewer Hallucinations
   Link: https://www.lamini.ai/blog/lamini-memory-tuning

2. Mixture of Memory Experts: Lamini Memory Tuning
   Link: https://medium.com/pythons-gurus/mixture-of-memory-experts-lamini-memory-tuning-9f81f3f2765a

3. [R] What's Memory Tuning and how does it give higher accuracy + speed than RAG and prompting?
   Link: https://www.reddit.com/r/MachineLearning/comments/1dgi1bg/r_whats_memory_tuning_and_how_does_it_give_higher/

4. [AARR] Lamini - Memory Tuning
   Link: https://tryalign.ai/resources/blog/aarr-lamini-memory-tuning

5. Lamini LLM Photographic Memory Evaluation Suite
   Link: https://www.lamini.ai/blog/lamini-llm-photographic-memory-evaluation-suite

