Lamini Memory Tuning: Towards Higher Accuracy and Less Hallucination
Anindita Desarkar, PhD
1. Problem: Low Accuracy and High Hallucination in LLM Responses
Limited accuracy and hallucination are two major barriers to adopting Large Language Models in the user community. Some improvement is achieved through advanced prompt engineering techniques and RAG-based architectures; however, this is not enough for critical applications where hallucination and accuracy are the key metrics.
2. Proposed Solution: Lamini Memory Tuning
Lamini Memory Tuning is a research breakthrough that overcomes a seeming paradox in the AI world: achieving precise factual accuracy (i.e. no hallucinations) while upholding the generalization capabilities that make LLMs valuable in the first place.
Lamini Memory Tuning is a completely new way to fine-tune any existing LLM: it tunes millions of LoRA adapters and selects across them in a wide Mixture of Experts at inference time. Figure 1 presents its advantages compared to other techniques.
3. Methodology:
The method entails tuning millions of expert adapters (e.g. LoRAs) with precise facts on top of any open-source LLM, like Llama 3 or Mistral 3.
Lamini Memory Tuning is a fundamentally different fine-tuning approach that effectively teaches any open-source LLM to be near-perfect on facts, while still maintaining its ability to be pretty good at everything else. When the model is supposed to recall a fact, Lamini Memory Tuning shifts the entire probability mass to that particular fact (i.e. specific tokens within a particular context), such as the exact SQL schema for your database. This results in output probabilities that are not just closer to the right result, but exactly there. Figure 2 illustrates this.
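To make the probability-mass idea concrete, here is a toy sketch in Python (illustrative only; the vocabulary, logits, and token id are made up, and this is not Lamini's code). Driving the cross-entropy loss on a fact's token toward zero is the same as pushing the probability of that exact token toward 1:

```python
# Toy illustration of "shifting the entire probability mass" to a fact.
# All values are hypothetical; requires PyTorch.
import torch
import torch.nn.functional as F

fact_token = 3  # hypothetical id of the fact's correct next token

# Before tuning: probability is spread across several plausible tokens.
before = torch.tensor([1.0, 0.5, 0.2, 1.2, 0.9, 0.1, 0.3, 0.4])
# After memory tuning: one logit dominates, so the mass collapses onto the fact.
after = torch.tensor([1.0, 0.5, 0.2, 9.0, 0.9, 0.1, 0.3, 0.4])

for name, logits in [("before", before), ("after", after)]:
    p = F.softmax(logits, dim=-1)
    loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([fact_token]))
    print(f"{name}: p(fact) = {p[fact_token]:.3f}, loss = {loss:.3f}")
```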
To do this, Lamini Memory Tuning tunes a massive mixture of memory experts on any open-source LLM. Each memory expert acts like a LoRA adapter that functionally operates as memory for the model. Together, the memory experts specialize in a million different ways to ensure faithfulness and factual accuracy to the data they were tuned on. Inspired by information retrieval, these million memory experts are equivalent to indices from which the model intelligently retrieves and routes. At inference time, the model retrieves the most relevant experts at each layer and merges them back into the base model to respond to the user query.
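The sketch below shows what such per-layer retrieval and merging could look like. It is an assumption-laden illustration, not Lamini's implementation: the routing rule (a dot product against learned expert keys), the LoRA shapes, and all sizes are invented for clarity.

```python
# A minimal, hypothetical Mixture of Memory Experts layer:
# score a bank of LoRA-style experts, retrieve the top-k, and
# apply their low-rank deltas on top of the frozen base projection.
import torch
import torch.nn as nn

class MoMELayer(nn.Module):
    def __init__(self, d_model=64, rank=4, n_experts=1000, k=32):
        super().__init__()
        self.base = nn.Linear(d_model, d_model)                    # frozen backbone projection
        self.keys = nn.Parameter(torch.randn(n_experts, d_model))  # routing index over experts
        self.A = nn.Parameter(torch.randn(n_experts, rank, d_model) * 0.01)  # LoRA down-proj
        self.B = nn.Parameter(torch.zeros(n_experts, d_model, rank))         # LoRA up-proj
        self.k = k

    def forward(self, h):                          # h: (batch, d_model)
        scores = h @ self.keys.T                   # score every memory expert
        top = scores.topk(self.k, dim=-1).indices  # retrieve top-k expert ids per input
        outs = []
        for b in range(h.size(0)):
            # Merge the selected experts' low-rank deltas into the base output.
            delta = sum(self.B[e] @ (self.A[e] @ h[b]) for e in top[b])
            outs.append(self.base(h[b]) + delta)
        return torch.stack(outs)

layer = MoMELayer()
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```

Only k of the experts are active per input, which is what keeps inference cost fixed as the expert count grows.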
The result is a sparsely activated model, called a Mixture of Memory Experts (MoME), that can scale to an enormous number of parameters at a fixed computational inference cost. This means MoMEs have extremely high capacity for the number of facts that can be learned, bounded only by the total size of the training data set.
The massive MoME is designed to cut down on the amount of computation required to memorize facts. This is accomplished by the following training algorithm:
1. For a given question, select a subset of experts, e.g. 32 out of the array of one million.
2. Freeze the weights of the backbone network and the cross attention used to select the expert.
3. Take gradient descent steps until the loss is reduced sufficiently to memorize the fact.
The computation cost of memorizing each fact now scales with the number of training examples, not with the total number of parameters in the network.
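Here is a toy sketch of that loop (hypothetical throughout: a frozen random matrix stands in for the backbone, and a handful of zero-initialized deltas stand in for the selected LoRA experts):

```python
# Toy version of the memorization loop; requires PyTorch.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, vocab = 32, 100
backbone = torch.randn(vocab, d)    # step 2: the backbone stays frozen (no grad)

# Step 1: a small subset of experts is selected for this fact
# (e.g. 32 out of one million); here, 4 zero-initialized toy deltas.
experts = [torch.zeros(vocab, d, requires_grad=True) for _ in range(4)]

h = torch.randn(d)                  # hidden state for the question (toy)
fact_token = 7                      # the token the model must recall exactly

opt = torch.optim.SGD(experts, lr=0.5)
loss = float("inf")
while loss > 1e-3:                  # step 3: gradient steps until memorized
    logits = (backbone + sum(experts)) @ h
    ce = F.cross_entropy(logits.unsqueeze(0), torch.tensor([fact_token]))
    opt.zero_grad()
    ce.backward()
    opt.step()
    loss = ce.item()

p = F.softmax((backbone + sum(experts)) @ h, dim=-1)[fact_token].item()
print(f"memorized: p(fact_token) = {p:.4f}")
```

Because only the few selected experts receive gradients, the cost of memorizing each fact does not grow with the million experts that were not selected.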
4. Lamini LLM Photographic Memory Evaluation Suite:
Lamini is introducing a new evaluation benchmark suite that quantifies LLM performance on tasks requiring photographic memory, for dependable and precise model evaluation. The suite includes benchmarks that test a model's precision and recall on specific domain data, such as finance, e-commerce, and medicine. We call this "photographic memory" because the tasks require an exact match. These are usually the kinds of tasks that enterprises work on. The benchmarks can easily be adapted to a specific enterprise use case on private data.
The suite also incorporates well-known open-source benchmarks such as MMLU, TruthfulQA, and others to compare the model's performance against the base model. This helps assess whether the knowledge acquired during pre-training is retained after fine-tuning.
There are many standard benchmarks for evaluating LLM outputs. Each serves a different purpose and targets different abilities of LLMs for evaluation.
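As a rough illustration of an exact-match ("photographic memory") metric, here is a hypothetical harness; the predictor and test case are stand-ins, not part of Lamini's suite:

```python
# Minimal exact-match scorer: any deviation from the gold answer is a miss.
def exact_match_score(predict, cases):
    """predict: fn(question) -> answer string; cases: list of (question, gold)."""
    hits = sum(predict(q).strip() == gold.strip() for q, gold in cases)
    return hits / len(cases)

# Usage with a stand-in predictor (swap in a real model call):
cases = [("Which column stores the order date?", "orders.order_date")]
print(exact_match_score(lambda q: "orders.order_date", cases))  # 1.0
```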
5. Use Cases:
Following are the recommended use cases for which this technique provides the best results.
A. High precision text-to-SQL:
Text-to-SQL is a natural language processing (NLP) task that turns plain text into SQL queries. The goal is to empower non-technical users to access their business data without having to be SQL or database wizards.
Translating natural language text into syntactically and semantically correct SQL queries is challenging for many reasons, including the inherent ambiguity of natural language, complex database schemas, and advanced SQL operations. LLMs are getting better at code generation, but accuracy is still an issue because LLMs are prone to hallucinating without adequate domain context.
Prompting combined with Retrieval Augmented Generation (RAG) is a common approach to text-to-SQL because it is relatively easy, cost-effective, and offers fast feedback loops. While prompting and RAG may be fine for very simple schemas and user questions, they don't work well with the more complex schemas and data environments of most real-world applications.
Lamini Memory Tuning, however, works much better here than RAG and prompting approaches.
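The toy example below (hypothetical schema and queries, nothing Lamini-specific) shows why exact correctness matters in this setting: a near-miss query can still execute without error, yet silently return the wrong answer:

```python
# Two SQL queries that both run, but only one answers the question
# "total order amount since February". Uses Python's built-in sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, amount REAL, created_at TEXT);
INSERT INTO orders VALUES (1, 10.0, '2024-01-05'), (2, 25.0, '2024-02-01');
""")

gold = "SELECT SUM(amount) FROM orders WHERE created_at >= '2024-02-01'"
near_miss = "SELECT SUM(amount) FROM orders"  # plausible, but drops the filter

for sql in (gold, near_miss):
    print(sql, "->", conn.execute(sql).fetchone())
```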
B. High precision Classification:
Save thousands of hours by automatically labeling data accurately.
C. High precision Recommendations:
Increase cart size and revenue with AI-powered product suggestions.
6. Advantages:
References:
1. Introducing Lamini Memory Tuning: 95% LLM Accuracy, 10x Fewer Hallucinations
2. Mixture of Memory Experts: Lamini Memory Tuning
3. [R] What's Memory Tuning and how does it give higher accuracy + speed than RAG and prompting?
4. [AARR] Lamini - Memory Tuning
5. Lamini LLM Photographic Memory Evaluation Suite