From Memorization to Generalization

In this issue:

  1. Traveling to the edge of generalization
  2. The next generation of database interfaces
  3. How to prevent your models from collapsing




1. Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

Watching: Grokking (paper/code)

What problem does it solve? Large Language Models (LLMs) have shown remarkable performance across a wide range of natural language tasks. However, even the most capable models struggle with tasks that require implicit reasoning over parametric knowledge, such as composition and comparison. This limitation hinders their ability to systematically generalize to out-of-distribution examples, which is crucial for robust and reliable performance in real-world applications.

How does it solve the problem? The researchers find that transformers can learn implicit reasoning, but only through a process called "grokking," which involves extended training far beyond the point of overfitting. During grokking, the model forms a generalizing circuit that enables it to reason effectively. The efficiency of this circuit relative to memorizing circuits plays a key role in the model's ability to generalize. Additionally, the configuration of the generalizing circuit is connected to the model's systematicity in reasoning. These findings provide insights into how to better induce implicit reasoning in transformers through data and training setup modifications, as well as potential architectural improvements like encouraging cross-layer knowledge sharing.
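As a rough illustration of the grokking phenomenon itself (not the paper's setup or code), the sketch below trains a tiny model on modular addition with strong weight decay and keeps training long after the training set is memorized. All model choices and hyperparameters here are illustrative assumptions.

```python
# Minimal grokking-style experiment: modular addition with a tiny model.
# Illustrative sketch only; not the paper's architecture, data, or code.
import torch
import torch.nn as nn

P = 97                                            # modulus; task is (a + b) mod P
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
split = int(0.4 * len(pairs))                     # small training fraction encourages grokking
train_x, train_y = pairs[perm[:split]], labels[perm[:split]]
test_x, test_y = pairs[perm[split:]], labels[perm[split:]]

class TinyNet(nn.Module):
    def __init__(self, p, d=128):
        super().__init__()
        self.emb = nn.Embedding(p, d)
        self.mlp = nn.Sequential(nn.Linear(2 * d, 256), nn.ReLU(), nn.Linear(256, p))
    def forward(self, x):
        e = self.emb(x)                           # (batch, 2, d)
        return self.mlp(e.flatten(1))

model = TinyNet(P)
# Strong weight decay is a key ingredient for delayed generalization.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(20000):                         # train far beyond the point of memorization
    opt.zero_grad()
    loss = loss_fn(model(train_x), train_y)
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            train_acc = (model(train_x).argmax(-1) == train_y).float().mean().item()
            test_acc = (model(test_x).argmax(-1) == test_y).float().mean().item()
        # Typical pattern: train_acc hits ~1.0 early, test_acc jumps much later.
        print(f"step {step:6d}  train_acc {train_acc:.2f}  test_acc {test_acc:.2f}")
```

In such setups, training accuracy usually saturates early while test accuracy stays near chance for many more steps before climbing, which is the delayed generalization the paper studies at the circuit level.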

What's next? The study highlights the power of parametric memory for complex reasoning tasks, as demonstrated by the near-perfect accuracy achieved by a fully grokked transformer on a challenging task with a large search space. In contrast, even advanced models like GPT-4-Turbo and Gemini-1.5-Pro, which rely on non-parametric memory, fail badly regardless of prompting styles or retrieval augmentation. This suggests that future research should focus on developing and optimizing parametric memory in transformers to enhance their reasoning capabilities. Furthermore, the insights gained from this study can guide the design of more effective training strategies and architectural modifications to improve the systematic generalization of LLMs in implicit reasoning tasks.


2. Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL

Watching: Text-to-SQL (paper)

What problem does it solve? Translating natural language questions into SQL queries (text-to-SQL) is a challenging task that requires understanding user questions, comprehending database schemas, and generating accurate SQL queries. Conventional approaches, including human engineering and deep neural networks, have been used to tackle this problem. However, as databases become more complex and user questions more challenging, these methods may struggle to generate correct SQL queries consistently.

How does it solve the problem? Large Language Models (LLMs) have shown remarkable capabilities in natural language understanding as their model scale continues to increase. By integrating LLMs into text-to-SQL systems, researchers can leverage their advanced comprehension abilities to better understand user questions and generate more accurate SQL queries. LLMs can capture the nuances and complexities of natural language, allowing them to interpret user intent more effectively and map it to the appropriate SQL syntax and database schema.
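For readers new to the area, here is a minimal sketch of the basic prompting pattern behind LLM-based text-to-SQL: serialize the schema and the user question into a prompt and let the model produce SQL. The schema, the question, and the call_llm placeholder are assumptions for illustration, not an API or benchmark from the survey.

```python
# Illustrative text-to-SQL prompt construction; `call_llm` is a placeholder
# for whatever model API you use (it is not defined here).
def build_text_to_sql_prompt(schema_ddl: str, question: str) -> str:
    """Assemble the database schema and user question into a single prompt."""
    return (
        "You are given the following SQLite schema:\n\n"
        f"{schema_ddl.strip()}\n\n"
        "Write a single SQL query that answers the question below. "
        "Return only the SQL, with no explanation.\n\n"
        f"Question: {question}\nSQL:"
    )

schema = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL, created_at TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
"""
question = "What is the total order value per country in 2023?"

prompt = build_text_to_sql_prompt(schema, question)
# sql = call_llm(prompt)   # placeholder: send the prompt to your LLM of choice
print(prompt)
```

More sophisticated systems in the survey build on this same skeleton, adding schema linking, few-shot examples, and execution-based self-correction.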

What's next? While LLMs offer promising solutions for text-to-SQL tasks, there are still challenges to be addressed. Future research should focus on improving the efficiency and scalability of LLM-based text-to-SQL systems, as well as enhancing their ability to handle more complex database schemas and user questions. Additionally, researchers should explore methods to incorporate domain-specific knowledge and reasoning capabilities into LLMs to further improve their performance on text-to-SQL tasks.


3. Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement

Watching: Model Collapse (paper)

What problem does it solve? Fine-tuning large language models (LLMs) on synthesized data generated by the models themselves has emerged as a promising alternative to using human-annotated data. However, this approach raises concerns about model collapse, where the performance of the fine-tuned models deteriorates compared to models trained on human-annotated data. Model collapse occurs when the synthesized data lacks diversity or contains errors, leading to suboptimal fine-tuning.

How does it solve the problem? The researchers propose using feedback on the synthesized data to prevent model collapse. They derive theoretical conditions under which a Gaussian mixture classification model can achieve optimal performance when trained on feedback-augmented synthesized data. The key idea is that providing feedback on the quality of the generated samples, either by pruning incorrect predictions or selecting the best among multiple guesses, can help maintain the quality of the synthesized data. This feedback mechanism ensures that the fine-tuning process is guided by more accurate and diverse examples, mitigating the risk of model collapse.
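A minimal sketch of that feedback idea, assuming a generator and a scoring function you supply: sample several candidates per prompt, score them, and keep only the best candidate if it clears a threshold. The function names (generate_candidates, verifier_score) and the threshold are illustrative placeholders, not the paper's method or code.

```python
# Illustrative feedback-filtered synthetic data, in the spirit of
# "select the best among multiple guesses"; all callables are user-supplied.
from typing import Callable, List, Tuple

def build_feedback_filtered_dataset(
    prompts: List[str],
    generate_candidates: Callable[[str, int], List[str]],   # generator model (placeholder)
    verifier_score: Callable[[str, str], float],             # feedback signal (placeholder)
    n_samples: int = 8,
    min_score: float = 0.5,
) -> List[Tuple[str, str]]:
    """For each prompt, sample several candidates, score them with the feedback
    signal, and keep only the best candidate if it clears the threshold."""
    dataset = []
    for prompt in prompts:
        candidates = generate_candidates(prompt, n_samples)
        scored = [(verifier_score(prompt, c), c) for c in candidates]
        best_score, best_candidate = max(scored)              # best-of-n selection
        if best_score >= min_score:                           # prune low-quality samples
            dataset.append((prompt, best_candidate))
    return dataset

# The resulting (prompt, output) pairs would then be used for fine-tuning,
# instead of the raw, unfiltered model generations.
```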

What's next? The theoretical findings and practical demonstrations in this research underscore the effectiveness of popular approaches like Reinforcement Learning from Human Feedback (RLHF) in preventing model collapse. As LLMs continue to grow in size and capabilities, the demand for large-scale training data will also increase. Synthesizing data using generative models and augmenting it with feedback mechanisms offers a scalable solution to this challenge. Further research can explore more sophisticated feedback techniques, such as active learning or collaborative filtering, to further enhance the quality of synthesized data and improve the performance of fine-tuned LLMs.


Papers of the Week:

Ferenc József Rab

Freelance at Moody's Corporation

3 months ago

Very cool, awesome!

Rémy Fannader

Author of 'Enterprise Architecture Fundamentals', Founder & Owner of Caminao

3 months ago

There are four levels of reasoning, depending on the way terms are dealt with:

- Nominal: as signs (tokens) independently of any attachment to actual environments
- Formal: as symbols of categories independently of any attachment to actual environments
- Empiric: as symbols of categories defined in bounded semantic contexts attached to actual specific environments
- Epistemic: as symbols of categories defined in unbounded contexts

AI can fully support the formal level, partially support the empiric level, but will always fail with the epistemic level. https://caminao.blog/overview/knowledge-kaleidoscope/reasoning/

Luis Molina

Technical Lead AI - Engineer AI

3 months ago

Do we have any open-source models that have been grokked?
