The Dawn of Foundation Agents

In this issue:

  1. From foundation models to foundation agents
  2. Symbolic expressions meet CoT
  3. When fine-tuning on new knowledge goes wrong


1. Foundation Agents as the Paradigm Shift for Decision Making

Watching: Foundation Agents (paper)

What problem does it solve? Foundation models like large language models (LLMs) have demonstrated remarkable adaptability to various tasks with minimal fine-tuning. However, decision-making agents often struggle with sample efficiency and generalization due to the complex interplay between perception, memory, and reasoning required to determine optimal policies. The authors propose the development of foundation agents as a paradigm shift in agent learning, drawing inspiration from the success of LLMs.

How does it solve the problem? The proposed foundation agents are characterized by their ability to rapidly adapt to new tasks, similar to LLMs. The roadmap for creating foundation agents involves collecting or generating large interactive datasets, employing self-supervised pretraining and adaptation techniques, and aligning the agents' knowledge and values with those of LLMs. By leveraging the strengths of foundation models, agents can potentially overcome the challenges of sample efficiency and generalization.
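The paper is a roadmap rather than a single algorithm, but the pretraining step it sketches (self-supervised learning on large interactive datasets) can be made concrete. Below is a minimal, illustrative PyTorch sketch of one common instantiation: a causal transformer trained to predict the next action from the trajectory so far. All names, shapes, and hyperparameters are assumptions for illustration, not details from the paper.

```python
# Minimal sketch of self-supervised pretraining on interaction data:
# a causal transformer predicts the next action from the trajectory so far.
# All dimensions and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class TrajectoryModel(nn.Module):
    def __init__(self, obs_dim=32, n_actions=8, d_model=64, n_layers=2):
        super().__init__()
        self.obs_proj = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, obs_seq):
        # obs_seq: (batch, time, obs_dim)
        t = obs_seq.size(1)
        # Causal mask: position i may only attend to positions <= i.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        h = self.backbone(self.obs_proj(obs_seq), mask=mask)
        return self.action_head(h)  # logits: (batch, time, n_actions)

model = TrajectoryModel()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for a large interactive dataset: random trajectories.
obs = torch.randn(16, 10, 32)            # (batch, time, obs_dim)
actions = torch.randint(0, 8, (16, 10))  # action taken at each timestep

logits = model(obs)
loss = loss_fn(logits.reshape(-1, 8), actions.reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
print(f"pretraining loss: {loss.item():.3f}")
```

Swapping the random tensors for real (observation, action) trajectories and scaling up the backbone is, in essence, the dataset-plus-pretraining recipe the authors describe; their adaptation and alignment stages would build on such a pretrained model.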

What's next? The authors outline critical research questions and trends for foundation agents, addressing both technical and theoretical aspects. They emphasize the need for real-world use cases to drive the development and evaluation of these agents. As the field progresses, foundation agents may revolutionize decision-making processes by enabling more comprehensive and impactful solutions, ultimately leading to agents that can effectively navigate complex environments and adapt to novel situations.


2. Faithful Logical Reasoning via Symbolic Chain-of-Thought

Watching: SymbCoT (paper/code)

What problem does it solve? Chain-of-Thought (CoT) prompting has been a popular approach for enhancing the reasoning capabilities of Large Language Models (LLMs). However, CoT still struggles with tasks that heavily rely on symbolic expressions and strict logical deduction rules. While LLMs excel at understanding and generating natural language, they often lack the ability to perform rigorous logical reasoning in a symbolic manner.

How does it solve the problem? SymbCoT addresses this limitation by integrating symbolic expressions and logic rules into the CoT prompting framework. It consists of three key steps: 1) Translating the natural language context into a symbolic format, 2) Deriving a step-by-step plan to solve the problem using symbolic logical rules, and 3) Employing a verifier to check the correctness of the translation and reasoning chain. By incorporating symbolic representations and explicit logical reasoning steps, SymbCoT enables LLMs to handle tasks that require strict adherence to logical deduction rules.
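Since the contribution is a prompting pipeline, the control flow is easy to sketch. Below is a minimal Python sketch of the three stages, with `llm` as a placeholder for any chat-completion client; the actual prompts and the verifier's repair logic in the paper are considerably more elaborate than shown here.

```python
# Sketch of SymbCoT's three stages: translate, derive, verify.
# `llm` is a placeholder for any chat-completion call; the prompts used
# in the paper are far more detailed than these one-liners.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def symb_cot(context: str, question: str, retries: int = 2) -> str:
    # Step 1: translate the natural-language premises into symbolic form
    # (the paper uses formal notations such as first-order logic).
    symbols = llm(
        "Translate these premises and this question into first-order "
        f"logic.\nPremises: {context}\nQuestion: {question}"
    )
    # Step 2: derive the answer step by step, naming an explicit
    # deduction rule (e.g. modus ponens) at each step.
    derivation = llm(
        "Plan and execute a step-by-step derivation from these symbolic "
        f"premises, citing a logical rule at each step:\n{symbols}"
    )
    # Step 3: verify both the translation and the reasoning chain;
    # this sketch simply retries on failure.
    verdict = llm(
        "Check the translation against the original text and the validity "
        f"of each step.\nOriginal: {context}\nSymbolic: {symbols}\n"
        f"Derivation: {derivation}\nReply 'valid' or describe the error."
    )
    if "valid" not in verdict.lower() and retries > 0:
        return symb_cot(context, question, retries - 1)
    return derivation
```

The key design choice is that the LLM itself plays all three roles, so no external theorem prover is required; the symbolic notation simply constrains its reasoning to explicit, checkable steps.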

What's next? Future research could explore expanding the range of symbolic expressions and logic systems that can be integrated with CoT prompting. Additionally, investigating the scalability and generalizability of SymbCoT to more complex and diverse reasoning tasks would be valuable. As LLMs continue to advance, the combination of symbolic reasoning and natural language understanding could lead to more faithful, flexible, and explainable AI systems capable of tackling a wider range of logical reasoning challenges.


3. Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?

Watching: Fine-Tuning on New Knowledge (paper)

What problem does it solve? Large Language Models (LLMs) are known to be inconsistent in their outputs and sometimes "hallucinate" facts that are not grounded in reality. This is a major issue for applications that require factual accuracy, such as question answering systems or chatbots providing information to users. The problem is often attributed to the fine-tuning process, where the model is exposed to new information that may contradict its pre-existing knowledge acquired during pre-training.

How does it solve the problem? The researchers designed a controlled experiment to study how introducing new factual knowledge during fine-tuning affects the model's ability to use its pre-existing knowledge. They focused on closed-book question answering and varied the proportion of fine-tuning examples that introduce new knowledge. The results show that LLMs struggle to acquire new factual knowledge through fine-tuning: examples with new knowledge are learned significantly more slowly than those consistent with the model's pre-existing knowledge. However, as the model eventually does learn the new-knowledge examples, its tendency to hallucinate incorrect facts increases linearly.
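To make the setup concrete, here is a simplified sketch of the experimental recipe: label each QA pair as Known or Unknown by sampling the pre-trained model, then build fine-tuning mixes with a chosen fraction of Unknown (new-knowledge) examples. The `generate` function and the dataset format are placeholder assumptions, and the binary Known/Unknown split simplifies the paper's finer-grained categorization.

```python
# Simplified sketch of the paper's setup: label each QA pair Known/Unknown
# by sampling the *pre-trained* model, then fine-tune on mixes with a
# varying fraction of Unknown (new-knowledge) examples.
# `generate` is a placeholder; the dataset format is assumed.
import random

def generate(question: str, temperature: float = 0.7) -> str:
    raise NotImplementedError("sample an answer from the base model here")

def is_known(question: str, answer: str, n_samples: int = 10) -> bool:
    # "Known" if the base model already produces the correct answer in at
    # least one sampled attempt (a simplification of the paper's
    # finer-grained degrees of knownness).
    return any(
        answer.lower() in generate(question).lower() for _ in range(n_samples)
    )

def build_mix(dataset, unknown_fraction: float, size: int):
    labeled = [(ex, is_known(ex["q"], ex["a"])) for ex in dataset]
    known = [ex for ex, k in labeled if k]
    unknown = [ex for ex, k in labeled if not k]
    n_unknown = int(size * unknown_fraction)
    return random.sample(unknown, n_unknown) + random.sample(known, size - n_unknown)

# Fine-tune on, e.g., build_mix(data, 0.25, 1000) vs. build_mix(data, 0.75, 1000),
# tracking how slowly the Unknown examples are fit and how the hallucination
# rate on held-out questions rises once they are.
```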

What's next? The findings highlight the risk of introducing new factual knowledge through fine-tuning and suggest that LLMs primarily acquire factual knowledge during pre-training, while fine-tuning teaches them to use it more efficiently. This has important implications for the development of LLMs and their applications. Researchers and practitioners should be cautious when introducing new facts during fine-tuning and consider alternative approaches, such as using retrieval-based methods or explicitly separating the acquisition of new knowledge from the fine-tuning process. Future work could explore techniques to mitigate the risk of hallucination and improve the model's ability to incorporate new knowledge without compromising its pre-existing knowledge.

