This AI Makes Big Tech Panic

In this issue:

  1. Re-defining what’s possible in AI
  2. DeepMind going even deeper
  3. Self-training agents are coming


1. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Watching: DeepSeek-R1 (paper/reference for this week’s headline)

What problem does it solve? Large Language Models (LLMs) tend to fall short in their reasoning abilities, especially in STEM-related questions and complex problem-solving scenarios. Traditional approaches heavily rely on supervised fine-tuning (SFT), which might limit the model's ability to develop novel reasoning strategies.

How does it solve the problem? DeepSeek introduces two models: R1-Zero, trained purely through reinforcement learning without prior SFT, and R1, which uses multi-stage training with a small amount of cold-start data before RL. This approach leads to emergent reasoning behaviors and particularly strong performance in STEM subjects, mathematical reasoning, and coding tasks. R1 achieves performance comparable to OpenAI's latest models while addressing R1-Zero's limitations, such as poor readability and language mixing.
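
For intuition, here is a minimal Python sketch of the kind of rule-rewarded, group-relative RL step (GRPO-style) that DeepSeek-R1's training is reported to build on. The helper names (sample_fn, rule_based_reward) and the toy reward are illustrative assumptions, not the authors' code; the point is that completions are scored with simple rule-based rewards and advantages are computed relative to the sampled group rather than via a learned value model.

```python
# Hypothetical sketch of a GRPO-style step: sample a group of completions per
# prompt, score them with rule-based rewards, and normalize rewards within the
# group to get advantages. Not DeepSeek's implementation.
from typing import Callable, List, Tuple
import statistics


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Toy reward: 1.0 for containing the reference answer, plus a small
    format bonus for wrapping reasoning in <think> ... </think> tags."""
    correct = 1.0 if reference_answer in completion else 0.0
    formatted = 0.1 if "<think>" in completion and "</think>" in completion else 0.0
    return correct + formatted


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Normalize rewards within one sampled group (zero mean, unit std)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]


def grpo_step(prompt: str,
              reference_answer: str,
              sample_fn: Callable[[str, int], List[str]],
              group_size: int = 8) -> List[Tuple[str, float]]:
    """Return (completion, advantage) pairs; a policy-gradient update would
    then reweight each completion's token log-probs by its advantage."""
    completions = sample_fn(prompt, group_size)
    rewards = [rule_based_reward(c, reference_answer) for c in completions]
    return list(zip(completions, group_relative_advantages(rewards)))
```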

What are the key findings? R1 achieved performance comparable to OpenAI-o1-1217 on reasoning tasks and showed superior performance on various benchmarks including MMLU, MMLU-Pro, and GPQA Diamond, particularly in STEM-related questions. The model also demonstrated strong capabilities in document analysis, fact-based queries, and math tasks, while maintaining concise outputs without length bias.

Why is this important? These findings are a huge deal because they demonstrate that reinforcement learning alone can produce models with strong reasoning capabilities, challenging the conventional wisdom that supervised fine-tuning is necessary. This opens new possibilities for training LLMs and suggests that focused RL training can lead to better performance across diverse domains while potentially reducing the need for extensive supervised datasets.

Bonus: I published an introductory overview of their paper earlier this week.


2. Evolving Deeper LLM Thinking

Watching: Mind Evolution (paper)

What problem does it solve? Finding the optimal solution in a Large Language Model's (LLM's) output space often requires significant computational resources and time. Traditional approaches like Best-of-N sampling or Sequential Revision may not always yield the best results efficiently. Mind Evolution addresses this challenge with an evolutionary search strategy that scales inference-time compute in LLMs more effectively.

How does it solve the problem? Mind Evolution leverages the generative capabilities of LLMs to create, combine, and refine candidate responses iteratively. The approach starts by generating an initial population of candidate solutions using the LLM. These candidates are then evaluated using a solution evaluator, which assesses their quality without the need to formalize the underlying inference problem. The best-performing candidates are selected and undergo recombination and refinement processes, where the LLM generates new candidates by combining and improving upon the selected ones. This evolutionary cycle continues until a satisfactory solution is found or a computational budget is exhausted.
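
As a rough illustration, here is a minimal Python sketch of this generate-evaluate-recombine cycle. The propose, refine, and evaluate callables stand in for LLM calls and a programmatic solution checker; they are assumptions for the sake of the example, not the paper's implementation.

```python
# Hypothetical sketch of an evolutionary search loop in the spirit of Mind
# Evolution: an LLM proposes candidate solutions, an evaluator scores them,
# and the best candidates are recombined/refined until a solution passes or
# the compute budget is exhausted.
import random
from typing import Callable, List, Tuple


def evolve(propose: Callable[[], str],
           refine: Callable[[List[str]], str],
           evaluate: Callable[[str], float],
           population_size: int = 10,
           generations: int = 5,
           pass_score: float = 1.0) -> Tuple[str, float]:
    # Initial population of candidate solutions sampled from the LLM.
    population = [propose() for _ in range(population_size)]

    for _ in range(generations):
        scored = sorted(((evaluate(c), c) for c in population), reverse=True)
        best_score, best = scored[0]
        if best_score >= pass_score:  # satisfactory solution found early
            return best, best_score

        # Keep the top half and let the LLM recombine/refine them into
        # new candidates for the next generation.
        parents = [c for _, c in scored[: population_size // 2]]
        children = [refine(random.sample(parents, k=min(2, len(parents))))
                    for _ in range(population_size - len(parents))]
        population = parents + children

    scored = sorted(((evaluate(c), c) for c in population), reverse=True)
    return scored[0][1], scored[0][0]
```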

What are the key findings? Their approach significantly outperforms other inference strategies such as Best-of-N and Sequential Revision in natural language planning tasks, controlling for inference cost. In the TravelPlanner and Natural Plan benchmarks, Mind Evolution solves more than 98% of the problem instances using Gemini 1.5 Pro without the use of a formal solver.

Why is this important? This demonstrates that a more efficient and flexible approach to complex problem-solving with LLMs is possible. By eliminating the need for formal problem specifications while maintaining high performance, Mind Evolution could make LLMs more practical and effective for real-world planning and problem-solving applications.


3. Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

Watching: Agent-R (paper/code)

What problem does it solve? LLM agents often struggle with error recovery in interactive environments. While behavior cloning from expert demonstrations can help improve performance, it doesn't adequately prepare agents for handling mistakes in real-world scenarios. Creating step-by-step critique data manually would be extremely costly and time-consuming, so there's a need for automated self-improvement mechanisms.

How does it solve the problem? Agent-R introduces an innovative self-training framework that uses Monte Carlo Tree Search (MCTS) to generate training data focused on error recovery. Instead of waiting until the end of a task to evaluate success, the system identifies errors as they occur and splices in correct alternative paths from the same decision point. This creates a dynamic learning environment where the agent learns to recognize and correct mistakes based on its current capabilities, leading to more efficient learning and better error recovery.
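
The splicing step can be pictured with a small Python sketch. The Step dataclass, the error flag, and the index-aligned cut are simplifying assumptions for illustration, not Agent-R's released code; the idea is just that a failed trajectory is truncated at the first detected mistake, a reflection is inserted, and the better path found by search continues from there.

```python
# Hypothetical sketch of revision-trajectory construction: take the failed
# trajectory up to (and including) its first flagged error, add a reflection,
# then continue with the better trajectory found by MCTS on the same task.
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    action: str
    observation: str
    is_error: bool = False  # flagged during search/critique, not ground truth


def splice_revision(bad: List[Step], good: List[Step]) -> List[Step]:
    """Build a training trajectory that demonstrates in-context error recovery."""
    # Index of the first erroneous step in the failed trajectory.
    cut = next((i for i, s in enumerate(bad) if s.is_error), len(bad))

    reflection = Step(
        action="reflect: the previous action was a mistake, revising the plan",
        observation="",
    )
    # Simplification: assume the two trajectories share the same decision
    # point at index `cut`, so the good trajectory's suffix can be grafted on.
    return bad[: cut + 1] + [reflection] + good[cut:]
```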

What are the key findings? The results show that Agent-R successfully improves agents' ability to recover from errors and enables timely error correction. In experiments across three interactive environments, Agent-R outperformed baseline methods by 5.59%, demonstrating effective error correction while avoiding loops.

Why is this important? They present a solution to one of the major challenges in LLM agents: the ability to autonomously recover from errors without requiring expensive human-annotated critique data. This advancement is crucial for deploying LLM agents in real-world applications where error recovery is essential for reliable performance.


Papers of the Week:


If you enjoyed this article, give it a like and share it with your peers.



