Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Credit: https://arxiv.org/pdf/2501.18585

Today's paper examines a critical issue in o1-like Large Language Models (LLMs) called "underthinking": the tendency to frequently switch between different reasoning approaches without fully exploring promising ones. The paper identifies this behavior as a significant limitation in these models' problem-solving capabilities, particularly when tackling complex mathematical problems.

Method Overview

The paper introduces a systematic approach to analyze and address the underthinking issue in o1-like LLMs. First, it establishes a framework for identifying underthinking by examining how models switch between distinct reasoning thoughts during problem-solving. To detect whether a response contains correct but abandoned reasoning, the paper uses LLMs as judges to assess whether each individual thought, if pursued further, could lead to a correct answer (the exact evaluation prompt is provided in the paper).

This analysis reveals that models often generate more thoughts and use more tokens when producing incorrect answers compared to correct ones.
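
To make this assessment step concrete, below is a minimal sketch of how one might split a response into thoughts and query an LLM judge. The switch marker, the prompt wording, and the `judge` callable are illustrative assumptions, not the paper's actual implementation.

```python
import re

# Assumed switch marker; the paper's framework identifies thought
# transitions more carefully, this is only for illustration.
SWITCH_MARKER = re.compile(r"\balternatively\b", re.IGNORECASE)

def split_into_thoughts(response: str) -> list[str]:
    """Split a chain-of-thought response at thought-switch markers."""
    return [p.strip() for p in SWITCH_MARKER.split(response) if p.strip()]

def assess_thoughts(question: str, response: str, judge) -> list[bool]:
    """Ask an LLM judge whether each thought, continued on its own,
    could reach a correct answer. `judge` is a hypothetical callable
    that sends a prompt to an LLM and returns its text reply."""
    verdicts = []
    for thought in split_into_thoughts(response):
        prompt = (
            f"Problem: {question}\n"
            f"Partial reasoning: {thought}\n"
            "If this line of reasoning were continued, could it lead to "
            "the correct answer? Answer yes or no."
        )
        verdicts.append(judge(prompt).strip().lower().startswith("yes"))
    return verdicts
```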

To quantify underthinking, the paper develops a metric that measures token efficiency in incorrect responses: it captures how many of the generated tokens are produced after the model has already reached a thought that could lead to a correct answer. A lower score indicates better token utilization, while a higher score means a large share of the output was generated after a promising thought had been found and abandoned, i.e., inefficient reasoning due to frequent thought switching.
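
As a rough illustration, the sketch below computes such a score under the assumption that it is defined, per incorrect response, as one minus the fraction of tokens spent up to and including the first thought judged correct, averaged over responses; the input formats are hypothetical.

```python
# Sketch of a token-efficiency ("underthinking") score. Assumed definition:
# for each incorrect response, 1 - (tokens up to and including the first
# thought judged correct) / (total tokens), averaged over responses.
# `thought_token_counts[i]` holds per-thought token counts for response i;
# `thought_is_correct[i]` holds the judge verdicts from the previous step.

def underthinking_score(thought_token_counts, thought_is_correct):
    scores = []
    for counts, verdicts in zip(thought_token_counts, thought_is_correct):
        total = sum(counts)
        spent = 0
        for n, ok in zip(counts, verdicts):
            spent += n
            if ok:
                break  # reached the first thought judged correct
        # If no thought was judged correct, spent == total and the score is 0.
        scores.append(1 - spent / total)
    # Higher = more tokens wasted after a correct thought was found.
    return sum(scores) / len(scores)
```

For instance, a 600-token response whose first correct thought ends at token 420 scores 1 - 420/600 = 0.3, meaning 30% of the output was spent switching away from a thought that could already have solved the problem.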

To address the underthinking issue, the paper proposes a decoding strategy with a Thought Switching Penalty (TIP). This approach modifies the model's decoding process by penalizing the logits of tokens associated with thought transitions (for example, markers such as "alternatively"), encouraging the model to explore each reasoning path more thoroughly before moving on. Both the strength and the duration of the penalty can be adjusted to optimize performance.
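
Below is a hedged sketch of what such a penalty could look like at decode time, written as a Hugging Face `LogitsProcessor`. The default values for the penalty strength `alpha` and duration `duration`, the single-sequence bookkeeping, and the reset logic are assumptions for illustration, not the authors' released code.

```python
from transformers import LogitsProcessor

class ThoughtSwitchPenalty(LogitsProcessor):
    """Subtract `alpha` from the logits of thought-switch tokens for the
    first `duration` decoding steps after each detected switch (and at the
    start of generation). Assumes batch size 1; values are illustrative."""

    def __init__(self, switch_token_ids, alpha=3.0, duration=600):
        self.switch_ids = list(switch_token_ids)  # ids of switch markers
        self.switch_set = set(self.switch_ids)
        self.alpha = alpha          # penalty strength
        self.duration = duration    # steps the penalty stays active
        self.steps_since_switch = 0

    def __call__(self, input_ids, scores):
        # Restart the penalty window whenever the last generated token
        # was a switch marker, i.e., a new thought just began.
        if input_ids[0, -1].item() in self.switch_set:
            self.steps_since_switch = 0
        else:
            self.steps_since_switch += 1
        # Within the window, push down the switch tokens' logits.
        if self.steps_since_switch < self.duration:
            scores[:, self.switch_ids] -= self.alpha
        return scores
```

In practice the processor would be passed to `model.generate` inside a `LogitsProcessorList`, with `switch_token_ids` obtained by tokenizing markers such as "alternatively"; note that it only biases decoding away from premature transitions rather than forbidding them.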

Results

The implementation of the TIP approach led to consistent improvements across multiple challenging datasets:

  • Improved accuracy on MATH500-Hard, GPQA Diamond, and AIME2024 test sets
  • Reduced underthinking scores, indicating more efficient reasoning processes
  • Achieved better performance without requiring model fine-tuning
  • Demonstrated that controlling thought switching can lead to more effective problem-solving

Conclusion

The paper identifies and addresses the underthinking phenomenon in o1-like LLMs through a novel decoding strategy. By encouraging models to explore reasoning paths more thoroughly before switching, the approach improves both efficiency and accuracy in complex problem-solving tasks. For more information, please consult the full paper.

Congrats to the authors for their work!

Wang, Yue, et al. "Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs." arXiv preprint arXiv:2501.18585 (2025).
