Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Credit: https://arxiv.org/pdf/2501.18585

Today's paper examines a critical issue in o1-like Large Language Models (LLMs) called "underthinking": the tendency to frequently switch between different reasoning approaches without fully exploring promising ones. The paper identifies this behavior as a significant limitation in these models' problem-solving capabilities, particularly when tackling complex mathematical problems.

Method Overview

The paper introduces a systematic approach to analyze and address the underthinking issue in o1-like LLMs. First, it establishes a framework for identifying underthinking by examining how models switch between distinct reasoning thoughts during problem-solving. To detect whether a response contains correct but abandoned reasoning, the paper uses LLMs as judges to assess whether each individual thought, if pursued further, could lead to a correct answer (the exact evaluation prompt is provided in the paper).

This analysis reveals that models often generate more thoughts and use more tokens when producing incorrect answers compared to correct ones.
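
To make this assessment step concrete, below is a minimal sketch of how one might split a response into thoughts and query an LLM judge. The switch marker, the prompt wording, and the `judge` callable are illustrative assumptions, not the paper's actual implementation.

```python
import re

# Assumed switch marker; the paper's framework identifies thought
# transitions more carefully, this is only for illustration.
SWITCH_MARKER = re.compile(r"\balternatively\b", re.IGNORECASE)

def split_into_thoughts(response: str) -> list[str]:
    """Split a chain-of-thought response at thought-switch markers."""
    return [p.strip() for p in SWITCH_MARKER.split(response) if p.strip()]

def assess_thoughts(question: str, response: str, judge) -> list[bool]:
    """Ask an LLM judge whether each thought, continued on its own,
    could reach a correct answer. `judge` is a hypothetical callable
    that sends a prompt to an LLM and returns its text reply."""
    verdicts = []
    for thought in split_into_thoughts(response):
        prompt = (
            f"Problem: {question}\n"
            f"Partial reasoning: {thought}\n"
            "If this line of reasoning were continued, could it lead to "
            "the correct answer? Answer yes or no."
        )
        verdicts.append(judge(prompt).strip().lower().startswith("yes"))
    return verdicts
```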

To quantify underthinking, the paper develops a metric that measures token efficiency in incorrect responses: it captures how many of the generated tokens are produced after the model has already reached a thought that could lead to a correct answer. A lower score indicates better token utilization, while a higher score means a large share of the output was generated after a promising thought had been found and abandoned, i.e., inefficient reasoning due to frequent thought switching.
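
As a rough illustration, the sketch below computes such a score under the assumption that it is defined, per incorrect response, as one minus the fraction of tokens spent up to and including the first thought judged correct, averaged over responses; the input formats are hypothetical.

```python
# Sketch of a token-efficiency ("underthinking") score. Assumed definition:
# for each incorrect response, 1 - (tokens up to and including the first
# thought judged correct) / (total tokens), averaged over responses.
# `thought_token_counts[i]` holds per-thought token counts for response i;
# `thought_is_correct[i]` holds the judge verdicts from the previous step.

def underthinking_score(thought_token_counts, thought_is_correct):
    scores = []
    for counts, verdicts in zip(thought_token_counts, thought_is_correct):
        total = sum(counts)
        spent = 0
        for n, ok in zip(counts, verdicts):
            spent += n
            if ok:
                break  # reached the first thought judged correct
        # If no thought was judged correct, spent == total and the score is 0.
        scores.append(1 - spent / total)
    # Higher = more tokens wasted after a correct thought was found.
    return sum(scores) / len(scores)
```

For instance, a 600-token response whose first correct thought ends at token 420 scores 1 - 420/600 = 0.3, meaning 30% of the output was spent switching away from a thought that could already have solved the problem.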

To address the underthinking issue, the paper proposes a decoding strategy with a Thought Switching Penalty (TIP). This approach modifies the model's decoding process by penalizing the logits of tokens associated with thought transitions (for example, markers such as "alternatively"), encouraging the model to explore each reasoning path more thoroughly before moving on. Both the strength and the duration of the penalty can be adjusted to optimize performance.
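
Below is a hedged sketch of what such a penalty could look like at decode time, written as a Hugging Face `LogitsProcessor`. The default values for the penalty strength `alpha` and duration `duration`, the single-sequence bookkeeping, and the reset logic are assumptions for illustration, not the authors' released code.

```python
from transformers import LogitsProcessor

class ThoughtSwitchPenalty(LogitsProcessor):
    """Subtract `alpha` from the logits of thought-switch tokens for the
    first `duration` decoding steps after each detected switch (and at the
    start of generation). Assumes batch size 1; values are illustrative."""

    def __init__(self, switch_token_ids, alpha=3.0, duration=600):
        self.switch_ids = list(switch_token_ids)  # ids of switch markers
        self.switch_set = set(self.switch_ids)
        self.alpha = alpha          # penalty strength
        self.duration = duration    # steps the penalty stays active
        self.steps_since_switch = 0

    def __call__(self, input_ids, scores):
        # Restart the penalty window whenever the last generated token
        # was a switch marker, i.e., a new thought just began.
        if input_ids[0, -1].item() in self.switch_set:
            self.steps_since_switch = 0
        else:
            self.steps_since_switch += 1
        # Within the window, push down the switch tokens' logits.
        if self.steps_since_switch < self.duration:
            scores[:, self.switch_ids] -= self.alpha
        return scores
```

In practice the processor would be passed to `model.generate` inside a `LogitsProcessorList`, with `switch_token_ids` obtained by tokenizing markers such as "alternatively"; note that it only biases decoding away from premature transitions rather than forbidding them.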

Results

The implementation of the TIP approach led to consistent improvements across multiple challenging datasets:

  • Improved accuracy on MATH500-Hard, GPQA Diamond, and AIME2024 test sets
  • Reduced underthinking scores, indicating more efficient reasoning processes
  • Achieved better performance without requiring model fine-tuning
  • Demonstrated that controlling thought switching can lead to more effective problem-solving

Conclusion

The paper identifies and addresses the underthinking phenomenon in o1-like LLMs through a novel decoding strategy. By encouraging models to explore reasoning paths more thoroughly before switching, the approach improves both efficiency and accuracy in complex problem-solving tasks. For more information, please consult the full paper.

Congrats to the authors for their work!

Wang, Yue, et al. "Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs." arXiv preprint arXiv:2501.18585 (2025).
