Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Today's paper presents a comprehensive survey on efficient reasoning for Large Language Models (LLMs). It addresses the "overthinking phenomenon," in which reasoning models generate unnecessarily verbose outputs, leading to computational inefficiency. The paper systematically categorizes existing approaches to efficient reasoning and explores methods to optimize reasoning length while maintaining accuracy.
Overview
The paper organizes efficient reasoning approaches into three main categories: model-based, reasoning output-based, and input prompts-based methods.
Model-based efficient reasoning focuses on fine-tuning LLMs to improve their intrinsic ability to reason concisely. This category includes two main approaches. First, Reinforcement Learning (RL) with length reward design, where models are trained using rewards that favor shorter, correct answers while penalizing lengthy or incorrect ones. Various length reward formulations are explored, such as cosine rewards, length-harmonizing rewards, and exceed length penalties. Second, Supervised Fine-Tuning (SFT) with variable-length Chain-of-Thought (CoT) data, which involves constructing datasets with varying reasoning lengths and fine-tuning models on these datasets. Methods for collecting short CoT data include post-reasoning compression (reducing redundant steps after full-length reasoning) and obtaining compressed data during reasoning.
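To make the length-reward idea concrete, here is a minimal sketch of a cosine-shaped length reward with an exceed-length penalty. This is an illustrative formulation in the spirit of the rewards the survey describes, not the exact function from any cited paper; the function name, the value ranges, and the `max_len` budget are all assumptions.

```python
import math

def length_shaped_reward(correct: bool, length: int, max_len: int = 512) -> float:
    """Illustrative length-aware reward (not the exact formulation from
    the survey): correct short answers score highest, rewards decay
    smoothly (cosine) as length approaches the budget, and generations
    exceeding the budget receive a flat exceed-length penalty."""
    if length > max_len:
        return -1.0  # exceed-length penalty
    t = length / max_len                       # 0.0 (short) -> 1.0 (at budget)
    shape = 0.5 * (1 + math.cos(math.pi * t))  # 1.0 -> 0.0, smooth
    if correct:
        return 0.5 + 0.5 * shape    # in [0.5, 1.0]: correct, shorter is better
    return -0.5 * (1 - shape)       # in [-0.5, 0.0]: wrong, longer is worse
```

Any correct answer still outscores any incorrect one, so the policy is never pushed to trade accuracy for brevity; length only breaks ties within each outcome.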
Reasoning output-based efficient reasoning modifies the output paradigm to enhance reasoning efficiency. One approach compresses reasoning steps into fewer latent representations, treating the final-layer hidden states of an LLM as "continuous thought" to replace traditional discrete tokens. This can be achieved by training LLMs to leverage latent representations or using auxiliary models. Another approach implements dynamic reasoning paradigms during inference, using criteria such as rewards, confidence/certainty, or consistency to guide the reasoning strategy. For example, Speculative Rejection generates multiple responses until memory limits are reached, then discards low-quality outputs based on evaluation by a reward model.
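The generate-then-discard loop behind Speculative Rejection can be sketched as best-of-N sampling with periodic pruning. The sketch below is a simplified illustration under assumptions: the hypothetical `generate` and `score` callables stand in for an LLM sampler and a reward model, and the real method triggers rejection at memory limits rather than on a fixed round count.

```python
def speculative_rejection(prompt, generate, score, batch_size=8,
                          keep_frac=0.5, rounds=3):
    """Illustrative sketch of best-of-N with early rejection: generate
    candidate continuations, rank them with a reward model (`score`),
    discard the lowest-scoring fraction each round, and extend only the
    survivors, so compute is concentrated on promising responses."""
    candidates = [generate(prompt) for _ in range(batch_size)]
    for _ in range(rounds):
        ranked = sorted(candidates, key=score, reverse=True)
        candidates = ranked[:max(1, int(len(ranked) * keep_frac))]
        candidates = [generate(c) for c in candidates]  # extend survivors
    return max(candidates, key=score)
```

With toy stand-ins, e.g. `generate = lambda s: s + "a"` and `score = len`, the loop keeps halving the pool while extending the highest-scoring candidates.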
Input prompts-based efficient reasoning focuses on enforcing length constraints or routing between LLMs based on input prompt characteristics. Prompt-guided efficient reasoning explicitly instructs LLMs to generate fewer reasoning steps through prompts like "Let's think step by step and use less than X tokens" or "Be concise." Prompt attribute-driven reasoning routing dynamically determines how language models handle queries based on their complexity, routing simpler queries to faster but less reasoning-capable LLMs while directing more complicated queries to stronger reasoning LLMs.
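Both ideas combine naturally, as in the hypothetical router below: a cheap complexity check picks the model, and the prompt itself carries the token budget. The keyword heuristic, the 30-word threshold, and the callable-model interface are all illustrative assumptions, not mechanisms from the paper.

```python
def route_query(query: str, small_model, large_model,
                budget_tokens: int = 256) -> str:
    """Hypothetical prompt-attribute router: a crude complexity
    heuristic decides between a fast model and a stronger reasoning
    model, and the prompt asks for an answer within a token budget."""
    # Toy heuristic: long queries or math/proof cues count as "hard".
    hard = len(query.split()) > 30 or any(
        cue in query.lower() for cue in ("prove", "integral", "algorithm"))
    model = large_model if hard else small_model
    prompt = (f"Let's think step by step and use less than "
              f"{budget_tokens} tokens.\n{query}")
    return model(prompt)
```

A production router would replace the keyword check with a learned classifier over query embeddings, but the control flow is the same.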
Key findings
The survey reveals that efficient reasoning approaches can significantly reduce computational costs while maintaining reasoning accuracy. For example, RL-based methods with length rewards can mitigate overthinking in reasoning-capable LLMs, achieving nearly lossless alignment with original reasoning capabilities while reducing token usage. SFT with variable-length CoT data enables LLMs to learn compact reasoning chains that encapsulate effective knowledge. Latent reasoning approaches improve both accuracy and efficiency by reducing the number of intermediate "thinking" tokens.
The paper also highlights that smaller language models can retain strong reasoning capabilities through appropriate distillation and compression techniques. Quantization preserves reasoning performance remarkably well, whereas pruning tends to cause severe degradation in reasoning quality. This suggests that compressing a capable model is more effective than training a small language model from scratch.
In practical applications, efficient reasoning LLMs offer significant benefits across various domains, including healthcare diagnostics, autonomous driving, embodied AI systems, and financial algorithmic trading, by enabling quicker and more resource-efficient decision-making.
Conclusion
This comprehensive survey provides the first structured overview of efficient reasoning in LLMs, categorizing existing approaches and discussing their strengths and limitations. By addressing the overthinking phenomenon, efficient reasoning methods offer practical benefits such as reduced computational costs and improved responsiveness for real-world applications. For more information, please consult the full paper.
Congrats to the authors for their work!
Sui, Yang, et al. "Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models." arXiv preprint arXiv:2503.16419 (2025).