Demystifying Reasoning Models
Aymen LABIDI
Architect & Engineering Manager @Inetum | AI Engineer | Startup Founder (Hiring!)
Introduction
An LLM reasoning model is a specialized architecture that enables large language models (LLMs) like OpenAI's o1 or DeepSeek's R1 to perform structured, multi-step reasoning.
Unlike traditional LLMs that generate responses in a single pass, thinking models break down complex tasks by generating thought tokens across iterative steps—planning, acting, observing, and refining—to mimic systematic human problem-solving.
Reasoning models are pre-trained on text containing human-written thoughts, so these patterns of reasoning are encoded into the model.
Reasoning models introduce explicit reasoning loops, tool integration, and stateful memory to improve accuracy and adaptability. These frameworks bridge the gap between raw generative potential and goal-oriented execution, enabling LLMs to tackle tasks like coding, advanced math, or strategic planning with human-like deliberation.
How are reasoning models trained?
One of the techniques used is Thought Preference Optimization.
The process starts by prompting the LLM to generate thoughts before its response. After sampling several candidate outputs, only the response parts are fed to a judge (which can be another model, or a tool such as code execution), which determines the best and worst ones. The corresponding full outputs—thoughts plus response—are then used as chosen/rejected pairs for DPO optimization. Multiple iterations of this training are performed.
This technique may cause the evaluated model to converge towards the judge model, which can lead to a lack of diversity in the model and potentially result in overfitting to the judge model’s biases.
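As a rough sketch of that loop (the `model` sampler and the `judge` scorer below are stand-in assumptions, not a real API), one TPO round could look like:

```python
def tpo_iteration(model, judge, prompts, n_samples=4):
    """One round of Thought Preference Optimization (sketch).

    For each prompt: sample several thought+response outputs, let the
    judge score only the response part, and keep the best/worst full
    outputs as a chosen/rejected pair for DPO training.
    """
    dpo_pairs = []
    for prompt in prompts:
        # Each candidate is a dict like {"thought": ..., "response": ...}.
        candidates = [model(prompt) for _ in range(n_samples)]
        # The judge never sees the hidden thoughts, only the responses.
        scored = sorted(candidates, key=lambda c: judge(prompt, c["response"]))
        rejected, chosen = scored[0], scored[-1]
        dpo_pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return dpo_pairs
```

The key design point is that preference is judged on the response alone, while the full output (including the thoughts that produced it) is what gets reinforced or penalized.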
How do they actually work (Reasoning Loops)?
Reasoning models operate in loops rather than linear sequences.
When the user runs a prompt, a prepended thought prompt instructs the model to generate reasoning (or “thought”) tokens. These thoughts represent intermediate steps or considerations made by the model while solving a problem or approaching a task. The LLM processes these thoughts to generate a response.
The response is then evaluated against the task’s goal or expected outcome. If the response doesn’t align with the task goal, the reasoning model iterates. During each iteration, the model generates new thoughts (or refines the previous ones) and re-runs the reasoning process. This continues until the generated response satisfies the task’s goal or requirement.
Here’s an example of how this iterative reasoning process unfolds:
Plan: Break a goal into subtasks (e.g., “Solve this equation step-by-step”).
Act: Use tools (code execution, web search) or internal reasoning to address each subtask.
Observe: Evaluate results for errors or inconsistencies.
Refine: Adjust the approach and repeat until the goal is met.
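The four steps above can be sketched as a generic loop, assuming hypothetical `plan`/`act`/`observe`/`refine` callbacks that stand in for model or tool calls:

```python
def reasoning_loop(task, plan, act, observe, refine, max_iters=5):
    """Sketch of the plan-act-observe-refine cycle (callbacks are assumptions).

    `plan` splits the task into subtasks, `act` attempts a result,
    `observe` returns a list of issues found, and `refine` updates
    the subtasks in response to those issues.
    """
    subtasks = plan(task)
    result = None
    for _ in range(max_iters):
        result = act(subtasks)
        issues = observe(result)
        if not issues:            # goal satisfied: stop iterating
            return result
        subtasks = refine(subtasks, issues)
    return result                 # best effort after max_iters
```

In a real system the `max_iters` cap (or a token budget) is what keeps the loop from running indefinitely on tasks the model cannot satisfy.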
OpenAI’s o1 model, for instance, reportedly explores multiple reasoning paths in parallel in the spirit of a Tree of Thought (ToT) search, pruning incorrect branches and retaining viable solutions.
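Assuming such a search behaves like a beam search over candidate thoughts, a minimal sketch (with hypothetical `expand` and `score` callbacks) might look like:

```python
def tree_of_thought(root, expand, score, beam_width=2, depth=3):
    """Breadth-first Tree-of-Thought-style search (sketch).

    `expand(state)` proposes candidate next thoughts; `score(state)`
    rates how promising a partial reasoning path is. At each level
    only the `beam_width` best branches survive; the rest are pruned.
    """
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in expand(state)]
        if not candidates:
            break
        # Prune: retain only the most promising branches.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)
```

The pruning step is what distinguishes this from exhaustively chaining thoughts: bad branches are cut early instead of being followed to completion.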
Demo
Standard model: When asked to solve “2x + 4 = 12,” it may directly output “x = 4” without showing steps.
Reasoning Model: Breaks the problem down step by step:
<think>
To solve the equation 2x+4=12, I'll start by isolating the term with the variable.
First, I'll subtract 4 from both sides of the equation to eliminate the constant term on the left side.
This gives me 2x=8.
Next, I'll divide both sides by 2 to solve for x.
Finally, I find that x=4.
</think>
And when we ask it to solve a slightly more complex equation (6x + 4 = 12), it generates many more thinking tokens.
The tokens (words) inside the <think>...</think> block are the internal thoughts of the model and are not considered part of the response.
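A small helper (an illustrative sketch, assuming the single <think> block convention DeepSeek R1 uses) can separate those hidden thoughts from the user-facing answer:

```python
import re

# Non-greedy match across newlines, so each block is captured separately.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

def split_thoughts(raw_output):
    """Separate hidden reasoning from the final answer.

    Returns (thoughts, answer): the stripped contents of each
    <think>...</think> block, and the output with those blocks removed.
    """
    thoughts = [m[len("<think>"):-len("</think>")].strip()
                for m in THINK_BLOCK.findall(raw_output)]
    answer = THINK_BLOCK.sub("", raw_output).strip()
    return thoughts, answer
```

This is essentially what chat UIs like OpenWebUI do when they render the reasoning in a collapsible panel separate from the answer.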
Here’s a demo running DeepSeek R1 locally with Ollama and OpenWebUI.
Tool Integration
These reasoning models enhance raw text generation by incorporating external tools, enabling more sophisticated and context-aware outputs.
Code execution: Build and execute code within specified runtimes.
Web search: Crawl data on behalf of the user, incorporating relevant information into the context to generate more precise and grounded responses.
Memory: Retain context across interactions, such as tracking variables or states in multi-step problems.
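Putting these together, a minimal tool-dispatch loop might look like this (the `llm_step` protocol and the tool names are illustrative assumptions, not a real API):

```python
def run_with_tools(llm_step, tools, query, max_steps=5):
    """Minimal tool-use loop (sketch).

    `llm_step(history)` returns either {"tool": name, "args": ...}
    when the model wants a tool, or {"answer": text} when it is done.
    `tools` maps a tool name (e.g. "code", "search") to a callable.
    The growing `history` list is the stateful memory between steps.
    """
    history = [{"role": "user", "content": query}]
    for _ in range(max_steps):
        step = llm_step(history)
        if "answer" in step:
            return step["answer"]
        # Execute the requested tool and feed the observation back in.
        result = tools[step["tool"]](step["args"])
        history.append({"role": "tool", "name": step["tool"], "content": result})
    return None  # gave up after max_steps
```

Each tool result is appended to the conversation state, which is what lets a later reasoning step ground its answer in the tool's output.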
Difference between a thinking model and traditional LLMs
A standard LLM generates content in a single forward pass, whereas a reasoning model iterates until it is satisfied with the result, requiring state management between iterations. This process reduces hallucinations by letting the model self-correct its response.
In this process, tools can be involved while generating the response, with the drawback of adding more delay to the request.
Conclusion
LLM reasoning models like OpenAI’s o1 and DeepSeek’s R1 represent a paradigm shift—from generating text to engineering thought. However, challenges such as latency, unpredictability, and tool dependency arise.
Latency is a concern as these models are slower due to iterative loops, while unpredictable outcomes may occur in complex tasks, leading to inconsistent reasoning paths.
Performance also heavily depends on the quality of integration, such as code execution environments. These models trade speed for precision, making them ideal for technical domains like coding, data analysis, or research.
Their effectiveness relies on carefully designed prompts, robust tooling, and controlled environments.
As they evolve, LLMs will blur the line between human and machine reasoning, but mastery will require understanding their “cognitive” architecture—how they process information and make decisions.