What are LLM reasoning models and why you should care?
To achieve complex goals, you need reasoning and tools - humans and AI alike

I think LLMs with reasoning capabilities get far too little attention compared to some of the other technical advances in the general AI hype. This could change as DeepSeek gets some limelight, but at the moment even those discussions focus on other aspects.

We will dive into the details, but let me start with a personal anecdote that illustrates the fairly substantial difference between LLMs with and without reasoning.

Shortly after o1 was released (OpenAI's first LLM with reasoning), I had dinner with some friends. AI was on the list of conversation topics, as always these days: the usual rants about hallucinations, being bad at math, not being able to play chess, and so on. On a separate thread, one of the guests brought up an old riddle. Allegedly, the former president George W. Bush visited all 50 states but one. As a coincidence(?), the name of this state does not contain any of the letters in his name. It is harder than you think to figure out, but it's true: only one state fits the description of not containing 'G', 'E', 'O', … and so on.

Almost instantly one of the guests said, “I bet ChatGPT can’t solve this riddle!” I said, “I think it depends on which model you use.” Naturally, we needed to find out. ChatGPT with model 4 quickly answered “OHIO,” with the certainty only a four-year-old can deliver. Obviously wrong; even a five-year-old would see that. Then we tried the (then fairly new) o1 model in ChatGPT. It thought for a long while, shared its reasoning, and came up with the correct answer.
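If you want to verify the letter claim yourself, a few lines of Python are enough. This is a minimal sketch; it assumes, as the riddle does, the short form “George Bush” of the name.

```python
# Which US state name shares no letters with "George Bush"?
STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine",
    "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi",
    "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey",
    "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio",
    "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina",
    "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia",
    "Washington", "West Virginia", "Wisconsin", "Wyoming",
]

name_letters = set("georgebush")  # the letters of the (short) name, lower-cased

# Keep only states whose letters do not overlap with the name at all
matches = [s for s in STATES if not set(s.lower().replace(" ", "")) & name_letters]
print(matches)  # exactly one state remains
```

Running it leaves a single state in the list, the same answer the reasoning model eventually arrived at.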

In this article I will share some basics on reasoning models and why this is a very important step, not just for solving dinner riddles.

PS: As I write this, I just got access to the o3 model in ChatGPT and it solved the riddle in just a second. Things are progressing fast.

Types of reasoning

LLM reasoning models have big potential for all sorts of tasks, but in this context the most important trait is that they are a cornerstone of agentic workflows and architectures, which I covered briefly in another article. It's time to understand what reasoning in LLMs really means, what it entails, and its limitations.

Let's assume AI and large language models (LLMs) will also have an impact on the media and broadcasting industries. This article breaks down the different types of reasoning LLMs use, how they compare to human thinking, and how they fit into automated media workflows.

Types of LLM Reasoning Models

LLMs can handle different types of reasoning to varying degrees. They can do deductive, inductive, abductive, and analogical reasoning, but they aren't perfect and still struggle with complex or new problems.

LLMs use several types of reasoning to process information and make decisions. It’s important to understand that regular LLMs and LLMs with reasoning capabilities all use these types of reasoning, but with some very important differences. Regular LLMs primarily mimic patterns and do not break down and check their conclusions; it’s an “I’m feeling lucky” attitude that works well in some cases, but not at all in others. LLMs with reasoning capabilities, on the other hand, break down the question, run a chain-of-thought, and often check the answer again before giving the actual response.

In the text below, I will let 4o represent an LLM without reasoning, with o1 as an example of an LLM with reasoning capabilities.
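To make the difference concrete, here is a minimal sketch using the OpenAI Python SDK. The model identifiers below are the ones available as I write this and may change; treat the snippet as an illustration, not a recommendation.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

riddle = (
    "George W. Bush visited all 50 states but one, and that state's name "
    "contains none of the letters in his name. Which state is it?"
)

# Pattern-matching model: answers in one shot, no visible intermediate steps
fast = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": riddle}],
)

# Reasoning model: internally breaks the problem into steps and checks them
# before answering (the hidden chain of thought is not returned by the API)
deliberate = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": riddle}],
)

print("4o:", fast.choices[0].message.content)
print("o1:", deliberate.choices[0].message.content)
```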

These are the most important kinds of reasoning:

1. Deductive Reasoning

Deductive reasoning involves applying general rules to specific cases to reach a logical conclusion. LLMs mimic this process by leveraging patterns learned from vast datasets to generate consistent responses. For example, in media applications, an LLM might apply established broadcasting guidelines to determine if content meets industry standards.

However, LLMs struggle with new or unclear rules that aren't in their training data. Humans are good at filling in the gaps, but we also make mistakes and have trouble with complex, multi-step logic.

Comparing models with and without reasoning, 4o is not really doing deduction; in most cases it is filling in the most probable answer and/or pattern matching.

o1, on the other hand, is designed to perform chain-of-thought processing, meaning it can break the problem into explicit steps. It “reasons” by explicitly applying the general rule to the specific case, potentially even showing its steps so a human can follow, and perhaps learn from, the chain of thought. Reasoning models are also designed to break down statements and questions into chunks that are easier to verify.
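To illustrate what “explicit steps” can mean, here is a small hand-rolled sketch of deduction written out as code: general rules applied to a specific content item, one check at a time. The rules, thresholds, and field names are hypothetical, loosely inspired by loudness and subtitle requirements.

```python
# Apply general rules (hypothetical ones) to a specific case, step by step
RULES = [
    ("loudness", lambda item: item["loudness_lufs"] <= -23.0,
     "Programme loudness must not exceed -23 LUFS"),
    ("subtitles", lambda item: item["has_subtitles"],
     "Subtitles are required for broadcast"),
]

def check_compliance(item: dict) -> tuple[str, list[str]]:
    # Evaluate each rule against the item and collect the ones that fail
    failures = [message for _, rule, message in RULES if not rule(item)]
    return ("pass" if not failures else "fail", failures)

print(check_compliance({"loudness_lufs": -20.5, "has_subtitles": True}))
# -> ('fail', ['Programme loudness must not exceed -23 LUFS'])
```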

2. Inductive Reasoning

Inductive reasoning helps LLMs identify patterns from examples and predict trends. This could, for example, be useful for analyzing viewer behavior and improving programming schedules. It is also close to a whole family of related machine-learning methods.

A known issue with LLMs is overfitting (https://en.wikipedia.org/wiki/Overfitting): relying too heavily on historical data and failing to adjust to new trends. Humans, on the other hand, can naturally weigh different factors and adapt.

4o (without reasoning) and o1 (with reasoning) differ in that the former merely mimics patterns, whereas the latter weighs different data points and statements, since it breaks down and looks back at the reasoning steps that led to the conclusion.

In a media supply chain, 4o would not be able to track the cause of a workflow step that fails repeatedly, whereas an o1 model could potentially check the incoming data or metadata and reason about errors in the input set separately from the result itself.
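As a toy illustration (the asset records and field names below are invented), the first thing such a reasoning step might do is look at the inputs of the failed runs rather than only at the failed result:

```python
from collections import Counter

# Hypothetical failure records from a repeatedly failing workflow step
failures = [
    {"asset_id": "A1", "codec": "prores", "vendor": "acme", "error": "timeout"},
    {"asset_id": "A2", "codec": "prores", "vendor": "beta", "error": "timeout"},
    {"asset_id": "A3", "codec": "prores", "vendor": "acme", "error": "timeout"},
    {"asset_id": "A4", "codec": "h264", "vendor": "acme", "error": "bad_metadata"},
]

# Tally which attribute values co-occur with failures; a reasoning step could then
# form and test a hypothesis such as "ProRes assets time out in this step"
for field in ("codec", "vendor", "error"):
    print(field, Counter(record[field] for record in failures).most_common())
```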

3. Abductive Reasoning

Abductive reasoning means finding the most likely explanation based on incomplete data. LLMs could use it to troubleshoot broadcast issues by analyzing logs and suggesting possible causes for failures.

However, since the models do not possess true understanding or intuition, the hypotheses they generate can sometimes be off-target or overly simplistic compared to the insights of an experienced human expert.

Reasoning models are more likely to get closer to what we humans call ‘intuition’ since, again, they break down the steps, reason about each one, and can suggest a hypothesis. There are still limitations in the training data, expertise, and more, but the gap might be nearly closed with fine-tuning and training.
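Here is a very small sketch of what abductive reasoning looks like when written out as explicit steps: rank candidate causes by how many of the observed symptoms each one would explain. The causes and symptoms are made up.

```python
# Observed symptoms from a broadcast incident (invented for illustration)
observed = {"black_frames", "bitrate_drop"}

# Candidate explanations and the symptoms each would account for
candidate_causes = {
    "encoder overload":   {"black_frames", "bitrate_drop", "dropped_frames"},
    "SDI cable fault":    {"black_frames", "audio_loss"},
    "network congestion": {"bitrate_drop", "packet_loss"},
}

# Rank hypotheses by explanatory power (most observed symptoms covered first)
ranked = sorted(candidate_causes.items(),
                key=lambda kv: len(kv[1] & observed),
                reverse=True)
for cause, symptoms in ranked:
    print(f"{cause}: explains {len(symptoms & observed)} of {len(observed)} observed symptoms")
```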

4. Analogical Reasoning

Analogical reasoning helps LLMs find similarities between different situations and datasets.

For example, an LLM that understands sports preferences can suggest similar content for music broadcasts. However, they may miss subtle user preferences that humans would pick up on.
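One simplified way to picture this kind of similarity matching is to describe each programme with a handful of attributes and compare the vectors. A real system would more likely use learned embeddings, and the attributes below are invented.

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 means the two attribute profiles point the same way
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Attributes (made up): [live, competitive, commentary-driven, music-heavy]
football_final = [1.0, 1.0, 1.0, 0.0]
esports_final = [1.0, 1.0, 1.0, 0.2]
studio_concert = [1.0, 0.0, 0.1, 1.0]

print("football vs esports:", round(cosine(football_final, esports_final), 2))  # high
print("football vs concert:", round(cosine(football_final, studio_concert), 2))  # lower
```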

Here the difference between 4o and o1 is more subtle and not a major differentiator.

Conclusions comparing LLMs with and without reasoning

Models with reasoning

Chain-of-thought and multi-step reasoning: Models like o1 and o3 are designed to do multi-step reasoning. They break down complex problems into a sequence of logical steps, which is particularly useful when dealing with tasks that require inference, deduction, or a structured analysis.

Task decomposition: These models have mechanisms that allow them to decompose a problem into subproblems before coming up with an answer. This means more solid explanations and better performance on tasks that require understanding context or handling abstract concepts.

Adaptability and robustness: Because they can simulate human-like reasoning processes, these models tend to be more adaptable when encountering unknown scenarios. The result is a more transparent "thinking process" that users can sometimes follow (via chain-of-thought outputs), making it easier to diagnose errors or bias.

If you haven’t yet, go try it out and see how the models reason, especially the new DeepSeek model that is, ironically, more transparent with its reasoning than the “western” models.

Models without reasoning

Pattern matching: Models such as 4o primarily match patterns in the data they were trained on. They excel at generating fluent text and retrieving information but do not perform multi-step reasoning. This can lead to answers that look fine but lack the depth required for step-by-step problem-solving, which is essential when creating agentic architectures.

Limited explanation: Models without reasoning can’t really justify or explain how they came to a conclusion; it’s a black box. Their responses may appear to “jump” to conclusions without the logical steps that lead there.

Application-specific strengths: While lacking deep reasoning, models like 4o are still highly effective in applications where the task primarily involves recalling or rephrasing information from their training data, such as understanding incoming data (image, text, audio), summarization or translation. In many, many cases this is more than enough. Also, if you are building AI-fused workflows rather than agentic architectures, it’s easier, more efficient, and less costly to use models without reasoning.

Bottom-line: Agentic architectures need reasoning …

… but far from all tasks require agents.

Decomposing tasks, understanding nuances, and being able to reason in multiple steps about your own output are absolutely crucial if you, as an AI agent, are asked to interpret incomplete, high-level tasks and then delegate the subtasks to other systems and more specialized agents. AI agents are also usually trained to ask humans when tasks are incomplete or otherwise require human attention. If the agents are additionally expected to make autonomous decisions, they need to understand steps, subtasks, and context at a much deeper level.

An LLM that is largely doing pattern matching will simply not suffice. Such models can have roles in smaller subtasks (translation, text generation, sending ‘I’m sorry’ emails, and the like), but a reasoning model is needed to understand the bigger scope and context.

For example, in media ops, an agentic workflow might receive a request to develop a multi-platform content strategy. The LLM reasoning model can break this down into smaller tasks such as audience analysis, content format recommendations, scheduling, and performance monitoring. There will be human intervention and also a final decision from a human, but I think we can expect an agentic, reasoning AI to do a lot of the groundwork by itself.
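As a hedged sketch of that pattern (the prompt, handler names, and the call_reasoning_model function are assumptions for illustration, not any specific product's API), the planner-plus-delegation loop could look roughly like this:

```python
import json

def build_planner_prompt(request: str) -> str:
    # Ask the reasoning model for a machine-readable plan
    return (
        "Break the following request into 3-6 concrete subtasks. Respond with a JSON "
        'list of objects with keys "task" and "owner", where owner is one of: '
        "audience_analysis, formats, scheduling, monitoring.\n"
        f"Request: {request}"
    )

# Specialized handlers; in practice these could be simpler models, tools, or services
HANDLERS = {
    "audience_analysis": lambda task: f"[stub] analysing audience for: {task}",
    "formats":           lambda task: f"[stub] recommending formats for: {task}",
    "scheduling":        lambda task: f"[stub] drafting schedule for: {task}",
    "monitoring":        lambda task: f"[stub] defining KPIs for: {task}",
}

def run(request: str, call_reasoning_model) -> list[str]:
    # The reasoning model plans; cheaper components execute the subtasks
    plan = json.loads(call_reasoning_model(build_planner_prompt(request)))
    # A human still reviews the plan and makes the final decision
    return [HANDLERS[step["owner"]](step["task"]) for step in plan]
```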

The most common reasoning models available (January 2025)

OpenAI's o1 model was a big step forward in AI reasoning. It uses "chain-of-thought" processing to break down problems into smaller steps before giving an answer. This helps it perform better in areas like math, coding, and science.

More recently, OpenAI's o3 model has been making headlines for its enhanced reasoning capabilities and efficiency. The o3 model builds on the foundation of o1, but with improved processing speed, better contextual understanding, and a more robust ability to handle complex multi-step problems. The o3 model is setting a new standard for AI-assisted workflows and agentic architectures. (I’ve just recently started using it and first impression is that it is much, much faster yet with equal quality compared to o1.)

A key feature of o3 is its performance on the ARC benchmark (Abstraction and Reasoning Corpus, https://arcprize.org/), which evaluates an AI's ability to solve abstract reasoning problems that require pattern recognition and generalization. The o3 model has shown remarkable improvements on these tests, demonstrating capabilities for complex problem-solving. Look it up: as a human, there are tests you can take that highlight in a very concrete way what we are good at and what AI struggles with. Strongly recommended.

DeepSeek models have recently gained attention as well. These distilled models are smaller, more efficient versions of the powerful DeepSeek-R1 AI system. They specialize in tasks like solving math problems, writing code, and answering complex questions while being lightweight enough to run on regular computers and smartphones. This distillation process, which transfers knowledge from larger models to smaller ones, makes AI more accessible and cost-effective. However, compared to OpenAI's o1 and o3, DeepSeek has faced serious integrity concerns. This highlights the importance of understanding what you, as a technical decision-maker, bring inside your gates. You should be curious but not naive.

The more interesting aspect, I think, is that the DeepSeek models offer even more transparent reasoning by showing their step-by-step thinking process in greater detail, making them easier to trust and debug compared to other models. Their open-source nature is also a potential game-changer. Yes, there are other open-source LLMs, but they are not on par with DeepSeek yet.

OpenAI has shown that breaking problems into reasoning steps improves model performance across various tasks. LLMs can switch between rapid response generation and more deliberate, step-by-step reasoning, improving decision-making accuracy for complex tasks. Additionally, effective reasoning in media supply chain workflows can be enhanced through prompting techniques, encouraging the model to generate responses with greater accuracy and logical flow. You need to make sure you design your systems so that you use the right model for the right task.
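A deliberately crude sketch of what such routing can look like in practice follows; real systems might use a classifier or an LLM call instead of keywords, and the model names are examples only.

```python
# Route short, well-defined requests to a fast pattern-matching model and
# multi-step problems to a reasoning model
REASONING_KEYWORDS = ("plan", "diagnose", "why", "strategy", "root cause", "step by step")

def pick_model(task: str) -> str:
    needs_reasoning = any(keyword in task.lower() for keyword in REASONING_KEYWORDS)
    return "o1" if needs_reasoning else "gpt-4o"

print(pick_model("Translate this synopsis to German"))              # -> gpt-4o
print(pick_model("Diagnose why the transcode step keeps failing"))  # -> o1
```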

Conclusion

LLM reasoning models can help media and broadcast companies work smarter and faster. While they aren’t perfect, their ability to automate processes, predict trends, and enhance decision-making makes them valuable tools.

With that said, it's important to be cautious about what this means in the short term. While LLMs offer exciting possibilities, we need to understand their limitations and strengths. It's crucial to get your head around the fundamentals of how LLM reasoning works because AI will increasingly become integrated into daily operations. Organizations should pay attention to what humans do well, such as contextual understanding, creativity, and ethical judgment, and what AI does well, like processing large datasets and providing fast insights. Understanding these differences will be key to leveraging AI.
