IBM teases a future of artificial reasoning
This is a shortened version of The Deep View, a daily newsletter that breaks down artificial intelligence in depth. Subscribe here to get the full version in your inbox every day!
The latest craze in artificial intelligence has to do with ‘reasoning’ models: large language models (LLMs) that have been tweaked to ‘think’ longer before answering queries. The basis of this, on display across OpenAI’s o-series and DeepSeek’s R1, is Chain-of-Thought (CoT) reasoning, an older approach that, combined with the latest LLMs, has had quite a powerful effect.
First highlighted in a 2022 paper from Google researchers, CoT began as a prompting technique and has more recently evolved into an approach that is built into the models themselves.
“Basically, somebody figured out that if you said, ‘tell a model (to) think step by step,’ it actually produces better results,” Dr. David Cox, VP of AI models at IBM Research, told me.
“The model will actually take its time. It'll verbalize a few steps, and you'll get a better result in the end. And that's a very versatile thing to do. But if you just do that, then it has its limits,” he said. “It helps. But it's not life-changing.”
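The prompting trick Cox describes can be sketched in a few lines. The phrase “think step by step” comes from the technique discussed above; the surrounding wording and the helper name `with_cot` are illustrative assumptions, not anything from IBM or the 2022 paper.

```python
def with_cot(question: str) -> str:
    """Wrap a question in a chain-of-thought instruction.

    "Think step by step" is the prompting trick described above;
    the rest of the framing is an illustrative assumption.
    """
    return (
        "Answer the question below. Think step by step and show "
        "your reasoning before giving the final answer.\n\n"
        f"Question: {question}"
    )

# The same query, with and without the CoT framing:
plain = "What is 17% of 240?"
cot = with_cot(plain)
```

Sent to a capable model, the wrapped version tends to elicit the verbalized intermediate steps Cox describes, at the cost of a longer (and pricier) response.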
And while the industry had been trending in the ‘reasoning’ direction for months, there was a definite shift in the wake of DeepSeek’s release of R1, a seemingly cheaper model that achieved parity with OpenAI’s models through reinforcement learning and CoT reasoning.
“Everyone had a really, really strong reaction to R1 coming out, which frankly confused us in the research field a little bit,” Cox said, explaining that DeepSeek, at least to those in the industry, didn’t exactly come out of nowhere. “We were already excited. We were already all working on it.”
And rather than wait, IBM decided to “just get something out there to show what we’ve been doing in the space.”
Granite can reason. Earlier this month, IBM published a preview release of a reasoning-enabled version of its Granite 3.1 8B model, part of IBM’s family of smaller language models designed to be paired with enterprise-specific datasets.
Where DeepSeek leveraged model distillation to achieve its results, IBM applied reinforcement learning directly to its Granite model to induce CoT reasoning, an approach it says ensures “that critical characteristics like the original model’s safety and general performance are preserved.”
As a result, IBM reported double-digit gains in benchmark performance across a wide range of specific tasks, notably without sacrificing general performance.
The researchers noted no difference in safety performance between the reasoning-enabled and original models.
It’s a significant moment in the ongoing debate between large and small language models, in which smaller models offer greater efficiency but, generally, less robust performance.
“I think that's going to be a continuing trend that we can actually take these smaller models, which are very versatile, very fast, very efficient, and then virtually make them bigger on demand,” Cox said. “The idea that you could take a small model and have it do more things by having it spread out in time, that's something that I think is going to take hold across the board.”
And unlike the trend we’re currently seeing of systems, like ChatGPT, that can switch between reasoning and non-reasoning models as needed, IBM designed this model so that users can essentially turn CoT on or off without changing models. Since CoT reasoning is both slower and more expensive than a direct answer, it isn’t always necessary (or desirable); IBM’s focus, accordingly, was on flexibility.
“We're building out this set of controllable, developer-friendly ways to add flags that just tell the model what we need it to do,” Cox said.
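A flag-based toggle of the kind Cox describes might look something like the sketch below at the application layer. The request shape, the model id string, and the parameter name `thinking` are all placeholder assumptions for illustration; the article does not document IBM's actual API.

```python
def build_request(prompt: str, thinking: bool = False) -> dict:
    """Build a chat request with CoT toggled per call.

    The request shape and the "thinking" parameter name are
    placeholders for illustration, not IBM's documented API.
    """
    return {
        "model": "granite-3.1-8b-preview",  # illustrative model id
        "messages": [{"role": "user", "content": prompt}],
        # One model, two modes: a flag, not a model swap, decides
        # whether the response includes step-by-step reasoning.
        "thinking": thinking,
    }

fast = build_request("Summarize this ticket.")                      # direct answer
careful = build_request("Plan the data migration.", thinking=True)  # CoT on
```

The design point is the one Cox makes: both requests hit the same model, so the developer pays the extra latency and token cost of CoT only on the calls that warrant it.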
This work, according to Cox, is just the start of a long-term trend.
“We have a lot more going on in the reasoning space, all kinds of different kinds of reasoning work going on that you'll see in the coming months,” he said.
“I don't think in the long run, we're going to be in a world where we have just one giant model that tries to do everything,” Cox added. “We're going to have this cool set of small models that can extend and think … that's the world that we think we're heading toward. Put the developer in control, give them a toolset that can … accomplish different tasks and automate things and use this technology in ways that still keeps the developer and humans very much in control.”