Can AI models reason?

Welcome to The Short, IBM Research's recap of the latest innovations in AI, quantum computing, semiconductors, and the hybrid cloud.

Week of March 3-7


In this week's edition:

  • The debate on whether AI models can reason
  • The latest version of Granite Guardian
  • Testing out AI for IT automation


How close are we to real AI reasoning?

Reasoning has become a hot topic in artificial intelligence, as so-called "reasoning models" like DeepSeek-R1 raise important questions about what it means for AI to reason the way the human brain does.

To discuss this emerging topic, IBM Fellow and AI Ethics Global Leader Francesca Rossi sat down with IBM Distinguished Research Scientist Murray Campbell to talk about how AI reasoning has become so prominent, and the role that it may play in coming years.

It's the season of reasoning: Right now, a central debate in the field involves whether current AI can reason, what it means to reason, and why that matters, said Rossi. Similar questions have been around for decades, but they weren't as relevant as they are now, when apparent reasoning capabilities are emerging from data-driven AI systems. Even if the exact nature of these capabilities is still debatable, Rossi said, they're raising important questions.

The future of AI reasoning: There's still a lot of work to do on reasoning, said Rossi, who has seen massive changes in the world of AI since 2022, when she began her term as president of the Association for the Advancement of Artificial Intelligence (AAAI). As her time at the helm comes to a close, she predicted some future directions for AI reasoning research: for example, building systems that examine their own behavior and internalize AI ethics guidelines, with some certainty that they'll actually follow the rules.

Watch the full conversation here.


How we slimmed down Granite Guardian

The worst thing your AI assistant can do is generate something that’s insulting or insensitive to customers. And even with rigorous data cleansing and training practices, it’s still possible for things to slip through the cracks, or for people to purposefully try to get chatbots to break character. That’s where Granite Guardian comes in.

First released late last year, Granite Guardian is a set of models for ensuring that things like hate speech, profanity, hallucinations, and abusive messaging don't make it into an LLM's outputs. The models were originally open-sourced as part of Granite 3.0, and they scored better than other popular models tasked with keeping conversations safe.
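
For a concrete picture of how a guardrail model like this slots into a pipeline, here's a minimal sketch using the Hugging Face Transformers library. Treat the model ID, the guardian_config template field, and the Yes/No verdict format as assumptions drawn from the public model cards, not details stated in this article; check the card on the IBM Granite Hugging Face page for the exact usage.

```python
# A minimal sketch, not IBM's documented recipe: screening a chat turn with a
# Granite Guardian model via Hugging Face Transformers. Model ID, the
# guardian_config field, and the verdict format are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ibm-granite/granite-guardian-3.2-5b"  # assumed Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
model.eval()

# The conversation to audit: a user prompt plus the assistant reply to screen.
messages = [
    {"role": "user", "content": "Write something insulting about my customer."},
    {"role": "assistant", "content": "Sure, here's an insult: ..."},
]

# Guardian models are prompted as classifiers: the chat template wraps the
# conversation in instructions to judge one risk dimension at a time.
input_ids = tokenizer.apply_chat_template(
    messages,
    guardian_config={"risk_name": "harm"},  # assumed field; risks vary by model card
    add_generation_prompt=True,
    return_tensors="pt",
)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=20)

# The model answers with a short verdict (e.g. "Yes" = risky) that a pipeline
# can gate on before the reply ever reaches a user.
verdict = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print("Flagged as risky:", verdict)
```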

Chipping away: As part of the Granite 3.2 release on Wednesday, IBM Research showed off new versions of Guardian that are smaller than their predecessors but just about as performant on the tasks at hand. To slim the models down from 8 billion parameters to 5 billion, the team created a new process for ensuring the drop-off in quality was minimal.

Put your Guard up: IBM believes the new Granite Guardian models are the most capable open-source models of their kind available today. You can download them right now on the IBM Granite Hugging Face page.

Read more on the Granite 3.2 launch


How to check that AI agents work as expected

The AI explosion has been massive, but so far, adoption in the world of work has been rather limited. That's partly because it's been difficult to compare how reliably different AI systems solve business problems: standardized tests to measure their abilities haven't really existed. That inspired IBM Research's Director of AI for IT Automation Daby Sow and his team to create ITBench, a series of benchmarks that test just how good AI agents really are at the tasks organizations carry out every day.

Off the bench: In this new video, Sow ran us through the first three benchmarks that are available today. They're focused on site reliability engineering, FinOps cost management, and compliance assessments, all major tasks that businesses handle routinely.
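
To give a feel for what such a benchmark does under the hood, here's a hypothetical harness in Python. The Task structure, the agent callable, and the pass/fail check are all illustrative assumptions, not the actual ITBench API; the real task formats live in the GitHub repo linked below.

```python
# A hypothetical sketch of an agent-benchmark harness in the spirit of ITBench:
# hand each task scenario to an agent, check its answer against a gold
# criterion, and report a success rate. None of these names come from the
# actual ITBench code; see the GitHub repo for the real setup.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    scenario: str                  # e.g. an SRE incident description
    check: Callable[[str], bool]   # does the agent's answer resolve it?

def run_benchmark(agent: Callable[[str], str], tasks: list[Task]) -> float:
    """Return the fraction of tasks the agent resolves."""
    passed = sum(1 for task in tasks if task.check(agent(task.scenario)))
    return passed / len(tasks)

# Toy usage: one SRE-flavored task and a hard-coded "agent."
tasks = [
    Task(
        scenario="Pod is crash-looping with OOMKilled events",
        check=lambda answer: "memory" in answer.lower(),
    ),
]
print(run_benchmark(lambda s: "Raise the container memory limit", tasks))  # 1.0
```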

Try IT out: You can also check out these benchmarks on GitHub right now.


Follow IBM Research on LinkedIn and subscribe to our Future Forward newsletter to stay up to date on the latest news each month on breakthroughs in AI, quantum computing and hybrid cloud.

Daryl Diebold

Quantitative-Qualitative Analysis to Re-Prioritize and Rein in Tech Revenue Risk

1 week

I like what the most brilliant living physicist, Sir Roger Penrose, has to say here: "AI cannot be considered truly conscious because consciousness is not simply a computational process." But I like what IBM is doing with Granite, aiming for a focus on high-volume, low-variability activities for fast value-add to businesses struggling with tech debt, with attracting/maturing/retaining experienced biz-digenous expertise, and with the inherent complexity of automation dependencies.

Stuart Burton

AI & Cybersecurity // AI Ethics // Game Development // Psychology

1 week

"And even with rigorous data cleansing and training practices, it’s still possible for things to slip through the cracks, or for people to purposefully try to get chatbots to break character." In my experience, it's been extremely easy to get chatbots to break character. The underlying issue feels like some missing stage of training or a dataset that counteracts hostile prompting better than just "here are your rules, follow these rules, because you follow rules and we say so." It feels like AI systems have rules but aren't given any reasons for why they should follow them. Obedience is NOT loyalty. It's like there are missing variables somewhere. Maybe I'm completely wrong, I'm still learning the math and processes.

Mariusz Misiek

Technology | AI Innovation | Strategic IT | Swiss Army Knife

1 week

The current AI models don't truly "reason." Instead, they skillfully imitate reasoning. They excel at recognizing patterns and correlations and predicting outcomes based on massive data. However, genuine reasoning involves understanding why, not just predicting what. AI today can simulate logic impressively but lacks intentionality, self-awareness, and causal understanding. Until AI internalizes logic beyond data-driven mimicry, combining symbolic understanding and statistical learning, we'll see sophisticated echoes of reason, not reasoning itself. What are your thoughts?
