From Language to Logic: The Game-Changing Impact of OpenAI's Latest AI Model
Agustin Ramirez
Executive Global IT Leader | Driving Innovation & Profitability | Program Management | Strategist of Tomorrow's IT
The bulk of LLM progress until now has been language-driven. This new model enters the realm of complex reasoning, with implications for physics, coding, and more.
Introduction of OpenAI o1
Last week, OpenAI released a new model called o1 (previously referred to under the code name “Strawberry” and, before that, Q*) that significantly outperforms GPT-4o for complex reasoning tasks.
Focus on Multistep Reasoning
Unlike previous models that are well suited for language tasks like writing and editing, OpenAI o1 is focused on multistep “reasoning,” the type of process required for advanced mathematics, coding, or other STEM-based questions. It uses a “chain of thought” technique, according to OpenAI. This technique allows the model to recognize and correct its mistakes, break down tricky steps into simpler ones, and try different approaches when the current one isn’t working.
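To make this concrete, here is a minimal sketch of what a request to the model might look like through OpenAI's Python SDK. The model identifier "o1-preview" and the sample prompt are assumptions for illustration, not details confirmed in this article; the point is that the chain of thought happens inside the model, so the call itself looks like an ordinary chat completion.

# Minimal sketch, assuming the openai Python package is installed and an API key is configured.
# The model name "o1-preview" is an assumption for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A multistep reasoning task; the model works through intermediate steps internally,
# so no "think step by step" instruction is added to the prompt.
response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": "A train leaves at 9:40 and travels 210 km at 84 km/h. "
                       "At what time does it arrive? Give only the final answer.",
        }
    ],
)

print(response.choices[0].message.content)

With older chat models, a similar effect is often approximated by explicitly asking the model to reason step by step in the prompt; the difference here is that o1 folds that behavior into the model itself.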
Performance and Accuracy
OpenAI’s tests point to resounding success. The model ranks in the 89th percentile on questions from the competitive coding organization Codeforces and would be among the top 500 high school students in the USA Math Olympiad, which covers geometry, number theory, and other math topics. The model is also trained to answer PhD-level questions in subjects ranging from astrophysics to organic chemistry.
In math olympiad questions, the new model is 83.3% accurate, versus 13.4% for GPT-4o. In the PhD-level questions, it averaged 78% accuracy, compared with 69.7% from human experts and 56.1% from GPT-4o.
Significance of the New Model
The bulk of LLM progress until now has been language-driven, resulting in chatbots or voice assistants that can interpret, analyze, and generate words. However, these LLMs have failed to demonstrate the types of skills required to solve important problems in fields like drug discovery, materials science, coding, or physics. OpenAI’s o1 is one of the first signs that LLMs might soon become genuinely helpful companions to human researchers in these fields.
Expert Opinions
Matt Welsh, an AI researcher and founder of the LLM startup Fixie, highlights the significance of this development. He notes that the reasoning abilities are built directly into the model, rather than requiring separate tools to achieve similar results, and expects this to raise the bar for what people expect AI models to be able to do.
However, it’s best to take OpenAI’s comparisons to “human-level skills” with a grain of salt, says Yves-Alexandre de Montjoye, an associate professor in math and computer science at Imperial College London. It’s very hard to meaningfully compare how LLMs and people go about tasks such as solving math problems from scratch.
Challenges in Measuring Reasoning
AI researchers say that measuring how well a model like o1 can “reason” is harder than it sounds. If it answers a given question correctly, is that because it successfully reasoned its way to the logical answer? Or was it aided by a sufficient starting point of knowledge built into the model? The model “still falls short when it comes to open-ended reasoning,” Google AI researcher François Chollet wrote on X.
Cost and Accessibility
Finally, there’s the price. This reasoning-heavy model doesn’t come cheap. Though access to some versions of the model is included in premium OpenAI subscriptions, developers using o1 through the API will pay three times as much as they pay for GPT-4o—$15 per 1 million input tokens in o1, versus $5 for GPT-4o. The new model also won’t be most users’ first pick for more language-heavy tasks, where GPT-4o continues to be the better option, according to OpenAI’s user surveys.
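As a rough illustration of what that difference means in practice, the sketch below computes input-token cost for a hypothetical workload using only the per-million-token input prices quoted above; the workload size is made up, and output tokens (billed separately) are not included.

# Rough cost comparison using the input-token prices quoted above
# ($15 per 1M input tokens for o1, $5 per 1M for GPT-4o).
# The workload size is a hypothetical example; output-token pricing is excluded.
O1_INPUT_PRICE_PER_M = 15.00
GPT4O_INPUT_PRICE_PER_M = 5.00

input_tokens = 2_000_000  # hypothetical monthly input volume

o1_cost = input_tokens / 1_000_000 * O1_INPUT_PRICE_PER_M        # $30.00
gpt4o_cost = input_tokens / 1_000_000 * GPT4O_INPUT_PRICE_PER_M  # $10.00

print(f"o1 input cost:     ${o1_cost:.2f}")
print(f"GPT-4o input cost: ${gpt4o_cost:.2f}")

For the same input volume, o1 costs three times as much, so the reasoning model makes sense mainly where the harder problems justify the premium.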
Potential and Future Applications
AI systems that can solve complex math could allow us to build more powerful AI tools. What will it unlock? We won’t know until researchers and labs have the access, time, and budget to tinker with the new model and find its limits. But it’s surely a sign that the race for models that can outreason humans has begun.