Complex Reasoning: Why OpenAI’s new o1 is not simply the “next version” of ChatGPT

I’ve seen a few articles claiming that OpenAI’s o1 is a step in the wrong direction.

These articles argue that, compared to previous models, the response time is sluggish, the pricing is much higher, and the output doesn’t seem to be a great leap forward.

For simple conversational tasks, I would agree, but o1 wasn’t created to replace generalized conversational tasks. Instead, o1 was designed for a much more powerful use case: complex reasoning.

OpenAI o1 Implements Behind-the-Scenes “Chain of Thought” Prompting

One of the reasons o1 is slower and more expensive than its counterparts is that it utilizes what is known as Chain of Thought prompting under the hood. Chain of Thought prompting isn’t new; it is a form of prompt engineering that uses self-prompting structures in which the LLM may be called multiple times for a single user interaction. This recursive prompting allows the LLM to break a problem down into smaller steps, recognize and correct mistakes, and even try different approaches to solving the problem.
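OpenAI has not published o1’s internal procedure, but the general self-prompting pattern described above can be sketched in a few lines. This is a minimal illustration, not o1’s actual algorithm; `call_llm` is a placeholder stub standing in for any real chat-completion API client.

```python
# Minimal sketch of a Chain of Thought self-prompting loop.
# `call_llm` is a placeholder: a real implementation would call an
# LLM API (e.g. a chat-completion endpoint) instead of this stub.

def call_llm(prompt: str) -> str:
    """Stubbed LLM call so the sketch runs standalone."""
    if "Check the reasoning" in prompt:
        return "VALID"
    return "Step 1: restate the problem. Step 2: solve each part. Answer: 42"

def chain_of_thought(question: str, max_attempts: int = 3) -> str:
    """Ask the model to reason step by step, then self-check the
    reasoning, retrying if the check fails."""
    reasoning = ""
    for _ in range(max_attempts):
        # 1. Break the problem into explicit steps before answering.
        reasoning = call_llm(
            f"Solve step by step, then state the final answer.\n"
            f"Question: {question}"
        )
        # 2. Second LLM call: verify the chain of reasoning.
        verdict = call_llm(
            f"Check the reasoning below for mistakes. "
            f"Reply VALID or INVALID.\n{reasoning}"
        )
        if verdict.strip().startswith("VALID"):
            return reasoning
        # 3. On an INVALID verdict, loop and try again; a fuller
        #    version would feed the critique back into the next prompt.
    return reasoning  # best effort after max_attempts

answer = chain_of_thought("What is 6 * 7?")
```

Note that a single user question triggers multiple model calls here, which is exactly why this style of prompting costs more and responds more slowly than a one-shot completion.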

Chain of Thought prompting has been shown to be much more effective in solving problems that require expert domain knowledge, complex reasoning skills, and mathematical expertise. These problems have been traditionally difficult for standard LLM interactions to answer correctly. Success in this area will enable LLMs to be utilized in new and exciting problem domains that require complex reasoning skills.
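The simplest published form of the technique (from Wei et al.’s original Chain of Thought paper) doesn’t even require multiple calls: the prompt includes a worked exemplar whose answer spells out its intermediate reasoning, nudging the model to reason before answering. A sketch of the two prompt styles, using the paper’s well-known cafeteria example:

```python
# Direct prompting vs. few-shot Chain of Thought prompting.
# The CoT prompt prepends an exemplar Q/A pair whose answer walks
# through the arithmetic, encouraging step-by-step reasoning.

question = (
    "A cafeteria had 23 apples. They used 20 for lunch and bought 6 more. "
    "How many apples do they have?"
)

# Direct: ask for the answer with no reasoning scaffold.
direct_prompt = f"Q: {question}\nA:"

# Chain of Thought: show one worked example with visible reasoning.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
    f"Q: {question}\nA:"
)
```

On math word problems like this, models prompted with the direct form tend to emit a bare (often wrong) number, while the CoT form elicits intermediate steps that markedly improve accuracy.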

A Proprietary Chain of Thought

OpenAI currently has a strict policy against asking o1 questions about how it reasons. Queries about how the model arrived at its answer are met with an automatic warning email, and repeated queries of this kind result in being banned from using o1 altogether.

This is because o1 isn’t a new LLM. Instead, it applies a proprietary pipeline of prompt engineering logic and structures behind the scenes, logic that other organizations could plausibly reproduce if it were exposed. Allowing users to query o1 about its reasoning would leak this proprietary information and erode the competitive advantage o1 currently holds in advanced reasoning.

Performance Metrics

So how good is the new o1 in complex reasoning? Here are just a few statistics provided by OpenAI on how o1 stands up to other models, and even expert humans:

  • 83.3% vs. 13.4% accuracy on Math Olympiad questions compared to GPT-4o
  • 78% vs. 69.7% success rate on PhD level questions compared to human experts
  • 89th percentile placement on questions from competitive coding site Codeforces

Industry Utilization

Note that for many existing integrations, o1 probably shouldn’t replace your existing prompting infrastructure and LLM integrations. It isn’t meant to. o1 is designed for new problem domains where earlier GPT models simply weren’t capable enough to be viable.

o1 is meant to succeed in these more sophisticated problem domains, with the potential to begin assisting researchers in STEM fields and the hard sciences.

Some potential use cases include STEM-based disciplines:

  • Material Science
  • Physics Research
  • Medical Research
  • Chemistry and Pharmacology
  • More sophisticated and autonomous computer programming

Changing How We Conduct Scientific Research

Only time will tell how well o1 performs as it is experimented with and utilized across these new STEM problem domains, but reasoning capabilities that already outperform human experts on PhD-level questions have huge potential.

Years ago, I remember thinking about how AI would one day fundamentally change the way humans conduct scientific research. Machine learning algorithms back then could already identify patterns in massive amounts of data that were indiscernible to humans, leading to breakthroughs in medical diagnosis, pattern recognition, and more.

Integrating AI with complex reasoning skills into STEM workflows will lead to significant discoveries across scientific disciplines. I firmly believe that AI will become an integral part of how scientific research is conducted, leading to more discoveries at a faster rate and ultimately enabling humanity to unlock a greater understanding of the world we live in. o1 may prove to be the next major step in this direction.

