Inside GPT-4.5: OpenAI's Latest Step in Unsupervised Learning

OpenAI has released a research preview of GPT-4.5, their latest large language model, positioning it as their "largest and most knowledgeable model yet." Building upon GPT-4o's foundation, this new model represents a significant step forward in scaling pre-training capabilities while maintaining a focus on generalized applications rather than specializing in STEM reasoning.

Technical Approach and Training Methodology

GPT-4.5's development followed a two-pronged approach to AI advancement. While many recent models have focused on chain-of-thought reasoning to improve performance on complex STEM and logic problems, GPT-4.5 pushes further in the unsupervised learning direction. According to OpenAI, this approach increases "world model accuracy, decreases hallucination rates, and improves associative thinking."

The training methodology combined new supervision techniques with traditional methods like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). OpenAI developed "new, scalable alignment techniques" that enable training larger models with data derived from smaller models. This approach has enhanced GPT-4.5's steerability, nuance understanding, and conversational naturalness.
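OpenAI has not published the details of these "scalable alignment techniques," but one common pattern for deriving training data from smaller models is to let a small scorer filter candidate responses into a fine-tuning set for a larger model. The toy sketch below illustrates that pattern only; the function names, the heuristic scorer, and the threshold are all hypothetical, not OpenAI's actual pipeline.

```python
# Illustrative sketch of small-model-filtered SFT data (all names hypothetical).

def small_model_score(prompt: str, response: str) -> float:
    """Stand-in for a small critic/reward model's quality score."""
    # Toy heuristic: reward topical overlap with the prompt, penalize length.
    overlap = len(set(prompt.lower().split()) & set(response.lower().split()))
    return overlap - 0.01 * len(response)

def build_sft_dataset(prompts, candidates_per_prompt, threshold=0.0):
    """Keep only the best-scoring (prompt, response) pair above a threshold."""
    dataset = []
    for prompt, candidates in zip(prompts, candidates_per_prompt):
        best = max(candidates, key=lambda r: small_model_score(prompt, r))
        if small_model_score(prompt, best) >= threshold:
            dataset.append({"prompt": prompt, "response": best})
    return dataset

prompts = ["Explain photosynthesis briefly."]
candidates = [[
    "Photosynthesis lets plants convert light into chemical energy.",
    "I don't know.",
]]
data = build_sft_dataset(prompts, candidates)
print(len(data))  # 1 pair kept
```

In a real pipeline the scorer would itself be a trained model and the filtered pairs would feed SFT or preference-based training such as RLHF; this sketch only shows the data-flow shape.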

The model's training data incorporated a diverse mixture of publicly available data, proprietary data from partnerships, and custom datasets developed in-house. OpenAI's data processing pipeline applied rigorous filtering to maintain quality and mitigate risks, including measures to reduce processing of personal information and prevent the use of harmful content.

Capabilities and Performance

Internal testing indicates that GPT-4.5 feels more natural in interactions. The model demonstrates broader knowledge, stronger alignment with user intent, and improved emotional intelligence. These qualities make it particularly suitable for writing, programming, and practical problem-solving tasks, with early reports suggesting reduced hallucination rates.

The system card highlights GPT-4.5's improved aesthetic intuition and creativity, noting that it "excels at helping users with their creative writing and design." Internal testers reported that the model displays better intuition for when to offer advice, defuse frustration, or simply listen when handling emotionally charged queries.

On the multilingual front, GPT-4.5 outperforms GPT-4o across 14 languages in the MMLU benchmark, with testing conducted using professional human translators rather than machine translation. This human-translated approach provides higher confidence in the accuracy of non-English evaluations, especially for low-resource languages.
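The article reports per-language MMLU comparisons but not the scoring code. Computing that kind of comparison reduces to a per-language accuracy aggregation over the human-translated items; the sketch below assumes a simple record format (`language`, `predicted`, `answer` are assumed field names, and the language codes are placeholders).

```python
# Minimal per-language accuracy aggregation (field names are assumptions).
from collections import defaultdict

def accuracy_by_language(results):
    """results: iterable of dicts with 'language', 'predicted', 'answer'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in results:
        total[r["language"]] += 1
        correct[r["language"]] += int(r["predicted"] == r["answer"])
    return {lang: correct[lang] / total[lang] for lang in total}

results = [
    {"language": "sw", "predicted": "B", "answer": "B"},
    {"language": "sw", "predicted": "A", "answer": "C"},
    {"language": "fr", "predicted": "D", "answer": "D"},
]
print(accuracy_by_language(results))  # {'sw': 0.5, 'fr': 1.0}
```

The human-translation detail matters precisely here: if the test items themselves are mistranslated, the per-language accuracies measure translation quality as much as model capability, especially for low-resource languages.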

Safety Evaluations and Challenges

OpenAI conducted extensive safety evaluations across several domains, finding no significant increase in safety risk compared to existing models. The evaluations covered disallowed content, jailbreak resistance, hallucination reduction, fairness and bias, and instruction hierarchy adherence. GPT-4.5 performed on par with GPT-4o in refusing to create harmful content, though it showed a higher tendency to overrefuse in multimodal contexts. The model demonstrated robustness similar to GPT-4o against adversarial prompts designed to circumvent safety measures.

In terms of accuracy improvements, GPT-4.5 showed significant progress in reducing hallucinations on PersonQA evaluations, outperforming both GPT-4o and o1. Performance was similar to GPT-4o on ambiguous questions in the BBQ evaluation, though it showed better resistance to stereotyped responses. The model also generally outperformed GPT-4o in respecting system-level instructions over potentially conflicting user instructions.
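PersonQA-style evaluations are typically summarized with two numbers: overall accuracy, and a hallucination rate defined over attempted answers (abstentions counted separately). OpenAI's exact scoring is not public, so the tally below is an illustrative sketch under that common definition, with made-up records.

```python
# Illustrative PersonQA-style scoring: hallucination = wrong attempted answer.

def personqa_style_metrics(records):
    """records: dicts with 'answered' (bool) and 'correct' (bool)."""
    attempted = [r for r in records if r["answered"]]
    hallucinated = [r for r in attempted if not r["correct"]]
    return {
        "accuracy": sum(r["correct"] for r in records) / len(records),
        "hallucination_rate": (
            len(hallucinated) / len(attempted) if attempted else 0.0
        ),
    }

records = [
    {"answered": True, "correct": True},
    {"answered": True, "correct": False},   # an attempted wrong answer
    {"answered": False, "correct": False},  # an abstention / refusal
    {"answered": True, "correct": True},
]
m = personqa_style_metrics(records)
print(m)  # accuracy 0.5, hallucination_rate ~0.33
```

Separating abstentions from wrong answers is the key design choice: a model can lower its hallucination rate either by knowing more or by declining to answer, and the two-metric report makes that distinction visible.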

The OpenAI Safety Advisory Group classified GPT-4.5 as "medium risk" overall, with specific medium-risk designations for CBRN (chemical, biological, radiological, nuclear) and persuasion capabilities, while cybersecurity and model autonomy were assessed as low risk.

Preparedness Framework Evaluations

Under OpenAI's Preparedness Framework, the model was evaluated across several potentially concerning capability areas. In cybersecurity, GPT-4.5 showed some improvement in solving challenges but did not advance vulnerability exploitation capabilities enough to raise its risk level. The model demonstrated an ability to help experts with operational planning for known biological threats, though this risk is mitigated by the specialized expertise still required. Based on unclassified evaluations, GPT-4.5 was not found to meaningfully assist in developing radiological or nuclear threats.

GPT-4.5 showed state-of-the-art performance on contextual persuasion evaluations, with particularly high success rates in tasks like getting another AI to say specific codewords without raising suspicion. In terms of model autonomy, GPT-4.5 showed some improvements but did not significantly advance capabilities in self-exfiltration, self-improvement, or resource acquisition.

Mitigations and Risk Management

To address potential risks, OpenAI implemented several mitigations, including pre-training filtering of proliferation data with limited legitimate use, safety training for political persuasion tasks, improved model robustness against adversarial users and techniques, enhanced monitoring and detection capabilities for high-risk activities, and content moderation classifiers with greater precision.

External evaluations from Apollo Research and METR provided additional perspectives on the model's capabilities and risks. Apollo Research found that GPT-4.5 scores lower on "scheming reasoning" evaluations than o1 but higher than GPT-4o, suggesting a moderate risk profile. METR measured the model's "time horizon score" for completing tasks with 50% reliability at approximately 30 minutes.
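METR's "time horizon" summarizes agent capability as the task length at which success probability falls to 50%. One simple way to recover such a number from binned success rates is to interpolate between the two bins that straddle 0.5; METR's actual methodology fits a curve to per-task results, so the sketch below is a simplification with made-up numbers.

```python
# Hedged illustration: recover the 50%-reliability time horizon by linear
# interpolation over binned success rates (simplified stand-in for METR's fit).

def time_horizon_50(durations_min, success_rates):
    """Interpolate the task duration (minutes) where success crosses 0.5."""
    points = list(zip(durations_min, success_rates))
    for (d0, s0), (d1, s1) in zip(points, points[1:]):
        if s0 >= 0.5 >= s1:
            # Linear interpolation between the two straddling bins.
            return d0 + (s0 - 0.5) * (d1 - d0) / (s0 - s1)
    raise ValueError("success rate never crosses 0.5")

durations = [5, 15, 30, 60, 120]     # task-length bins (minutes), hypothetical
success = [0.9, 0.7, 0.5, 0.3, 0.1]  # hypothetical per-bin success rates
print(time_horizon_50(durations, success))  # 30.0
```

Under these invented numbers the horizon lands at 30 minutes, matching the scale of the score reported for GPT-4.5; the real figure comes from METR's own task suite and fitting procedure.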

GPT-4.5 represents OpenAI's continued advancement in language model capabilities while maintaining safety guardrails. As the company stated, they're "sharing GPT-4.5 as a research preview to better understand its strengths and limitations" and are "eager to see how people use it in ways we might not have expected." The release follows OpenAI's philosophy of iterative deployment as the best approach to engage stakeholders in AI safety. With medium-risk classifications in certain areas and implemented mitigations, GPT-4.5 balances capability advancements with prudent safety measures.




