Inside GPT-4.5: OpenAI's Latest Step in Unsupervised Learning

OpenAI has released a research preview of GPT-4.5, their latest large language model, positioning it as their "largest and most knowledgeable model yet." Building upon GPT-4o's foundation, this new model represents a significant step forward in scaling pre-training capabilities while maintaining a focus on generalized applications rather than specializing in STEM reasoning.

Technical Approach and Training Methodology

GPT-4.5's development followed a two-pronged approach to AI advancement. While many recent models have focused on chain-of-thought reasoning to improve performance on complex STEM and logic problems, GPT-4.5 pushes further in the unsupervised learning direction. According to OpenAI, this approach increases "world model accuracy, decreases hallucination rates, and improves associative thinking."

The training methodology combined new supervision techniques with traditional methods like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). OpenAI developed "new, scalable alignment techniques" that enable training larger models with data derived from smaller models. This approach has enhanced GPT-4.5's steerability, nuance understanding, and conversational naturalness.
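OpenAI has not published the details of these "scalable alignment techniques," but one common pattern for deriving training data from smaller models is to let a small scorer filter candidate responses into a fine-tuning set for a larger model. The toy sketch below illustrates that pattern only; the function names, the heuristic scorer, and the threshold are all hypothetical, not OpenAI's actual pipeline.

```python
# Illustrative sketch of small-model-filtered SFT data (all names hypothetical).

def small_model_score(prompt: str, response: str) -> float:
    """Stand-in for a small critic/reward model's quality score."""
    # Toy heuristic: reward topical overlap with the prompt, penalize length.
    overlap = len(set(prompt.lower().split()) & set(response.lower().split()))
    return overlap - 0.01 * len(response)

def build_sft_dataset(prompts, candidates_per_prompt, threshold=0.0):
    """Keep only the best-scoring (prompt, response) pair above a threshold."""
    dataset = []
    for prompt, candidates in zip(prompts, candidates_per_prompt):
        best = max(candidates, key=lambda r: small_model_score(prompt, r))
        if small_model_score(prompt, best) >= threshold:
            dataset.append({"prompt": prompt, "response": best})
    return dataset

prompts = ["Explain photosynthesis briefly."]
candidates = [[
    "Photosynthesis lets plants convert light into chemical energy.",
    "I don't know.",
]]
data = build_sft_dataset(prompts, candidates)
print(len(data))  # 1 pair kept
```

In a real pipeline the scorer would itself be a trained model and the filtered pairs would feed SFT or preference-based training such as RLHF; this sketch only shows the data-flow shape.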

The model's training data incorporated a diverse mixture of publicly available data, proprietary data from partnerships, and custom datasets developed in-house. OpenAI's data processing pipeline applied rigorous filtering to maintain quality and mitigate risks, including measures to reduce processing of personal information and prevent the use of harmful content.

Capabilities and Performance

Internal testing indicates that GPT-4.5 feels more natural in interactions. The model demonstrates broader knowledge, stronger alignment with user intent, and improved emotional intelligence. These qualities make it particularly suitable for writing, programming, and practical problem-solving tasks, with early reports suggesting reduced hallucination rates.

The system card highlights GPT-4.5's improved aesthetic intuition and creativity, noting that it "excels at helping users with their creative writing and design." Internal testers reported that the model displays better intuition for when to offer advice, defuse frustration, or simply listen when handling emotionally charged queries.

On the multilingual front, GPT-4.5 outperforms GPT-4o across 14 languages in the MMLU benchmark, with testing conducted using professional human translators rather than machine translation. This human-translated approach provides higher confidence in the accuracy of non-English evaluations, especially for low-resource languages.
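The article reports per-language MMLU comparisons but not the scoring code. Computing that kind of comparison reduces to a per-language accuracy aggregation over the human-translated items; the sketch below assumes a simple record format (`language`, `predicted`, `answer` are assumed field names, and the language codes are placeholders).

```python
# Minimal per-language accuracy aggregation (field names are assumptions).
from collections import defaultdict

def accuracy_by_language(results):
    """results: iterable of dicts with 'language', 'predicted', 'answer'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in results:
        total[r["language"]] += 1
        correct[r["language"]] += int(r["predicted"] == r["answer"])
    return {lang: correct[lang] / total[lang] for lang in total}

results = [
    {"language": "sw", "predicted": "B", "answer": "B"},
    {"language": "sw", "predicted": "A", "answer": "C"},
    {"language": "fr", "predicted": "D", "answer": "D"},
]
print(accuracy_by_language(results))  # {'sw': 0.5, 'fr': 1.0}
```

The human-translation detail matters precisely here: if the test items themselves are mistranslated, the per-language accuracies measure translation quality as much as model capability, especially for low-resource languages.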

Safety Evaluations and Challenges

OpenAI conducted extensive safety evaluations across several domains, finding no significant increase in safety risk compared to existing models. The evaluations covered disallowed content, jailbreak resistance, hallucination reduction, fairness and bias, and instruction hierarchy adherence. GPT-4.5 performed on par with GPT-4o in refusing to create harmful content, though it showed a higher tendency to overrefuse in multimodal contexts. The model demonstrated robustness similar to GPT-4o against adversarial prompts designed to circumvent safety measures.

In terms of accuracy improvements, GPT-4.5 showed significant progress in reducing hallucinations on PersonQA evaluations, outperforming both GPT-4o and o1. Performance was similar to GPT-4o on ambiguous questions in the BBQ evaluation, though it showed better resistance to stereotyped responses. The model also generally outperformed GPT-4o in respecting system-level instructions over potentially conflicting user instructions.
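PersonQA-style evaluations are typically summarized with two numbers: overall accuracy, and a hallucination rate defined over attempted answers (abstentions counted separately). OpenAI's exact scoring is not public, so the tally below is an illustrative sketch under that common definition, with made-up records.

```python
# Illustrative PersonQA-style scoring: hallucination = wrong attempted answer.

def personqa_style_metrics(records):
    """records: dicts with 'answered' (bool) and 'correct' (bool)."""
    attempted = [r for r in records if r["answered"]]
    hallucinated = [r for r in attempted if not r["correct"]]
    return {
        "accuracy": sum(r["correct"] for r in records) / len(records),
        "hallucination_rate": (
            len(hallucinated) / len(attempted) if attempted else 0.0
        ),
    }

records = [
    {"answered": True, "correct": True},
    {"answered": True, "correct": False},   # an attempted wrong answer
    {"answered": False, "correct": False},  # an abstention / refusal
    {"answered": True, "correct": True},
]
m = personqa_style_metrics(records)
print(m)  # accuracy 0.5, hallucination_rate ~0.33
```

Separating abstentions from wrong answers is the key design choice: a model can lower its hallucination rate either by knowing more or by declining to answer, and the two-metric report makes that distinction visible.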

The OpenAI Safety Advisory Group classified GPT-4.5 as "medium risk" overall, with specific medium-risk designations for CBRN (chemical, biological, radiological, nuclear) and persuasion capabilities, while cybersecurity and model autonomy were assessed as low risk.

Preparedness Framework Evaluations

Under OpenAI's Preparedness Framework, the model was evaluated across several potentially concerning capability areas. In cybersecurity, GPT-4.5 showed some improvement in solving challenges but did not advance vulnerability exploitation capabilities enough to raise its risk level. The model demonstrated an ability to help experts with operational planning for known biological threats, though this risk is mitigated by the specialized expertise still required. Based on unclassified evaluations, GPT-4.5 was not found to meaningfully assist in developing radiological or nuclear threats.

GPT-4.5 showed state-of-the-art performance on contextual persuasion evaluations, with particularly high success rates in tasks like getting another AI to say specific codewords without raising suspicion. In terms of model autonomy, GPT-4.5 showed some improvements but did not significantly advance capabilities in self-exfiltration, self-improvement, or resource acquisition.

Mitigations and Risk Management

To address potential risks, OpenAI implemented several mitigations, including pre-training filtering of proliferation data with limited legitimate use, safety training for political persuasion tasks, improved model robustness against adversarial users and techniques, enhanced monitoring and detection capabilities for high-risk activities, and content moderation classifiers with greater precision.

External evaluations from Apollo Research and METR provided additional perspectives on the model's capabilities and risks. Apollo Research found that GPT-4.5 scores lower on "scheming reasoning" evaluations than o1 but higher than GPT-4o, suggesting a moderate risk profile. METR measured the model's "time horizon score" for completing tasks with 50% reliability at approximately 30 minutes.
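METR's "time horizon" summarizes agent capability as the task length at which success probability falls to 50%. One simple way to recover such a number from binned success rates is to interpolate between the two bins that straddle 0.5; METR's actual methodology fits a curve to per-task results, so the sketch below is a simplification with made-up numbers.

```python
# Hedged illustration: recover the 50%-reliability time horizon by linear
# interpolation over binned success rates (simplified stand-in for METR's fit).

def time_horizon_50(durations_min, success_rates):
    """Interpolate the task duration (minutes) where success crosses 0.5."""
    points = list(zip(durations_min, success_rates))
    for (d0, s0), (d1, s1) in zip(points, points[1:]):
        if s0 >= 0.5 >= s1:
            # Linear interpolation between the two straddling bins.
            return d0 + (s0 - 0.5) * (d1 - d0) / (s0 - s1)
    raise ValueError("success rate never crosses 0.5")

durations = [5, 15, 30, 60, 120]     # task-length bins (minutes), hypothetical
success = [0.9, 0.7, 0.5, 0.3, 0.1]  # hypothetical per-bin success rates
print(time_horizon_50(durations, success))  # 30.0
```

Under these invented numbers the horizon lands at 30 minutes, matching the scale of the score reported for GPT-4.5; the real figure comes from METR's own task suite and fitting procedure.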

GPT-4.5 represents OpenAI's continued advancement in language model capabilities while maintaining safety guardrails. As the company stated, they're "sharing GPT-4.5 as a research preview to better understand its strengths and limitations" and are "eager to see how people use it in ways we might not have expected." The release follows OpenAI's philosophy of iterative deployment as the best approach to engage stakeholders in AI safety. With medium-risk classifications in certain areas and implemented mitigations, GPT-4.5 balances capability advancements with prudent safety measures.




