OpenAI o1 System Card

Vlad Bogolin

AI/ML Engineer & Researcher | Large Language Models (LLMs)

Published: September 12, 2024

Today's paper introduces OpenAI's new o1 model series, which uses large-scale reinforcement learning to perform chain-of-thought reasoning. These models demonstrate improved safety and robustness compared to previous versions, particularly in areas like avoiding unsafe content generation and resisting jailbreak attempts. The paper outlines the safety evaluations and red teaming conducted on the o1-preview and o1-mini models.

Overview

The o1 models are trained using large-scale reinforcement learning to perform complex reasoning tasks. Unlike previous models that provide immediate responses, o1 models generate a chain of thought before answering user queries. This allows them to refine their thinking process, try different strategies, and recognize mistakes.

The training data for these models comes from a combination of public datasets, proprietary data accessed through partnerships, and custom datasets developed in-house. This diverse data helps the models develop robust reasoning and conversational capabilities across various domains.

A key feature of the o1 models is their ability to reason about safety policies in context when responding to potentially unsafe prompts. This leads to improved performance on benchmarks related to avoiding illicit advice, stereotyped responses, and known jailbreak attempts.

The paper describes extensive safety evaluations conducted on the o1 models, including tests for disallowed content generation, jailbreak resistance, hallucination tendencies, and bias. The authors also investigate potential risks associated with the chain-of-thought feature and describe ongoing research on chain-of-thought deception monitoring.

Key Results

The o1 models show significant improvements in safety and robustness compared to previous versions:

They outperform GPT-4o on challenging refusal evaluations and jailbreak resistance tests.
The models demonstrate reduced hallucination rates in certain evaluations.
o1-preview shows improved performance on fairness and bias evaluations, particularly in avoiding stereotyped responses.
Initial monitoring of chain-of-thought outputs shows promising results in detecting potential deceptive behavior.
Significant improvements over GPT-4o were observed in OpenAI Research Engineer interview tasks, both in multiple-choice questions and coding problems.
In agentic tasks, neither model completed the primary tasks related to advanced autonomy, but both showed strong performance on contextual subtasks.
Multilingual capabilities are notably higher in o1-preview than in GPT-4o, with strong performance across 14 languages, including low-resource languages like Swahili and Yoruba.

Conclusion

The paper introduces OpenAI's o1 model series, which uses chain-of-thought reasoning to improve safety and performance in language AI systems. For more information, please consult the full paper.

Congrats to the authors for their work!

OpenAI. "OpenAI o1 System Card." https://cdn.openai.com/o1-system-card.pdf

AI Paper of the Day
