Summary of what we learned during the AMA hour with the OpenAI o1 team on 2024-09-13
Model Names and Reasoning Paradigm
- OpenAI o1 is named to represent a new level of AI capability; the counter is reset to 1
- "Preview" indicates it's an early version of the full model
- "Mini" means it's a smaller version of the o1 model, optimized for speed
- The "o" stands for OpenAI
- o1 is not a "system"; it's a model trained to generate long chains of thought before returning a final answer
- The icon of o1 is metaphorically an alien of extraordinary ability
Size and Performance of o1 Models
- o1-mini is much smaller and faster than o1-preview, and is therefore planned to be offered to free users in the future
- o1-preview is an early checkpoint of the o1 model, neither bigger nor smaller
- o1-mini performs better in STEM tasks, but has limited world knowledge
- o1-mini excels at some tasks, especially in code-related tasks, compared to o1-preview
- Input tokens for o1 are calculated the same way as GPT-4o, using the same tokenizer
- o1-mini can explore more thought chains compared to o1-preview
Input Token Context and Model Capabilities
- Larger input contexts are coming soon for o1 models
- o1 models can handle longer, more open-ended tasks with less need for chunking input compared to GPT-4o
- o1 can generate long chains of thought before providing an answer, unlike previous models
- There is no current way to pause inference during CoT to add more context, but this is being explored for future models
Tools, Functionality, and Upcoming Features
- o1-preview doesn't use tools yet, but support for function calling, code interpreter, and browsing is planned
- Tool support, structured outputs, and system prompts will be added in future updates
- Users might eventually get control over thinking time and token limits in future versions
- Plans are underway to enable streaming, and exposing reasoning progress in the API is being considered
- Multimodal capabilities are built into o1, aiming for state-of-the-art performance in tasks like MMMU
CoT (Chain of Thought) Reasoning
- o1 generates hidden chains of thought during reasoning
- There are no plans to reveal raw CoT tokens to API users or in ChatGPT
- CoT tokens are summarized, but there is no guarantee of faithfulness to the actual reasoning
- Instructions in prompts can influence how the model thinks about a problem
- Reinforcement learning (RL) is used to improve CoT in o1, and GPT-4o cannot match its CoT performance through prompting alone
- The thinking stage appears slower because the thought process is being summarized as it goes, even though generating the final answer is typically faster
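As a small illustration of the point above about prompt instructions influencing how the model thinks: the chain of thought itself stays hidden, but reasoning-style hints in the prompt can still steer it. A minimal sketch (the helper and the hint text are hypothetical, not from the AMA):

```python
def with_reasoning_hint(task: str, hint: str) -> str:
    """Prepend a reasoning-style instruction to a user task.

    The model's chain of thought stays hidden; the hint only
    influences *how* it thinks before returning an answer.
    """
    return f"{hint}\n\nTask: {task}"


prompt = with_reasoning_hint(
    "Is 2**61 - 1 a Mersenne prime?",
    "Before answering, enumerate the edge cases and verify each step.",
)
```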
API and Usage Limits
- o1-mini has a weekly rate limit of 50 prompts for ChatGPT Plus users
- All prompts count the same in ChatGPT
- More tiers of API access and higher rate limits will be rolled out over time
- Prompt caching in the API is a popular request, but no timeline is available yet
Pricing, Fine-tuning, and Scaling
- Pricing of o1 models is expected to follow the trend of price reductions every 1-2 years
- Batch API pricing will be supported once rate limits increase
- Fine-tuning is on the roadmap, but no timeline is available yet
- Scaling up o1 is bottlenecked by research and engineering talent
- New scaling paradigms for inference compute could bring significant gains in future generations of models
- Inverse scaling isn't significant yet, but personal writing prompts show o1-preview performing only slightly better than GPT-4o (or even slightly worse)
Model Development and Research Insights
- o1 was trained using reinforcement learning to achieve reasoning performance
- The model demonstrates creative thinking and strong performance in lateral tasks like poetry
- o1's philosophical reasoning and ability to generalize, such as deciphering ciphers, are impressive
- o1 was used by researchers to create a GitHub bot that pings the right CODEOWNERS for review
- In internal tests, o1 quizzed itself on difficult problems to gauge its capabilities
- Broad world domain knowledge is being added and will improve with future versions
- Fresher training data for o1-mini is planned for future iterations (the current knowledge cutoff is October 2023)
Prompting Techniques and Best Practices
- o1 benefits from prompting styles that provide edge cases or reasoning styles
- o1 models are more receptive to reasoning cues in prompts compared to earlier models
- Providing relevant context in retrieval-augmented generation (RAG) improves performance; irrelevant chunks may worsen reasoning
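The RAG advice above can be sketched as a simple pre-filter: score each retrieved chunk against the question and include only the relevant ones, since irrelevant chunks may actively hurt the reasoning. A toy keyword-overlap scorer (a real system would use embeddings; all names and the threshold here are illustrative):

```python
def score(chunk: str, question: str) -> float:
    """Fraction of question words that also appear in the chunk (toy relevance)."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def build_context(chunks: list[str], question: str, threshold: float = 0.3) -> str:
    """Keep only chunks scoring above the threshold, then join them."""
    relevant = [c for c in chunks if score(c, question) >= threshold]
    return "\n---\n".join(relevant)


chunks = [
    "o1 models generate a hidden chain of thought before answering.",
    "The cafeteria menu changes every Tuesday.",
]
context = build_context(chunks, "how does o1 generate its chain of thought")
# Only the first chunk passes the relevance filter.
```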
General Feedback and Future Enhancements
- Rate limits are low for o1-preview due to early-stage testing but will be increased
- Improvements in latency and inference times are actively being worked on
Remarkable Model Capabilities
- o1 can think through philosophical questions like "What is life?"
- Researchers found o1 impressive in its ability to handle complex tasks and generalize from limited instruction
- o1's creative reasoning abilities, such as quizzing itself to gauge its capabilities, showcase its high-level problem-solving