Summary of what we learned during the AMA hour with the OpenAI o1 team on 2024-09-13


Model Names and Reasoning Paradigm

- OpenAI o1 is named to represent a new level of AI capability; the counter is reset to 1

- "Preview" indicates it's an early version of the full model

- "Mini" means it's a smaller version of the o1 model, optimized for speed

- The "o" stands for OpenAI

- o1 is not a "system"; it's a model trained to generate long chains of thought before returning a final answer

- The icon of o1 is metaphorically an alien of extraordinary ability


Size and Performance of o1 Models

- o1-mini is much smaller and faster than o1-preview, which is why it will be offered to free users in the future

- o1-preview is an early checkpoint of the o1 model, neither bigger nor smaller

- o1-mini performs better in STEM tasks, but has limited world knowledge

- o1-mini outperforms o1-preview on some tasks, especially code-related ones

- Input tokens for o1 are calculated the same way as GPT-4o, using the same tokenizer

- o1-mini can explore more thought chains compared to o1-preview


Input Token Context and Model Capabilities

- Larger input contexts are coming soon for o1 models

- o1 models can handle longer, more open-ended tasks with less need for chunking input compared to GPT-4o

- o1 can generate long chains of thought before providing an answer, unlike previous models

- There is no current way to pause inference during CoT to add more context, but this is being explored for future models
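Because o1 handles longer, open-ended tasks with less chunking, pipelines that previously split input for GPT-4o can often pass it whole. As a point of reference, here is a minimal sketch of the kind of client-side chunking that becomes less necessary — the ~4-characters-per-token heuristic and the token budget are illustrative assumptions, not OpenAI-documented values:

```python
def chunk_text(text: str, max_tokens: int = 4000, chars_per_token: int = 4) -> list[str]:
    """Split text on paragraph boundaries into chunks that fit a rough token budget.

    Uses the common ~4-characters-per-token heuristic; real budgets should be
    measured with the model's actual tokenizer.
    """
    max_chars = max_tokens * chars_per_token
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for paragraph in text.split("\n\n"):
        # Flush the current chunk before it would exceed the budget.
        if length + len(paragraph) > max_chars and current:
            chunks.append("\n\n".join(current))
            current, length = [], 0
        current.append(paragraph)
        length += len(paragraph) + 2  # +2 for the paragraph separator
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

With larger input contexts promised for o1, a budget like this mainly remains useful as a safety valve for extreme inputs.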


Tools, Functionality, and Upcoming Features

- o1-preview doesn't use tools yet, but support for function calling, code interpreter, and browsing is planned

- Tool support, structured outputs, and system prompts will be added in future updates

- Users might eventually get control over thinking time and token limits in future versions

- Plans are underway to enable streaming, and exposing reasoning progress in the API is being considered

- Multimodal capabilities are built into o1, aiming for state-of-the-art performance in tasks like MMMU
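Until those features ship, an o1 API request is limited to plain user/assistant messages: no system role, tools, or streaming. A minimal chat-completions payload at launch looked roughly like the fragment below — note that o1 introduced `max_completion_tokens` in place of `max_tokens`; parameter support may have changed since:

```json
{
  "model": "o1-preview",
  "messages": [
    {"role": "user", "content": "Prove that the square root of 2 is irrational."}
  ],
  "max_completion_tokens": 2000
}
```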


CoT (Chain of Thought) Reasoning

- o1 generates hidden chains of thought during reasoning

- There are no plans to reveal raw CoT tokens to API users or in ChatGPT

- CoT tokens are summarized, but there is no guarantee of faithfulness to the actual reasoning

- Instructions in prompts can influence how the model thinks about a problem

- Reinforcement learning (RL) is used to improve CoT in o1, and GPT-4o cannot match its CoT performance through prompting alone

- The thinking stage appears slower because it summarizes the thought process, even though answer generation is typically faster


API and Usage Limits

- o1-mini has a weekly rate limit of 50 prompts for ChatGPT Plus users

- All prompts count the same in ChatGPT

- More tiers of API access and higher rate limits will be rolled out over time

- Prompt caching in the API is a popular request, but no timeline is available yet


Pricing, Fine-tuning, and Scaling

- Pricing of o1 models is expected to follow the trend of price reductions every 1-2 years

- Batch API pricing will be supported once rate limits increase

- Fine-tuning is on the roadmap, but no timeline is available yet

- Scaling up o1 is bottlenecked by research and engineering talent

- New scaling paradigms for inference compute could bring significant gains in future generations of models

- Inverse scaling isn't significant yet, but personal writing prompts show o1-preview performing only slightly better than GPT-4o (or even slightly worse)


Model Development and Research Insights

- o1 was trained using reinforcement learning to achieve reasoning performance

- The model demonstrates creative thinking and strong performance in lateral tasks like poetry

- o1's philosophical reasoning and ability to generalize, such as deciphering ciphers, are impressive

- o1 was used by researchers to create a GitHub bot that pings the right CODEOWNERS for review

- In internal tests, o1 quizzed itself on difficult problems to gauge its capabilities

- Broader world knowledge is being added and will improve in future versions

- Fresher training data for o1-mini is planned for future iterations (the current knowledge cutoff is October 2023)


Prompting Techniques and Best Practices

- o1 benefits from prompts that provide edge cases or describe reasoning styles

- o1 models are more receptive to reasoning cues in prompts compared to earlier models

- Providing relevant context in retrieval-augmented generation (RAG) improves performance; irrelevant chunks may worsen reasoning
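Since irrelevant chunks may worsen reasoning, it can pay to drop low-scoring retrievals before building the prompt rather than padding the context. A minimal sketch of that filtering step — the scoring function is a placeholder assumption (in practice you would use embedding similarity or a reranker):

```python
def build_rag_prompt(question: str, chunks: list[str], score_fn,
                     min_score: float = 0.5, top_k: int = 3) -> str:
    """Keep only chunks scoring above a relevance threshold, best first."""
    scored = sorted(((score_fn(question, c), c) for c in chunks), reverse=True)
    kept = [c for s, c in scored if s >= min_score][:top_k]
    context = "\n\n".join(kept)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The key choice is the threshold: with a reasoning model, an empty context is often better than a misleading one.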

General Feedback and Future Enhancements

- Rate limits are low for o1-preview due to early-stage testing but will be increased

- Improvements in latency and inference times are actively being worked on


Remarkable Model Capabilities

- o1 can think through philosophical questions like "What is life?"

- Researchers found o1 impressive in its ability to handle complex tasks and generalize from limited instruction

- o1's creative reasoning abilities, such as quizzing itself to gauge its capabilities, showcase its high-level problem-solving



Article by Tibor Blaho
