Open-AI's GPT-4o [Audio,Vision & Text] Capabilities

Open-AI's GPT-4o [Audio,Vision & Text] Capabilities

Hello GPT-4o

GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction -

  1. Accepts Text,Audio,Images & Video and Generates any combination of text, audio & image outputs.
  2. Respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time(opens in a new window) in a conversation.
  3. Matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages.
  4. GPT4o is 2X Faster & 50% cheaper.
  5. GPT-4o is especially better at vision and audio understanding compared to existing models.

Introducing GPT-4o - Model capabilities

Model evaluations

GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning & coding intelligence with supporting multilingual, audio, and vision capabilities.

Model Safety & Limitations

GPT-4o has safety built-in by design across Modalities -

  1. Applying techniques on filtering training data.
  2. Refining Model’s behavior through post-training.
  3. Applying guardrails on voice outputs.
  4. GPT-4o according to our Preparedness Framework and in line with our voluntary commitments.
  5. GPT-4o has also undergone Extensive external Red-Teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. These learnings are used to build out our safety interventions in order to improve the safety of interacting with GPT-4o.

Model availability -

?GPT-4o’s text and image capabilities are available in the free tier & Plus users with up to 5x higher message limits. New version of Voice Mode with GPT-4o in alpha within ChatGPT Plus is coming soon.

AI Developers can also now access GPT-4o in the API as a Text & Vision model.

References -

Open AI Blog -

https://openai.com/index/hello-gpt-4o/

https://openai.com/preparedness/

https://openai.com/index/moving-ai-governance-forward/

https://www.pnas.org/doi/10.1073/pnas.0903616106

Introducing GPT-4o - Model capabilities

https://www.youtube.com/watch?v=DQacCB9tDaw

For more information on AI Research Papers you can visit my Github Profile -

https://github.com/aditikhare007/AI_Research_Junction_Aditi_Khare

For Receving latest updates on Advancements in AI Research Gen-AI, Quantum AI & Computer Vision you can subscribe to my AI Research Papers Summaries Newsletter using below link -

https://www.dhirubhai.net/newsletters/ai-research-junction-7152631955203739649/

Thank you & Happy Reading !


要查看或添加评论,请登录

Aditi Khare的更多文章

社区洞察

其他会员也浏览了