Oz ??alan's post

Creative Director & Generative AI Consultant for Creatives

Humanity’s Last Exam, a new benchmark, shows how far AI still lags behind human expertise. The benchmark evaluates large language models (LLMs) with 3,000 challenging questions across various subjects, including mathematics. Developed by nearly 1,000 subject-matter experts from over 500 institutions worldwide, it aims to assess LLMs at the frontier of human knowledge. Notably, current state-of-the-art LLMs show low accuracy on it, highlighting a significant gap between their capabilities and expert human performance. Paper: https://lnkd.in/edN3uW9A #AI #LLM #Benchmark #ArtificialIntelligence #HumanExpertise

  • [Image: bar chart]
Oz ??alan

Creative Director & Generative AI Consultant for Creatives

3 weeks ago

Dan Hendrycks shared this:

  • [Image: no alt text]
Dr. Jalil A.

Pharmacist Doctor | Healthcare AI & Tech | Project Management | Data Analytics | Talk about #Healthcare Innovations #AI in Healthcare #Wearable Health Tech #Blockchain in Healthcare #Robotics in Healthcare

1 month ago

Oz ??alan what about deepseek?
