Humanity’s Last Exam, a new benchmark, shows how far AI still lags behind human expertise. A new benchmark called “Humanity’s Last Exam” has been introduced to evaluate large language models (LLMs) with 3,000 challenging questions spanning a wide range of subjects, including mathematics. Developed by nearly 1,000 subject-matter experts from over 500 institutions worldwide, the benchmark aims to assess LLMs at the frontier of human knowledge. Notably, current state-of-the-art LLMs score poorly on it, highlighting a significant gap between their capabilities and expert human performance. Paper: https://lnkd.in/edN3uW9A #AI #LLM #Benchmark #ArtificialIntelligence #HumanExpertise
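The headline numbers behind benchmarks like this usually come from a simple aggregate: the fraction of questions the model answers correctly. A minimal sketch of that scoring step (the function name and toy data are illustrative, not from the paper):

```python
# Hypothetical sketch of benchmark scoring: exact-match accuracy
# over a set of questions with a known answer key.
def accuracy(predictions, answer_key):
    """Fraction of (question_id, answer) pairs matching the key."""
    correct = sum(1 for qid, ans in predictions if answer_key.get(qid) == ans)
    return correct / len(predictions)

# Toy example (not real benchmark data): 1 of 3 answers is correct.
key = {"q1": "42", "q2": "blue", "q3": "7"}
preds = [("q1", "42"), ("q2", "red"), ("q3", "9")]
print(round(accuracy(preds, key), 3))  # → 0.333
```

Real evaluations add answer normalization and, for free-form responses, model- or human-graded matching, but the reported "low accuracy" is this kind of ratio.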
Dan Hendrycks shared this 3 weeks ago.