Evolving AGI benchmarks

Rajeswaran V (PhD)

Generative AI specialist. AI Futures and AI CoE head

Published: December 29, 2023

The Turing test was long considered a solid benchmark for measuring AI capability, and it has now been well surpassed. Today there is a race both to create AGI and to measure whether a given "system or model" has reached it.

This question matters for all of humanity. The OpenAI saga can also be traced to it. Microsoft's agreement with OpenAI excludes models that have reached AGI, so it was important for Microsoft to have Sam Altman at OpenAI so that he can claim that the models they have created have not reached it. OpenAI's six-person board of directors will determine when the company has "attained AGI", a threshold that will exclude Microsoft (in theory).

There are now two new benchmarks that I find fascinating: BASIS and GPQA.

BASIS

This benchmark was conceived by Mensa researcher and metaphysician Dr Jason Betts as a suite of test items that prioritizes imminent artificial superintelligence (ASI) while also covering the lower ceiling of advanced AGI.

The BASIS project ensures that superintelligence can be appropriately assessed against very high human biological intelligence ceilings, and removes workarounds like holding out a subset of catalogued data (Common Crawl, books, journals, Wikipedia) from models. Instead, the testing mechanism is replaced with new, unique, and offline questions.

Every question is designed to have an answer that can be independently verified by at least one other human. Importantly, the question, the answer, and even the combination of keywords do not appear on the web and should never have been seen before.
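The "combination of keywords does not appear" requirement can be pictured as a simple co-occurrence check. The sketch below is only an illustration: the article does not describe how BASIS actually verifies novelty. The function names, the single-word keyword assumption, and the tiny in-memory corpus are all hypothetical. It flags a question when every one of its keywords appears together in a single known document.

```python
import re

def tokens(text):
    """Lowercase a text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keywords_cooccur(keywords, documents):
    """Return True if any single document contains every keyword.

    Assumption: keywords are single words. A real check would also need
    phrase matching and a far larger corpus than an in-memory list.
    """
    wanted = {k.lower() for k in keywords}
    if not wanted:
        raise ValueError("at least one keyword is required")
    return any(wanted <= tokens(doc) for doc in documents)

# Hypothetical stand-in for a crawl of already published text.
corpus = [
    "The quick brown fox jumps over the fence",
    "A lazy dog sleeps in the sun",
]

seen_before = keywords_cooccur(["quick", "fox"], corpus)   # both words in document 1
looks_novel = not keywords_cooccur(["quick", "dog"], corpus)  # never in the same document
```

A check like this can only show that a combination has been seen, not that it is truly new, which is one reason the questions are also kept offline.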

GPQA

GPQA is a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. The authors ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof").
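To put these percentages in context, it helps to compare them with random guessing. The sketch below is a minimal illustration and not the benchmark's own evaluation code: the `Item` type, the function names, and the sample data are hypothetical, and the four-options-per-question figure is an assumption not stated in the text above.

```python
from dataclasses import dataclass

@dataclass
class Item:
    question_id: str
    correct: str  # label of the correct option, e.g. "A"

def accuracy(items, predictions):
    """Fraction of items answered correctly; unanswered items count as wrong."""
    if not items:
        raise ValueError("no items to score")
    hits = sum(1 for it in items if predictions.get(it.question_id) == it.correct)
    return hits / len(items)

def chance_baseline(num_options):
    """Expected accuracy of uniform random guessing."""
    if num_options < 1:
        raise ValueError("num_options must be positive")
    return 1.0 / num_options

# Hypothetical toy data: four questions, one answered wrongly.
items = [Item("q1", "A"), Item("q2", "C"), Item("q3", "B"), Item("q4", "D")]
predictions = {"q1": "A", "q2": "C", "q3": "D", "q4": "D"}

score = accuracy(items, predictions)   # 3 of 4 correct -> 0.75
baseline = chance_baseline(4)          # assumed four options -> 0.25
```

Under the four-option assumption, the 34% reported for non-experts sits only a little above the 25% guessing baseline, while the 65% reported for experts is well clear of it.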
