AGI - here or not?
AI progress since 2012 has been relentless. I am still kinda shocked every time I use ChatGPT, Grok, Claude, Perplexity and other AI services.
You can chat with them about pretty much anything, as you all know, and in many ways these "AI things somewhere in the clouds" seem like the smartest entities I have ever communicated with, humans included.
If I had seen a demo of this in 2012, I would have thought: "Wow, so they cracked AGI by 2024."
Here we are though, in 2025, and few AI experts believe that we have truly cracked anything like AGI, stating that we "just" have smart machine parrots that can mix and match existing human knowledge. Yes, pretty much all existing human knowledge, but still, parrots. So yeah, cool, but AGI? Nah.
The goalposts are not stationary....
Now for some suit speak, sorry.
You can't manage what you can't measure.
You can't measure what you can't define.
Benchmarks are our attempts at measuring intelligence, with the definition of said intelligence being tied to the definition and scope of the relevant benchmark.
The machines are doing rather well on many, many benchmarks, often surpassing average human performance, as depicted in Stanford's AI Index Report 2024.
That said, the machines do still struggle to match human performance on quite a few things, such as competition-level math and visual commonsense reasoning. What are some of these benchmarks?
One is ARC-AGI, defined by François Chollet in his 2019 paper "On the Measure of Intelligence", in which he attempts to define intelligence in general terms:
The intelligence of a system is a measure of its skill-acquisition efficiency over a scope of tasks, with respect to priors, experience, and generalization difficulty.
He created an accompanying benchmark to try and measure whether AI entities are just super parrots, or whether they can reason and learn. It is a series of visual puzzles, reminiscent of IQ tests. More info here: https://arcprize.org/arc
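To make the idea concrete, here is a toy sketch in Python of what an ARC-style task looks like: a few input-to-output grid pairs demonstrate a hidden rule, and the solver must infer the rule and apply it to a new input. This is my own simplified illustration, not an actual ARC puzzle or solver; the candidate rules and function names are invented for the example, and a real solver cannot rely on a fixed list of rules at all.

```python
# Toy ARC-style task (illustration only, not a real ARC puzzle):
# each task gives a few input->output grid pairs, and the solver
# must infer the transformation and apply it to a fresh test input.

def flip_h(grid):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in grid]

def flip_v(grid):
    """Mirror the order of the rows."""
    return grid[::-1]

def recolor(grid):
    """Hypothetical rule: swap colors 1 and 2, leave the rest."""
    return [[{1: 2, 2: 1}.get(c, c) for c in row] for row in grid]

# A tiny, hand-picked hypothesis space (a real solver has no such list).
CANDIDATES = [flip_h, flip_v, recolor]

def solve(demos, test_input):
    """Return the first candidate rule consistent with all demo pairs,
    applied to the test input; None if no candidate fits."""
    for rule in CANDIDATES:
        if all(rule(inp) == out for inp, out in demos):
            return rule(test_input)
    return None  # no fit: this is where genuine generalization is needed

# Two demonstrations of the same hidden rule (horizontal mirror).
demos = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 4], [0, 5]], [[4, 3], [5, 0]]),
]

print(solve(demos, [[7, 0, 0], [0, 8, 0]]))  # -> [[0, 0, 7], [0, 8, 0]]
```

The point of the benchmark is precisely that no fixed candidate list works: each puzzle demands a transformation the solver has likely never seen, so memorized pattern-matching, however vast, is not enough.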
The AIs failed at his benchmark, miserably, until the end of 2024 that is. Progress was slow, and then very fast.
Another AI benchmark bites the dust, with average human performance surpassed by a machine intelligence. No worries, Chollet has come up with a version 2.0. Let's see how long that one lasts.
In all the years that I have been reading Chollet and listening to him on podcasts, he has been rather dismissive of AGI progress, similar to Yann LeCun. Chollet is now at least a little impressed:
I find Chollet's comment that AGI will have been achieved when we can no longer set up challenging AI benchmarks very interesting.
With that in mind, let's look at a recent math benchmark, one that is intended to be very difficult for the AIs, and humans, to solve: the FrontierMath challenge by Epoch AI.
It was published on November 7, 2024, and represents pretty much the most recent, absolute best that we humans can do in setting up a hard math challenge.
More info at https://epoch.ai/frontiermath.
So, humans really applied themselves on this one. Not just any humans, but some of the world's experts across the full spectrum of modern mathematics, from algebraic geometry to Zermelo–Fraenkel set theory. Zermelo what? Yeah, it's hard.
In fact, it is doubtful whether any one human alive could solve this benchmark within a day. As a team, yes; alone, probably not.
Well, I am not so sure about Terence Tao's view that this benchmark will take the machines several years, three or more, to solve. Why?
Because a month after the benchmark was published, OpenAI's o3 model scored 25% on FrontierMath. That is probably already close to the best individual human performance...
But 25% is far away from 100%, right? Right?
Well, let's look at ARC-AGI again. When models scored 25% early in 2024, 100% still seemed quite a way off too, yes? Nope: OpenAI went from 25% to 87% in one year.
So it is within the realm of possibility that the FrontierMath benchmark is solved by machines in 2025. We would then be very close to Chollet's informal AGI criterion of "no longer being able to set up a benchmark that the machines can't beat us at", at least for the field of math.
Would we then have math AGI? How far away could physics AGI then be? 2026? Also 2025?
Remember William Gibson's quote?
The future is already here – it's just not evenly distributed.
I don't know about you, but I am increasingly starting to think:
Enjoy the ride......