The Final Awe of 2024, and Grand Design of Tasks that Inspire True Intelligence

The buzz around OpenAI’s new “o3” model has been electrifying—and for good reason. It has crushed math competitions that stump even top-tier students, written code that leaves software engineers in awe, and tackled advanced logic puzzles with startling ease. People are calling it the “next big step” in AI, building on the breakthroughs of AlphaGo and AlphaStar, but across multiple challenging domains rather than a single game or task. If AlphaGo was a superintelligence for Go, o3 is shaping up to be a superintelligence for everything from coding to demanding math benchmarks like AIME and FrontierMath — an incredible feat.

But here’s the catch: our shiny new genius still stumbles on things that come instinctively to humans — especially the kind of “child’s play” puzzles you’d give a 5-year-old. This gap, known as Moravec’s paradox, reminds us that intelligence is a broad tapestry of skills: excelling in one area doesn’t guarantee competence in another. It’s like watching an Olympic gymnast fail to ride a tricycle.

So how do we harness o3’s brilliance and push AI even further toward a versatile, human-like intelligence? The secret lies in expanding the scope of AI challenges, rethinking reward structures, and embracing creativity and collaboration in multi-agent settings.


Learning to Learn—Not Just to Win

When we train AI on a single domain with a simple reward—say, winning a board game—our model becomes the Usain Bolt of that particular track, but flops when it’s time to swim or ride a bike. Real life is more fluid and varied. Scientists often don’t know if a new theory will be successful until years of data roll in, and software engineers sometimes chase down bugs for weeks without any clear “score.”

To get there, we need tasks and environments that reflect the messy nature of real-life challenges. Instead of a single number (win/lose, 1/0), AI might juggle multiple objectives: correctness, time efficiency, creativity, safety, or even collaboration. Yes, that’s harder to measure—but it’s also how we juggle tasks every day. Some days we’re optimizing for speed (finishing that report before lunch), other days we focus on quality (perfecting the slides for an important presentation). True intelligence requires handling trade-offs, not just chasing a single scoreboard.
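To make the trade-off concrete, here is a minimal sketch of how several objectives could be scalarized into a single training signal. The objective names and weights are hypothetical, purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class Objectives:
    """Hypothetical per-episode scores, each normalized to [0, 1]."""
    correctness: float
    time_efficiency: float
    creativity: float
    safety: float

def scalarize(obj: Objectives, weights: dict[str, float]) -> float:
    """Collapse multiple objectives into one training signal.

    A weighted sum is the simplest recipe; real systems might instead
    use constrained optimization (e.g., maximize correctness subject
    to a hard safety floor) or Pareto-based methods.
    """
    return sum(weights[name] * getattr(obj, name) for name in weights)

# "Speed day" vs. "quality day": the same episode scores differently
# depending on which trade-off we currently care about.
episode = Objectives(correctness=0.9, time_efficiency=0.4,
                     creativity=0.7, safety=1.0)
speed_weights = {"correctness": 0.3, "time_efficiency": 0.5,
                 "creativity": 0.1, "safety": 0.1}
quality_weights = {"correctness": 0.6, "time_efficiency": 0.1,
                   "creativity": 0.2, "safety": 0.1}
print(scalarize(episode, speed_weights))    # 0.64
print(scalarize(episode, quality_weights))  # 0.82
```

The weighted sum is only the simplest option; constrained or Pareto-style formulations avoid baking one fixed trade-off into the reward forever.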

But ... for more advanced tasks (like creative software engineering or solving novel math problems), it’s not even trivial to specify a reward criterion! Potential recipes include (see the sketch after this list):

  • Learned Reward Functions: Let the system learn to infer or negotiate the reward from the environment or from a human partner’s feedback. Think of it as “RL from human preferences,” extended to many tasks, or something akin to iterated reward shaping.
  • Intrinsic Motivation (an idea embodied-AI research has explored for years): Borrow ideas from developmental psychology, where exploration and curiosity are themselves “rewarded.” This may help the system discover solutions we didn’t explicitly direct it to find.
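As a deliberately simplified sketch of the first recipe, here is a Bradley–Terry-style preference loss of the kind used in RL from human feedback: a small reward model is trained so that trajectories humans preferred score higher than the ones they rejected. The network shape and feature dimensions below are placeholders, not anyone’s actual implementation:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a trajectory feature vector to a scalar reward estimate."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats).squeeze(-1)

def preference_loss(rm: RewardModel,
                    preferred: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push r(preferred) above r(rejected).

    `preferred` / `rejected` are batches of trajectory features where
    a human judged the first trajectory better than the second.
    """
    margin = rm(preferred) - rm(rejected)
    return -torch.nn.functional.logsigmoid(margin).mean()

# One training step on a batch of hypothetical human comparisons.
rm = RewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-4)
preferred, rejected = torch.randn(32, 128), torch.randn(32, 128)
loss = preference_loss(rm, preferred, rejected)
opt.zero_grad(); loss.backward(); opt.step()
```

The second recipe is often implemented as a prediction-error “curiosity” bonus added on top of (or in place of) such a learned reward.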


From Brainy to Grounded

OpenAI’s o3 can solve math puzzles that make seasoned mathematicians scratch their heads—yet it can also fail at a puzzle a kindergartener might solve in minutes. The question is, how can we design tasks that capture these “simple” real-world skills in a way that truly challenges AI?

One approach is embodied or simulated environments, where AI has to interact with a physical (or at least simulated) space. Think of it as the difference between solving Sudoku on paper and physically navigating a Lego maze. By integrating tasks that demand sensory awareness, intuitive physics, or social dynamics (even simulated tasks can require “common sense” social scripts, such as taking turns, negotiating resources, and respecting the constraints of others), we compel AI to learn many aspects of cognition that we typically take for granted.
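To illustrate, here is a toy, entirely hypothetical environment in the style of the Gymnasium API, where one of those social scripts (turn-taking over a shared tool) is baked directly into the reward:

```python
import gymnasium as gym

class TurnTakingEnv(gym.Env):
    """Toy 'social script' environment: two agents share one tool.

    The learning agent earns reward only for using the tool on its
    own turn; grabbing it out of turn is penalized. (A hypothetical
    task, just to show a social constraint becoming part of reward.)
    """
    def __init__(self):
        self.action_space = gym.spaces.Discrete(2)       # 0 = wait, 1 = use tool
        self.observation_space = gym.spaces.Discrete(2)  # whose turn it is
        self.my_turn = True

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.my_turn = True
        return int(self.my_turn), {}

    def step(self, action):
        if action == 1:
            reward = 1.0 if self.my_turn else -1.0  # respect the turn order
        else:
            reward = 0.0
        self.my_turn = not self.my_turn             # turns alternate
        return int(self.my_turn), reward, False, False, {}

env = TurnTakingEnv()
obs, _ = env.reset()
obs, r, *_ = env.step(1)   # acted on our turn: r == 1.0
obs, r, *_ = env.step(1)   # acted out of turn: r == -1.0
```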


Transfer Learning: AI’s Next “Aha!” Moment

One of the biggest goals for next-gen AI is transfer learning, where a system picks up knowledge in one domain and applies it effectively to another. It’s a bit like that moment you realize your skill at playing guitar helps you learn piano faster (both require a sense of rhythm and hand coordination). If o3 masters advanced calculus, can it transfer that structured thinking to, say, analyzing a legal contract or writing better code? If it can, that’s a genuine leap toward more general intelligence.

To test and encourage these abilities, we can design multi-task challenges that involve frequent context-switching. Instead of letting the AI train on just one type of problem until it’s perfect, we throw it puzzles of different kinds, gradually ramping up difficulty and variety. This way, it’s not memorizing solutions; it’s learning to learn.
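A minimal sketch of such a multi-task curriculum might look like the following; the task families and the difficulty ramp are invented for illustration:

```python
import random

# Hypothetical task families, each a difficulty-leveled generator.
TASK_FAMILIES = {
    "algebra":   lambda level: f"solve a level-{level} algebra problem",
    "coding":    lambda level: f"fix a level-{level} buggy function",
    "contracts": lambda level: f"summarize a level-{level} legal clause",
}

def curriculum(num_rounds: int, max_level: int = 5):
    """Yield a shuffled stream of tasks with gradually rising difficulty.

    Frequent context-switching across families discourages memorizing
    one problem type; the rising level keeps each family challenging.
    """
    for round_idx in range(num_rounds):
        level = 1 + (round_idx * max_level) // num_rounds  # ramp difficulty
        families = list(TASK_FAMILIES)
        random.shuffle(families)          # switch context every round
        for name in families:
            yield name, TASK_FAMILIES[name](level)

for family, task in curriculum(num_rounds=3):
    print(family, "->", task)
```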


Collaboration is Key

Ever notice how some of the most creative breakthroughs happen when people work together? One person might be great at design while another excels in analytics. The synergy often sparks something bigger than the sum of the parts. In the AI world, we can replicate this through multi-agent reinforcement learning, where multiple AI agents—each with different roles or perspectives—must cooperate, negotiate, or sometimes compete.

This opens the door to fascinating social dynamics. Agents might have to figure out how to share resources, teach each other new skills, or even lie (let’s keep it ethical, though!). The result is an environment that tests social intelligence and strategic thinking—skills that are essential for a robust, human-like AI.
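As a toy illustration of the incentive structures involved (all agent names and payoff rules hypothetical), here is a one-round resource-sharing game where collective over-claiming spoils the resource, creating pressure to negotiate:

```python
def share_resource(claims: list[float], budget: float = 1.0) -> list[float]:
    """Split a common resource among agents.

    If the total claims fit the budget, everyone gets what they asked
    for; if agents collectively over-claim, the resource is spoiled
    and no one gets anything, a classic incentive to negotiate.
    """
    if sum(claims) <= budget:
        return claims
    return [0.0] * len(claims)

class GreedyAgent:
    def act(self) -> float:
        return 0.8

class PoliteAgent:
    def act(self) -> float:
        return 0.4

agents = [GreedyAgent(), PoliteAgent()]
claims = [a.act() for a in agents]
payoffs = share_resource(claims)
print(claims, "->", payoffs)   # [0.8, 0.4] -> [0.0, 0.0]: over-claimed
```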


The Road Ahead

OpenAI’s o3 model has wowed mathematicians and coders alike, but it also reminds us how important it is to go beyond single-domain excellence. We want AI that can tackle real-world complexity, adapt to fresh challenges, and even do a backflip without toppling over (looking at you, Boston Dynamics!). That means building tasks and benchmarks that reflect the breadth of intelligence—messy, nuanced, cooperative, and sometimes plain weird.

By broadening tasks, revamping reward structures, and encouraging open-ended learning, we can create AI that’s not only good at winning chess matches or coding marathons, but also adept at everyday problem-solving. It’s a bold vision, but if o3 has taught us anything, it’s that these leaps forward happen when we’re ready to push the boundaries of what AI can do.

So here’s to the next frontier of AI tasks—where “child’s play” becomes a genuine test of mettle, multi-agent cooperation spawns new forms of creativity, and transfer learning starts making our AI partners more, well, human. If the early triumphs of o3 are any indication, we’re in for quite a ride—tricycle or otherwise.


References

This post was heavily inspired by three incredibly insightful pieces:

  • https://www.interconnects.ai/p/openais-o3-the-2024-finale-of-ai
  • https://www.dhirubhai.net/posts/drjimfan_thoughts-about-o3-ill-skip-the-obvious-activity-7276309072478400512-0Tqw?utm_source=share&utm_medium=member_desktop
  • https://www.dhirubhai.net/posts/milescranmer_openais-o3-just-scored-875-on-the-arc-agi-activity-7276146225974804480-hka4?utm_source=share&utm_medium=member_desktop

... and I doubt I could have done a better job than any of the three! :) So do read their original ideas, too.

