Transcendence

“The smartest person in the room is the room”, GPT-style

In last week’s post, “The botterfly effect”, I used the phrase “the slave becomes the master”. This subtle nod to Metallica’s “The End of the Line” highlighted how tools meant to help us end up directing us instead.

I toyed with another line for last week’s post: “The student becomes the master.” But I dropped it. It didn’t fit because my message was not about training generative AI systems. Plus, as recently as last week, I believed that generative AI systems can only be as good—on average—as the average of the expertise in their training dataset. The student could become the master only if the master were average.

Or was I wrong about it?

“Median human” performance of Generative AI. What is it?

Sam Altman (love him or hate him, people listen to him) used the term “median human” to describe the expected quality of OpenAI’s systems: better than half the population, worse than the other half. Perfectly average.

This makes sense. Generative AI models are trained on human-created content and learn the conditional probability distribution of that data. According to their internal representation of the data that trained them, anything they create is the most probable outcome (hence the “median” behaviour). Give them enough data (how about the entire Internet?) and they will learn to create perfectly average Internet content.
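To make this concrete, here is a toy sketch (my own illustration, with made-up numbers, not from any real model): a system that has learned a next-word distribution and always emits the most probable word will, by construction, produce perfectly average text.

```python
# Toy next-word distribution a model might learn from Internet text
# (the words and probabilities are invented for illustration).
next_word_probs = {"mat": 0.5, "sofa": 0.25, "floor": 0.15, "moon": 0.10}

# Greedy decoding: always emit the most probable next word.
# The result is, by definition, the most "average" continuation.
most_probable = max(next_word_probs, key=next_word_probs.get)
print(most_probable)  # -> "mat"
```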

If these very same models encounter a situation they’ve not seen in their training set, they will generate random (less probable) outputs that look “okay” to a non-expert but are utterly crazy to experts.

Not sure what I mean? Look at this chess game, between GPT-3 and a proper chess algorithm (not a large language model), Stockfish. This year-old example starts as a perfectly average game until, several moves in, the model encounters something it has never seen before and starts behaving very randomly (pay attention to one of the black knights).

My point? Generative AI systems are not only median (a synonym here is “mediocre”) but also start behaving unpredictably precisely when we have built enough trust in them (by watching their past performance).

The game of chess demonstrates it well. To a non-player, it looks like a perfectly decent game. To a beginner, the first several moves might even look smart. To an expert, the system starts as average, only to descend into madness.

This is also a perfect example showing that, for specific tasks, there are perfectly capable algorithms that outperform generative AI (Stockfish was released 15 years ago). Before using generative AI to solve a problem, ask if there’s a better way.

So, will generative AI models remain “average”? Or will they ever be able to outperform the human data that trained them?

The smartest person in the model is the model

I came across a paper titled “Transcendence: Generative Models Can Outperform The Experts That Train Them” by Edwin Zhang and colleagues from Harvard, UC Santa Barbara, and Princeton. It was released earlier this week. The study explores how generative models, when trained under specific conditions, can achieve capabilities surpassing the expertise they are trained on.

The authors built a system called ChessFormer (a “chess transformer”—the same transformer technology that hides behind the last “T” in “ChatGPT”), trained on human chess game transcripts. When evaluated under specific conditions such as low-temperature sampling (a trick to make model outcomes less random—typically bad for creativity but good for precision-demanding tasks), the model outperformed the highest-rated human players in the training dataset. In other words, the model became smarter than its teachers.
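Low-temperature sampling is simple to sketch. A model scores candidate outputs (say, chess moves) and turns the scores into probabilities; dividing the scores by a temperature below 1 before normalising sharpens the distribution, so almost all the probability lands on the top-scoring move. The numbers below are illustrative, not from the paper:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities; lower temperature = sharper."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate chess moves.
move_logits = [2.0, 1.0, 0.5]

normal = softmax_with_temperature(move_logits, temperature=1.0)
cold = softmax_with_temperature(move_logits, temperature=0.1)

# At temperature 0.1, nearly all the probability mass lands on the
# best-scoring move; at temperature 1.0, it is spread out much more.
print(normal[0], cold[0])
```

This is the “precision over creativity” trade-off mentioned above: a cold model almost always plays its single best guess instead of exploring alternatives.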

Source: Midjourney


This phenomenon, which the authors termed “transcendence,” suggests that generative AI systems can surpass individual expert performance by leveraging collective human expertise and minimising errors through the AI equivalent of majority voting.
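The majority-voting intuition is easy to sketch (a toy illustration of the idea, not the paper’s actual method): each expert makes occasional idiosyncratic mistakes, but because those mistakes fall in different places, the majority answer across experts can beat any single expert.

```python
from collections import Counter

# Toy example: three experts answer five chess positions.
# "correct" lists the best move for each position (invented for illustration).
correct = ["Nf3", "e4", "d4", "Qh5", "O-O"]
experts = [
    ["Nf3", "e4", "d4", "Qh5", "a3"],   # expert 1 errs on position 5
    ["Nf3", "e4", "h4", "Qh5", "O-O"],  # expert 2 errs on position 3
    ["Nf3", "c4", "d4", "Qh5", "O-O"],  # expert 3 errs on position 2
]

# Each expert alone scores 4/5. The majority vote scores 5/5,
# because the experts' mistakes land on different positions.
majority = [
    Counter(answers).most_common(1)[0][0]
    for answers in zip(*experts)
]
print(majority == correct)  # -> True
```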

This figure from the paper demonstrates how decreasing the sampling temperature made the model more confident about the best move.

This research underscores the potential for generative AI to not just mimic but exceed human expertise in specific domains. For now, it’s chess, but I am certain studies of other domains will follow. I acknowledge that this is just one paper so far, and it’s not peer-reviewed yet. Still, perhaps we should start questioning our perception of the role of generative AI in our creative and decision-making processes.

Let me spell it out: we’re starting to see early evidence of the ability of generative AI models to exceed the expertise from their training data. This is new in generative AI. And it’s eerily similar to what David Weinberger wrote twelve years ago about humans: “The smartest person in the room is the room”. Collective intelligence exceeds the intelligence of any individual.

In 2024, “The Smartest Person in the Model is the Model”.


But wait, there’s a catch.

The research uncovers an interesting insight into human expertise. Only models trained on diverse datasets, encompassing a wide range of player ratings and styles, performed significantly better than human experts. Such diversity allows the model to generalise and improve upon the individual performance of its trainers by minimising biases and errors.

Without enough human diversity, the model’s ability to outperform its training data dramatically diminishes.

What does it mean for business?

  1. Do not expect generic models, such as ChatGPT, to outperform experts any time soon. These systems will continue to provide “median human” quality. My chess video will continue to be relevant. But that’s ok—not every model needs to transcend. Sometimes, all you need is the equivalent of the first few moves in chess, even if they’re not world-class.
  2. To replicate your current organisational expertise, you might not need generative AI. Stockfish (an old-school algorithm that no one would call AI these days) reminded us of that; it will take a long time for generative AI systems to beat it. There’s a category of AI called expert systems, which focus on replicating human expertise directly. An expert system might be all you need.
  3. Train your own models if you need them to reach higher expertise levels than your human team members. These models will not be generic, though. A model trained on legal documents cannot write poems—leave that task to ChatGPT.
  4. Transcendence seems possible only if you train the models on diverse datasets. If your human team is not diverse enough, don’t expect the generative AI systems to perform well.

Wait, what if we trained future models on the outputs of these new, transcendent models?

I might need to give Ray Kurzweil a call.


See me live:

  • Tuesday, 23 July 2024, Brisbane, Australia: Closing keynote of Curriculum Connect Symposium for Teachers and Teacher Librarians, “Misinformation, Fake News & AI: Building critical thinking in a post-truth classroom”. Register here.
  • Wednesday, 28 August 2024, Brisbane, Australia: Closing keynote of Something Digital Festival. Register here.
  • Monday, 30 September 2024, Brisbane, Australia: Opening keynote of IFLA Information Futures Summit. Register here.
  • Wednesday, 23 October 2024, Warsaw, Poland: Masters&Robots 2024. Register here.
  • Friday, 25 October 2024, Dallas, Texas: Closing keynote of Tech Summit: AI + SAP BTP. Register here.


Prof. Marek Kowalkiewicz is a Professor and Chair in Digital Economy at QUT Business School. Listed among the Top 100 Global Thought Leaders in Artificial Intelligence by Thinkers360, Marek has led global innovation teams in Silicon Valley, was a Research Manager of SAP's Machine Learning lab in Singapore, a Global Research Program Lead at SAP Research, as well as a Research Fellow at Microsoft Research Asia. His newest book is called "The Economy of Algorithms: AI and the Rise of the Digital Minions".


Rohan Arnold

Cybersecurity Business Analyst | Digital Transformation Business Analyst

8 months ago

Some great points in your post Prof. Marek Kowalkiewicz and aligned with my thinking about AI producing the median of responses. I think humans are interesting because of the edge/outliers. If the model can exceed expert performance due to the collective expertise it has been trained on, how will individuals validate the output if the reasoning is beyond their capabilities?

Sue Isbell

Head of Department, eLearning at Kelvin Grove State College

9 months ago

And for politics/government?

Brian Lee-Archer

GAICD - Advisory in digital transformation, government services and social security

9 months ago

Hi Marek - love your work. The median human concept is something we should all be aware of. Even experts have been known to make things up, or to say something is true when they know it may not be backed up by facts, when backed into a corner in terms of expressing their knowledge. As Gen AI hallucinates in these situations it will be a challenge for humans to detect (same as when the expert says something and we accept it as true even when we have our doubts). Will there be transparency built into Gen AI offerings, with something like this coming up first in response to the prompt: "I think I am hallucinating on this one - enjoy the ride"?

Sarah Daly

AI Management | Artist | PhD Candidate

9 months ago

Very interesting, Marek. There seem to be parallels between increasing human creativity and increasing machine creativity/ capability. For example, this reminds me of the creativity research on group brainstorming vs nominal pairs, where the pairs came up with better ideas because they weren't primed by the room. An interesting space for experimentation.

Henrik Vogt

ai project management renewable energy

9 months ago

Q1 yes Q2 a lot and not just business but in any discipline. Decisions become transparent because we can question derivations like never before.
