Transcendence
Prof. Marek Kowalkiewicz
Bestselling author of "The Economy of Algorithms: AI and the Rise of the Digital Minions" | Professor and Chair in Digital Economy | Top 100 AI Thought Leader | Global Keynote Speaker on Digital Economy, AI & Innovation
"The smartest person in the room is the room" GPT-style
In last week’s post, “The botterfly effect”, I used the phrase “the slave becomes the master”. This subtle nod to Metallica’s “The End of the Line” highlighted how tools meant to help us end up directing us instead.
I toyed with another line for last week’s post: “The student becomes the master.” But I dropped it. It didn’t fit, because my message was not about training generative AI systems. Plus, as recently as last week, I believed that generative AI systems could only ever be as good, on average, as the average expertise in their training dataset. The student could become the master only if the master were average.
Or was I wrong about it?
“Median human” performance of Generative AI. What is it?
Sam Altman (love him or hate him, people listen to him) used the term “median human” to describe the expected quality of OpenAI’s systems: better than half the population, worse than the other half. Perfectly average.
This makes sense. Generative AI models are trained on human-created content and learn the conditional probability distribution of that data. Whatever they generate is, according to their internal representation of the data that trained them, the most probable continuation (hence the “median” behaviour). Give them enough data (how about the entire Internet?) and they will learn to create perfectly average Internet content.
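Here’s a toy illustration of that idea (my own sketch, with made-up data, no real model involved): “train” on counts of human moves in one position, and the most probable continuation is, by construction, the most typical human choice.

```python
import numpy as np

# Toy "training data": how often humans opened with each move.
# A model that picks the most probable continuation will, by
# construction, reproduce the most typical human choice.
human_games = ["e4", "e4", "d4", "e4", "c4", "d4", "e4", "e4"]
moves, counts = np.unique(human_games, return_counts=True)
p_move = counts / counts.sum()               # learned P(move | position)

print(dict(zip(moves, np.round(p_move, 2))))  # {'c4': 0.12, 'd4': 0.25, 'e4': 0.62}
print("most probable:", moves[np.argmax(p_move)])  # 'e4', the median behaviour
```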
If these very same models encounter a situation they’ve not seen in their training set, they will generate random (less probable) outputs that look “okay” to a non-expert but are utterly crazy to an expert.
Not sure what I mean? Look at this chess game between GPT-3 and Stockfish, a dedicated chess engine (not a large language model). This year-old example starts as a perfectly average game until, several moves in, the model encounters a position it has never seen before and starts behaving very randomly (pay attention to one of the black knights).
My point? Generative AI systems are not only median (a synonym here is “mediocre”) but also start behaving unpredictably precisely when we have built enough trust in them (by watching their past performance).
The game of chess demonstrates it well. To a non-player, it looks like a perfectly decent game. To a beginner, the first several moves might even look smart. To an expert, the system starts as average, only to descend into madness.
This is also a perfect example showing that, for specific tasks, there are perfectly capable algorithms that outperform generative AI (Stockfish was released 15 years ago). Before using generative AI to solve a problem, ask if there’s a better way.
So, will generative AI models remain “average”? Or will they ever be able to outperform the human data that trained them?
The smartest person in the model is the model
I came across a paper titled “Transcendence: Generative Models Can Outperform The Experts That Train Them” by Edwin Zhang and colleagues from Harvard, UC Santa Barbara, and Princeton. It was released earlier this week. The study explores how generative models, when trained under specific conditions, can achieve capabilities surpassing the expertise they are trained on.
The authors built a system called ChessFormer (a “chess transformer”—the same transformer technology that hides behind the last “T” in “ChatGPT”), trained on human chess game transcripts. When evaluated under specific conditions such as low-temperature sampling (a trick to make model outcomes less random—typically bad for creativity but good for precision-demanding tasks), the model outperformed the highest-rated human players in the training dataset. In other words, the model became smarter than its teachers.
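For readers wondering what “low-temperature sampling” actually does, here is a minimal sketch (my own toy numbers, not ChessFormer’s code): temperature rescales the model’s scores before they are turned into probabilities, so a low temperature concentrates almost all probability on the top-scoring move.

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Scale logits by 1/temperature before softmax.
    T close to 0 concentrates probability on the best-scoring move
    (good for precision); T = 1 reproduces the learned distribution;
    T > 1 flattens it (more random, sometimes more "creative")."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                        # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(probs), p=probs)), probs

rng = np.random.default_rng(0)
logits = [2.0, 1.5, 0.3]   # hypothetical model scores for three moves

for t in (1.0, 0.1):
    _, probs = sample_with_temperature(logits, t, rng)
    print(f"T={t}:", np.round(probs, 3))
# T=1.0 leaves real probability on the weaker moves;
# T=0.1 puts almost all of it on the top-scoring move.
```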
This phenomenon, which the authors termed “transcendence,” suggests that generative AI systems can surpass individual expert performance by leveraging collective human expertise and minimising errors through the AI equivalent of majority voting.
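A rough intuition for that majority-voting effect, with made-up numbers (my own sketch, not the paper’s method): average the move probabilities of several imperfect “experts”, and one expert’s blind spot gets outvoted by the others.

```python
import numpy as np

# Three hypothetical "experts", each mostly right but with individual
# blind spots: their probability estimates over four candidate moves.
expert_probs = np.array([
    [0.50, 0.20, 0.20, 0.10],   # expert A favours move 0
    [0.45, 0.10, 0.35, 0.10],   # expert B sometimes drifts to move 2
    [0.15, 0.05, 0.10, 0.70],   # expert C has a blind spot here
])

# Pooling the estimates washes out expert C's idiosyncratic error:
# alone, C would pick move 3; the pooled vote still picks move 0.
pooled = expert_probs.mean(axis=0)
print(np.round(pooled, 3), "-> best move:", int(np.argmax(pooled)))
```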
This research underscores the potential for generative AI to not just mimic but exceed human expertise in specific domains. For now, it’s chess, but I am certain studies of other domains will follow. I acknowledge that this is just one paper so far, and it’s not peer-reviewed yet. Still, perhaps we should start questioning our perception of the role of generative AI in our creative and decision-making processes.
Let me spell it out: we’re starting to see early evidence of the ability of generative AI models to exceed the expertise from their training data. This is new in generative AI. And it’s eerily similar to what David Weinberger wrote twelve years ago about humans: “The smartest person in the room is the room”. Collective intelligence exceeds the intelligence of any individual.
In 2024, “The Smartest Person in the Model is the Model”.
But wait, there’s a catch.
The research uncovers an interesting insight into human expertise. Only models trained on diverse datasets, encompassing a wide range of player ratings and styles, performed significantly better than human experts. Such diversity allows the model to generalise and improve upon the individual performance of its trainers by minimising biases and errors.
Without enough human diversity, the model’s ability to outperform its training data dramatically diminishes.
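To continue the toy example from above (again, my own illustration, not the paper’s code): if all the “experts” share the same blind spot, pooling them changes nothing, because there is no disagreement left to average away.

```python
import numpy as np

# Same pooling as before, but the "experts" are near-clones sharing
# one blind spot: all of them overrate move 3 in this position.
clone_probs = np.array([
    [0.15, 0.05, 0.10, 0.70],
    [0.18, 0.07, 0.10, 0.65],
    [0.12, 0.08, 0.12, 0.68],
])

pooled = clone_probs.mean(axis=0)
print("best move:", int(np.argmax(pooled)))  # still move 3: pooling cannot
# correct an error that every trainer makes; diversity is what helps.
```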
What does it mean for business?
Wait, what if we trained future models on the outputs of these new, transcendent models?
I might need to give Ray Kurzweil a call.
Prof. Marek Kowalkiewicz is a Professor and Chair in Digital Economy at QUT Business School. Listed among the Top 100 Global Thought Leaders in Artificial Intelligence by Thinkers360, Marek has led global innovation teams in Silicon Valley, was a Research Manager of SAP's Machine Learning lab in Singapore, a Global Research Program Lead at SAP Research, as well as a Research Fellow at Microsoft Research Asia. His newest book is called "The Economy of Algorithms: AI and the Rise of the Digital Minions".
Cybersecurity Business Analyst | Digital Transformation Business Analyst
8 months ago · Some great points in your post, Prof. Marek Kowalkiewicz, and aligned with my thinking about AI producing the median of responses. I think humans are interesting because of the edges/outliers. If the model can exceed expert performance due to the collective expertise it has been trained on, how will individuals validate the output if the reasoning is beyond their capabilities?
Head of Department, eLearning at Kelvin Grove State College
9 months ago · And for politics/government?
GAICD - Advisory in digital transformation, government services and social security
9 months ago · Hi Marek - love your work. The median human concept is something we should all be aware of. Even experts have been known to make things up, or to claim something is true when they know it may not be backed by facts, when backed into a corner about their knowledge. As Gen AI hallucinates in these situations, it will be a challenge for humans to detect (just as when an expert says something and we accept it as true even when we have our doubts). Will there be transparency built into Gen AI offerings, with something like this coming up first in response to the prompt: "I think I am hallucinating on this one - enjoy the ride"?
AI Management | Artist | PhD Candidate
9 months ago · Very interesting, Marek. There seem to be parallels between increasing human creativity and increasing machine creativity/capability. For example, this reminds me of the creativity research on group brainstorming vs nominal pairs, where the pairs came up with better ideas because they weren't primed by the room. An interesting space for experimentation.
AI Project Management | Renewable Energy
9 months ago · Q1: yes. Q2: a lot, and not just in business but in any discipline. Decisions become transparent because we can question derivations like never before.