LLMs And The AGI Threshold
Reid Hoffman
Co-Founder, LinkedIn & Inflection AI. Author of Superagency. Investor at Greylock.
In a new essay at Noema, Blaise Agüera y Arcas and Peter Norvig make the case that ChatGPT and other frontier large language models like Bard, LLaMA, Claude, and Pi will eventually "be recognized as the first true examples of AGI," or artificial general intelligence.
Is this just more industry hype? Or, if you believe that AGI automatically equates to the robocalypse, unfounded doom-mongering?
Bluntly, it's neither. Instead, Blaise and Peter make what strikes me as a measured and even understated case for why, even with all the hype that LLMs have generated over the last year, we're not sufficiently acknowledging how much progress they've actually made.
Before I dive into why I think that, I should note that I'm friends with Blaise, who is a Fellow at Google Research. And Peter is a Fellow at the Stanford Institute for Human-Centered AI, or Stanford HAI. (I, in turn, am on Stanford HAI's board, and have provided funding for a grant program there that supports projects designed to understand the human and societal impacts of AI, augment human capabilities, and develop AI technologies inspired by human intelligence.)
As Blaise and Peter are quick to acknowledge at the beginning of their essay, AGI "means many different things to different people." What they leave implicit, until much deeper in the essay, is that for many people, this basically equates to, "Can machines think like we humans do, approximating all our different kinds of cognition? Are they sentient? Do they have the capacity to engage in logical inference and abstract thinking?"
In contrast, Blaise and Peter put their focus on what machines are doing, rather than how they're doing it.
And what current frontier models are doing, at the most fundamental level, is exhibiting increasingly powerful generalizability. Unlike previous generations of narrowly trained LLMs, these new models can perform a wide range of functions, drawing on a wide range of data types, across a wide range of topic areas, all without having been explicitly trained to do these things.
So, for example, they can handle translations for language pairs for which they've never been directly trained, like translating Urdu to Welsh. Or they could create a list of thematically appropriate songs to play while translating Urdu to Welsh. Or, to use an example from deep-learning pioneer Geoffrey Hinton, they can deftly summarize all the ways a compost heap is similar to a nuclear bomb.
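To make the zero-shot flavor of that concrete, here's a minimal sketch of what such a request looks like in practice. It assumes the openai Python package and an API key are configured, and the model name is purely illustrative; the point is that nothing in the request supplies any Urdu-to-Welsh training data at all.

```python
# Minimal sketch: a zero-shot request to a frontier LLM.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set;
# the model name below is illustrative and may differ from what you use.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{
        "role": "user",
        "content": (
            "Translate this Urdu sentence into Welsh, then suggest three "
            "thematically fitting songs to listen to while translating: "
            "شکریہ، آپ سے مل کر خوشی ہوئی۔"
        ),
    }],
)
print(response.choices[0].message.content)
```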
Blaise and Peter additionally note that this significant "upgrade" to LLM capabilities has gone largely unacknowledged by deep-learning skeptics like cognitive psychologist and computer scientist Gary Marcus, whose criticisms of LLMs have stayed pretty much unchanged over time, even as LLM capabilities continue to show major, measurable improvement.
Of course, skeptics like Gary have an obvious rebuttal. Their criticisms haven't changed because even these frontier models, adept as they may be, keep making the same kinds of errors that earlier models made. They still have trouble making logical inferences or engaging in abstract thinking. They make up falsehoods when ostensibly providing factual information. Common-sense reasoning challenges can still trip them up.
This is certainly true, but also not the whole story. And at this moment in time it's definitely not the main story.
For a more concrete look at what I'm getting at here, check out this example from Stanford HAI's Artificial Intelligence Index Report 2023, which tests three different versions of GPT with the exact same prompt.
As the Artificial Intelligence Index Report 2023 notes, the 2019 version of GPT mostly produces "gibberish." The sentences are grammatically well-formed, but they're neither particularly coherent with one another nor contextually responsive to the prompt.
In contrast, the November 2022 version produces an output that is relevant, thorough, and, except for one minor detail, accurate. It's like a hang glider has evolved into an Airbus A350.
And that was the November 2022 version of ChatGPT. The current version is still more capable.
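For readers who want to try this kind of comparison themselves, here's a rough sketch of the idea: send the exact same prompt to more than one model version and put the outputs side by side. The model names below are illustrative, and the 2019-era GPT isn't served through this API at all, so faithfully reproducing the report's full comparison would mean running older checkpoints locally.

```python
# Rough sketch: one identical prompt sent to two model versions so the
# outputs can be compared side by side. Model names are illustrative.
from openai import OpenAI

client = OpenAI()
prompt = "Explain, for a general audience, why the sky appears blue."

for model in ["gpt-3.5-turbo", "gpt-4o"]:  # illustrative model names
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {model} ---")
    print(reply.choices[0].message.content)
```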
Of course, these improvements aren't happening because ChatGPT has somehow acquired the capacity for common-sense reasoning, or theory of mind, or an ability to create world models that allow it to understand a given context or environment and thus engage in more informed decision-making or problem-solving. It's still just functioning as an extremely sophisticated pattern-matching tool, generating text based on statistical relationships it has learned from its training data.
So when frontier models like GPT-4 ace the SATs or the Certified Sommelier exam, or discourse more reliably on the major accomplishments of Teddy Roosevelt, it's not because they've acquired new human-like capabilities of cognition. They're still just "pattern-matching," but at higher levels, because developers are training them on increasingly massive datasets and scaling up their complexity by increasing the number of their parameters.
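To see what "generating text from statistical relationships" means at toy scale, here's a deliberately tiny sketch: a bigram model that learns which word tends to follow which, then samples continuations. Frontier models are transformers with billions of parameters rather than bigram tables, so treat this as an analogy for the objective, not a description of the architecture.

```python
import random
from collections import defaultdict

# Toy "pattern matcher": learn which word follows which in a tiny corpus,
# then generate text by sampling statistically plausible continuations.
corpus = "the cat sat on the mat and the dog sat on the rug".split()

follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)  # duplicates preserve observed frequencies

def generate(start: str, length: int = 8) -> str:
    word, out = start, [start]
    for _ in range(length):
        candidates = follows.get(word)
        if not candidates:
            break
        word = random.choice(candidates)  # sample in proportion to counts
        out.append(word)
    return " ".join(out)

print(generate("the"))  # e.g. "the cat sat on the rug"
```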
But what if this extraordinary pattern-matching ability alone, and all the capabilities that spring from it, constitute its own kind of general intelligence, regardless of how well that intelligence might correlate with human intelligence?
In their essay, Blaise and Peter quote the linguist Noam Chomsky on LLMs: "We know from the science of linguistics and the philosophy of knowledge that they differ profoundly from how humans reason and use language. These differences place significant limitations on what these programs can do, encoding them with ineradicable defects.”
At a certain point of facility, though, don't the ineradicable defects become insignificant defects as well?
Again, as Blaise and Peter emphasize, it's the what, not the how.
If ChatGPT can expertly pair a New Zealand Sauvignon Blanc with a dish that complements its crisp acidity and pronounced fruit flavors 99.99% of the time, it doesn't really matter that it doesn't actually know what Sauvignon Blanc tastes like, or even really where New Zealand is.
Obviously, different contexts warrant different levels of proficiency: An LLM error rate we'd tolerate for wine recommendations won't meet the standard in other, higher-stakes scenarios. But that's true for humans too, who have their own significant limitations and ineradicable defects. After all, there are still people who think it's fake news that we've landed human beings on the moon.
Meanwhile, as Blaise and Peter write, we also "have a growing wealth of tests assessing many dimensions of [LLM] intelligence." And yet skeptics like Chomsky and Gary treat any common-sense reasoning error or abstract thinking error that LLMs commit as uniformly disqualifying, because such errors are evidence that, for all their remarkable facility across an expanding range of tasks, LLMs still aren't processing information in the same way that humans do.
In contrast, what Blaise and Peter are noting is that human intelligence shouldn't necessarily set the standard for "general intelligence." There is a wide range of functions approximating human cognition that the current generation of frontier LLMs can perform, in increasingly agentic ways, at levels comparable to or exceeding what humans can do. And from an empirical perspective, that qualifies them as "generally intelligent."
Given all the expectations, fears, and aspirations we've been piling onto AGIs for decades now, this might seem like a modest claim. But it strikes me as astute and productive. Focusing on capability rather than methodology doesn't erase or excuse the limitations of these systems – but it does orient us toward leveraging their strengths in ways that can accelerate innovation and their beneficial impacts.
Indeed, while it would certainly be a mistake to assess LLMs only by what they do well, it seems equally misguided to assess them by only what they do poorly. That's especially true at this current moment, when what they do well is increasing so rapidly. Sometimes, making a molehill out of a mountain can be a bigger obstacle to progress than the reverse.