Fantasies and Hallucinations on a Theme by I. J. Good
A chart prepared by a former OpenAI employee, Leopold Aschenbrenner, has been making the rounds in the AI community. The chart purports to show that artificial general intelligence will be achieved in 2027. According to Aschenbrenner, “AGI by 2027 is strikingly plausible,” and “That doesn’t require believing in sci-fi; it just requires believing in straight lines in a graph.” Furthermore, he claims, this graph is based on “public estimates.”
I was going to write a longer piece about this graph, but the issues with it are important enough to call attention to them now. This graph has received more than 4 million views, but it is full of potentially dangerous misinformation. If it were drawn by a GenAI model, we would call it a hallucination.
Three axes are drawn on the graph, but only one of them has a basis in fact. We can all agree about what a year means. The left-hand axis is labeled “effective compute,” which, according to Aschenbrenner, refers to “both physical compute and algorithmic efficiencies” (https://situational-awareness.ai/wp-content/uploads/2024/06/situationalawareness.pdf). By algorithmic efficiencies, he appears to mean something like this: “Gemini 1.5 Flash is ~10x faster than the originally released GPT-4, merely a year later, while providing similar performance to the originally-released GPT-4 on reasoning benchmarks. If that’s the algorithmic speedup a few hundred human researchers can find in a year, the automated AI researchers,” which he proposes will be available in 2027, “will be able to find similar wins very quickly.” Following this hint, gaining “effective compute” capacity means adding more computations or getting more performance per computation. More effective compute capacity is observed when a model does better on certain benchmarks or performs similarly with fewer operations (measured in FLOPs, floating-point operations).
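To make that definition concrete, one plausible reading (my gloss, not a formula from Aschenbrenner’s paper) is that effective compute multiplies the physical training FLOPs by an algorithmic-efficiency factor inferred from benchmark parity: a model that matches a reference model’s scores with a tenth of the FLOPs is credited with a tenfold gain. The sketch below, with hypothetical function names and made-up numbers, illustrates only that reading.

```python
def algorithmic_gain(reference_flops: float, matching_flops: float) -> float:
    """Efficiency credited to better algorithms: the ratio of the FLOPs a
    reference model used to the FLOPs a newer model needs to match its
    benchmark scores (e.g., 10x fewer FLOPs -> a 10x gain)."""
    return reference_flops / matching_flops

def effective_compute(physical_flops: float, gain: float) -> float:
    """'Effective compute' under this reading: physical training FLOPs
    scaled by the cumulative algorithmic gain relative to the reference."""
    return physical_flops * gain

# Illustrative, invented numbers: a newer model matches the reference's
# benchmark scores using one tenth of the training FLOPs.
gain = algorithmic_gain(reference_flops=2e25, matching_flops=2e24)  # -> 10.0
print(effective_compute(2e24, gain))  # counted as 2e25 "effective" FLOPs
```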
According to his graph, effective compute capacity has risen by a total of seven orders of magnitude since 2018. Measured in training FLOPs, the increase is about four orders of magnitude (https://www.makeuseof.com/gpt-models-explained-and-compared/), so the remaining three orders of magnitude, a thousand-fold increase, must come from unnoticed improvements in algorithmic efficiency.
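The arithmetic behind that gap is easy to spell out; the figures below are the rough orders of magnitude just cited, not measurements.

```python
claimed_effective_oom = 7  # orders of magnitude of "effective compute" growth claimed since 2018
physical_flops_oom = 4     # rough growth in actual training FLOPs over the same period

# Whatever the chart claims beyond the growth in physical FLOPs has to be
# attributed to algorithmic efficiency.
algorithmic_oom = claimed_effective_oom - physical_flops_oom
print(algorithmic_oom, 10 ** algorithmic_oom)  # 3 1000 -> a thousand-fold multiplier
```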
Aschenbrenner’s argument for a continuing straight-line improvement in effective compute rests on the assumption that efficiency improvements of the same magnitude as those humans have recently discovered will keep being found. Such discoveries are difficult to predict, and there is no reason to expect that, even if they do occur, they will provide improvements of the magnitude attributed to the recent past. Taken literally, a straight-line improvement in algorithmic efficiency would eventually drive the number of FLOPs required for training to zero or below, which makes no sense.
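To see where that extrapolation leads under one literal reading, assume the gains shave a fixed number of FLOPs off the training requirement each year. The numbers below are invented solely to show the shape of the problem.

```python
# Invented numbers: suppose "straight-line" algorithmic gains knock a fixed
# amount of FLOPs off the training requirement every year.
required_flops = 1.0e25   # hypothetical requirement today
yearly_saving = 4.0e24    # hypothetical constant ("straight-line") reduction

for year in range(2024, 2029):
    print(year, f"{required_flops:.2e}")
    required_flops -= yearly_saving
# 2024 1.00e+25, 2025 6.00e+24, 2026 2.00e+24, 2027 -2.00e+24 ...
# The extrapolated requirement crosses zero and goes negative, which is
# physically meaningless.
```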
The right-hand axis is practically meaningless. There is no scale. His estimates of relative intelligence are empty. For example, by the intelligence of a preschooler Aschenbrenner means “GPT-2 was shocking for its command of language, and its ability to occasionally generate a semi-cohesive paragraph, or occasionally answer simple factual questions correctly. It’s what would have been impressive for a preschooler.” That is a dramatically poor summary of a preschooler’s intelligence.
His estimate near the top of the graph, for the intelligence of a software engineer, is equally flawed: “And the job of an AI researcher is fairly straightforward, in the grand scheme of things: read ML literature and come up with new questions or ideas, implement experiments to test those ideas, interpret the results, and repeat. This all seems squarely in the domain where simple extrapolations of current AI capabilities could easily take us to or beyond the levels of the best humans by the end of 2027.” In fact, however, Aschenbrenner has no idea how a word-guessing model could accomplish such things. If it were within the range of things that current models could do, it would reduce to this prompt:
Write a function that will improve the efficiency of a transformer model ten-million-fold.
Or maybe:
Write a function that can write a function to improve the efficiency of a transformer model ten-million-fold.
How does a language model, which represents the past probabilities of words given a context, come up with NEW questions or ideas? How will it prioritize those questions? Will it choose them randomly? How will it evaluate those experiments? There are no benchmarks for such problems. Such a metamodel might be able to assess whether it made another model more efficient, but will making it more efficient make it more intelligent? How would the model measure that?
Here are a few of the bad assumptions encapsulated in Aschenbrenner’s prospectus.
· Fluency is the same as competence.
· Modeling of past language is sufficient for intelligence.
· Reasoning benchmarks actually measure reasoning (not just modeling past discussion of reasoning).
· Computational capacity is intelligence.
· Cognitive processes emerge spontaneously with increasing computational capacity.
· All problems can be solved algorithmically.
· Language models improve because of increases in effective compute capacity, not because of increases in the breadth and depth of training material.
· No new data (or at least not much new text) will be needed to solve these problems, only the computer science literature and the results of the models’ experiments.
· Constructing a model capable of improving language models will be sufficient to create artificial general intelligence.
Aschenbrenner’s prospectus mirrors the speculations of I. J. Good in 1965:
“Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an 'intelligence explosion,' and the intelligence of man would be left far behind...”
Like Good, Aschenbrenner assumes that improving programming is the most important part of building artificial intelligence. But he fails to note that every increase in the computational capacity of a large language model has been accompanied by a comparable increase in the breadth of its training material. The functions that have changed over the last few years do not directly improve the “intelligence” (however defined) of the models; they improve the efficiency of the computations, enabling better modeling of the training language. And the benchmarks against which the language models are measured are included in the newer training sets, so their validity as measures of such functions as reasoning is suspect.
Finally, even if there were success in building a machine to automate AI research by 2027, it still would not be a general intelligence. It would be yet another narrow intelligence, taking whatever shortcuts were available to implement one function. We may achieve artificial general intelligence someday, but language models and their relatives will not be sufficient to get there.