Data size, not model size, leads to AI
Since the beginning of artificial intelligence, there have been predictions that artificial general intelligence and superintelligence are just around the corner. Recently, Nvidia's CEO claimed that it would be achieved within the next five years (https://techcrunch.com/2024/03/19/agi-and-hallucinations/), and Elon Musk predicted that by next year or the year after we will see artificial intelligence that exceeds the smartest human (https://www.reuters.com/technology/teslas-musk-predicts-ai-will-be-smarter-than-smartest-human-next-year-2024-04-08/).
There are many reasons to be skeptical of such claims, not least that they have been made since 1956.
The idea of general intelligence is that a computer should be able to perform any cognitive task that a human could. The idea of superintelligence (following Good, 1965) is that an intelligent computer will be able to improve its own capabilities, and because it is much faster than people, it will do so at an exponentially explosive rate. Despite recent advances, it would be a mistake to extrapolate from them to imminent artificial general intelligence. The problems that have been solved are not representative of the broad range of problems that general intelligence would need to solve (e.g., https://www.nature.com/articles/s41587-023-02103-0 and https://thereader.mitpress.mit.edu/ai-insight-problems-quirks-human-intelligence/).
The current versions of this argument also depend on the idea that language fluency is the essential feature of intelligence (see https://www.dhirubhai.net/pulse/essay-concerning-machine-understanding-herbert-roitblat-r6amc) and on the idea that larger models, with more parameters, will spontaneously yield deeper cognitive functions. Once enough parameters are combined into a single computation, general intelligence will emerge from the model. In short, model scaling is enough.
There are many flaws in the argument that generative AI will become general AI. Here, I want to focus on the scaling argument. It is true that larger language models generally solve certain problems better than smaller models do. GPT-3.5 has 175 billion parameters and GPT-4 reportedly has 1.76 trillion, yet for most tasks GPT-4 is only marginally better. Even this argument for progress is flawed, however, because it neglects that the training set for GPT-4 consisted of 13 trillion tokens, compared with 499 billion for GPT-3, a 26-fold increase. A recent paper (https://arxiv.org/abs/2404.04125, Udandarao et al., 2024) suggests that the increased size of the training set matters more than the increased number of parameters in the language model.
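The arithmetic behind that comparison is worth making explicit. A minimal sketch, using only the figures cited above (the GPT-4 numbers are widely reported estimates, not confirmed figures):

```python
# Parameter and token counts as cited in the text above.
# The GPT-4 figures are unconfirmed public estimates.
gpt3_5_params = 175e9   # GPT-3.5 parameters
gpt4_params = 1.76e12   # GPT-4 parameters (estimate)
gpt3_tokens = 499e9     # GPT-3 training tokens
gpt4_tokens = 13e12     # GPT-4 training tokens (estimate)

param_growth = gpt4_params / gpt3_5_params  # roughly 10x
data_growth = gpt4_tokens / gpt3_tokens     # roughly 26x

print(f"Parameters grew {param_growth:.0f}x; training data grew {data_growth:.0f}x")
```

On these figures, the training data grew about two and a half times faster than the parameter count, which is consistent with the claim that data, not model size, is doing much of the work.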
First, the larger the training set, the more likely it is to include the very tests against which model progress is being measured. The intelligence of these models cannot be measured directly; it is assessed against certain benchmarks. Practically every benchmark for measuring artificial intelligence progress has been written about on the web, so the larger the training set, the more likely it is that the test has been included in it. The model could learn the answers to the questions used in the evaluation rather than the skill the evaluation supposedly measures. The existing benchmarks may not, therefore, be valid indicators of any kind of intelligence beyond memorization, and claims based on these tests are likely to be bogus.
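One common way to probe this contamination problem is to search the pretraining corpus for verbatim overlaps with benchmark items. A minimal sketch of the idea, using a simple word n-gram overlap test (the function names and the threshold of a single shared n-gram are illustrative choices, not any particular lab's audit procedure; real audits add normalization, frequency thresholds, and fuzzy matching):

```python
def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word n-grams in a text (lowercased)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(question: str, corpus_ngrams: set, n: int = 8) -> bool:
    # A single shared long n-gram is a crude contamination signal:
    # long verbatim overlaps rarely occur by chance.
    return bool(ngrams(question, n) & corpus_ngrams)

# Toy example: the "corpus" contains the benchmark phrase verbatim.
corpus = "the quick brown fox jumps over the lazy dog near the river bank today"
question = "Complete the phrase: the quick brown fox jumps over the lazy dog"
print(is_contaminated(question, ngrams(corpus, 8)))  # the 8-gram overlap is detected
```

The point of the sketch is only that contamination checking is a string-matching problem over enormous corpora; at web scale it is expensive and easy to get wrong, which is why contaminated benchmarks slip through.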
Udandarao et al. found instead that models require exponentially more data to produce linear increases in performance: “This trend persists even when controlling for sample-level similarity between pretraining and downstream datasets, and testing on purely synthetic data distributions.”
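“Exponentially more data for linear gains” is another way of saying that performance grows roughly logarithmically with dataset size. A toy illustration with made-up constants (a and b are arbitrary, not values fitted to the paper's data):

```python
import math

def performance(n_examples: float, a: float = 0.0, b: float = 10.0) -> float:
    # Toy log-linear scaling: performance = a + b * log10(N).
    # a and b are illustrative constants, not fitted values.
    return a + b * math.log10(n_examples)

for n in [1e6, 1e7, 1e8, 1e9]:
    print(f"{n:>10.0e} examples -> performance {performance(n):.1f}")
```

Each tenfold increase in data buys the same fixed increment in this toy score: constant gains require multiplying the data, not adding to it, which is exactly the unforgiving trend the paper describes.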
If their analysis is correct, it argues strongly against the possibility of near-term artificial general intelligence, and against superintelligence as well. The intelligence of a model depends strongly on the size and content of its training set. To be general, intelligence would have to create novel patterns, not just recognize existing ones. Our current AI models, far from being even suggestive of artificial general intelligence, are stochastic parrots that, without some new outside intelligence yet to be invented, are limited to hallucinations, some of which may be useful.