Steps Toward A Theory of Artificial General Intelligence

If we are to achieve artificial general intelligence, we will need an assessment of where we are today and of what it will take to get there. We need a theory of intelligence and a plan for achieving it. I previously presented a sketch of a roadmap for achieving general intelligence and later added some additional notes. In this article, I want to extend that plan with a few more additions.

As Yogi Berra said, “If you don’t know where you’re going, you’ll end up someplace else.” It seems foolish to attempt to build artificial general intelligence without at least some notion of what general intelligence is. There is a widely assumed definition of general intelligence as a system that solves many problems, or as one that solves any problem that a human could solve.

Simon and Newell claimed in 1958:

“[T]he simplest way I [Simon] can summarize the situation is to say that there are now in the world machines that think, that learn, and that create. Moreover, their ability to do these things is going to increase rapidly until–in a visible future–the range of problems they can handle will be coextensive with the range to which the human mind has been applied.”

Simon and Newell argued that their stated goal would be achieved within a decade, that is, by the end of the 1960s. They were overly optimistic then, and that optimism is still common.

Part of what was lacking in their research, and what still seems to be missing, is a theory of just what artificial general intelligence is. To the extent that the current GenAI thought leaders have a theory, it seems to be this: general intelligence solves more problems than narrow intelligence, and all we need to achieve it are models with a sufficient level of computational capacity.

Artificial general intelligence is general because it solves many problems, but in the context of GenAI, the problems that it solves are all of a single type. In fact, they are all the same problem. They appear to be different problems because some clever person(s) has figured out how to cast multiple nominal problems into a form that can be solved by a machine that guesses the next word. But solving the same problem over and over is not an example of general intelligence. It is narrow artificial intelligence. The generality is in the eyes of the designers and evaluators, not in the intelligence of the model.

A general intelligence, in addition, will have to do more than solve problems that are set out for it. Robert Sternberg, in the context of human intelligence, suggests:

Successful intelligence is defined as one’s ability to set and accomplish personally meaningful goals … .

In the context of computational intelligence, I think that the key part of this definition is that artificial general intelligence is autonomous in that it can recognize problems to solve and independently achieve complete solutions to them. General intelligence includes the ability not only to accomplish goals, but to set them. Anyone with high school algebra can solve Einstein’s famous equation (E = mc²), but it took special intelligence to identify the need for, and then create, that equation.

Much has been written about how current GenAI models make mistakes (hallucinations), how they follow language patterns, not fact patterns, and how they are stochastic parrots. I will not recapitulate that evidence here. Rather, I want to focus on how they learn, because that has serious implications for understanding their capabilities and for what would be needed to achieve general intelligence.

GenAI models are derived from a transformer architecture. They are trained by a kind of fill-in-the-blanks approach. A text or an image is presented, some of it is “masked,” that is, removed, and the model is trained to fill in the missing pieces. This approach forces the models to be hindsight driven. Given a context (the input other than what is masked), the model is trained to fill in the blank according to what was there in the past. It cannot create anything that is truly new.
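To make the fill-in-the-blanks idea concrete, here is a minimal sketch, assuming a toy three-sentence corpus and a simple lookup table standing in for learned parameters; a real transformer learns continuous weights rather than counts, but the training target is the same: whatever token actually appeared in the training data.

```python
# Toy fill-in-the-blank ("masked") training: record which word filled each
# masked context in the training corpus, then fill new blanks from those
# records. Hindsight driven by construction.
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat slept on the mat",
]

fill_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i, target in enumerate(words):
        context = tuple(words[:i] + ["[MASK]"] + words[i + 1:])
        fill_counts[context][target] += 1   # the "past" is the only teacher

def fill_blank(masked_sentence):
    """Return the distribution of fillers seen for this masked context."""
    return fill_counts.get(tuple(masked_sentence.split()), Counter())

print(fill_blank("the cat sat on the [MASK]"))   # Counter({'mat': 1})
print(fill_blank("the [MASK] sat on the mat"))   # Counter({'cat': 1})
```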

That last sentence may require some explanation. Hallucinations (so called) and other generated text and images frequently contain parts that have never been seen before. For example, this video had surely never been seen before. But its “novel” parts are produced by the same model that produces the non-novel ones, trained on the same historical context. The model does not directly represent the exact missing content. A language model, for example, does not produce a single specific token in response to a context; rather, it produces a distribution of potential tokens from which one is selected, as it has learned during training. Because there are fewer parameters than possible context-token combinations, multiple contexts are represented in an overlapping pattern of model parameter values, and multiple tokens could be produced from them. Inexact quotations from the training text are the result of tokens sharing model parameters. These paraphrases are an intrinsic part of the model’s ability to generalize to contexts that had not previously been observed, but they are based solely on data that have been observed. Similar contexts predict similar tokens, and once a token is generated, it becomes part of the context for the next token to be generated. The whole video linked to earlier has never been seen before, but its parts, or something similar to them, have been.
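A similarly minimal sketch, assuming a toy bigram “model,” shows the generation loop this paragraph describes: the model yields a distribution over next tokens, one token is sampled from it, and the sample joins the context for the next step. Novel-looking strings can come out, but every step is drawn from distributions estimated on the training data.

```python
# Toy autoregressive generation: estimate P(next | previous) by counting
# bigrams, then repeatedly sample a next token and append it to the context.
import random
from collections import Counter, defaultdict

training_text = "the cat sat on the mat . the dog sat on the rug .".split()

next_counts = defaultdict(Counter)
for prev, nxt in zip(training_text, training_text[1:]):
    next_counts[prev][nxt] += 1

def sample_next(token):
    tokens, weights = zip(*next_counts[token].items())
    return random.choices(tokens, weights=weights)[0]

context = ["the"]
for _ in range(6):
    context.append(sample_next(context[-1]))

# e.g. "the dog sat on the mat ." -- a recombination of the training data,
# not something outside it
print(" ".join(context))
```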

Some software developers find that language models can be useful for solving coding problems. On closer examination, however, the models have been found to do well on coding problems that were widely available when the model was trained (up to 2021 for GPT-3). They do much more poorly when those problems were published after training was completed, further reinforcing the notion that these models are hindsight driven. For example:

“ChatGPT’s ability to produce functional code for ‘easy’ coding problems dropped from 89 percent to 52 percent after 2021. And its ability to generate functional code for ‘hard’ problems dropped from 40 percent to 0.66 percent after this time as well.”

A similar result was reported by Roberts et al. They found that programming problems that were more widely represented in GitHub repositories and were older (i.e., described before the training cutoff) were better solved than less well-covered problems. Similarly, Udandarao et al. found that exponentially larger data sets were necessary to produce linear improvements in model performance.

Human intelligence is not limited to expressions of already known relationships. Humans invent new knowledge. They develop new ways of thinking about old problems that go against the commonly held descriptions. The best examples of these are in science, not because scientists’ intelligence is fundamentally different from other forms of human intelligence, but because the examples are better documented. Human scientists do not think in fundamentally different ways from nonscientists; they just think about some different things. They may be more systematic in their thinking (but not always), but they are still just people.

Charles Darwin’s theory of evolution by random variation and natural selection contradicted the scientific writing of his time. It did not follow the well-laid-down language patterns of his community. The dominant idea at the time was that species were fixed and never changed. Although there was some consideration of the “transmutation” of species, this notion was generally rejected. Darwin not only contradicted the dominant scientific thinking of the time, he offered a mechanism by which evolution would occur. It was a different way of viewing species, not just a stochastic reproduction of the dominant writing on the topic.

Einstein’s theory of the photoelectric effect was similarly revolutionary. The dominant view at the time was that light consists of waves, but Einstein reconceptualized it as particles (as well as waves) and made the radical prediction that the frequency of the light, not its intensity, determines whether electrons are ejected from an object it strikes and how much energy they carry.
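For concreteness, the standard textbook relations (stated here from general physics knowledge, not quoted from Einstein’s paper) are:

```latex
% Photon energy depends on the light's frequency \nu, not on its intensity:
%   E = h\nu
% Maximum kinetic energy of an ejected electron, for a material with work
% function \phi:
%   K_{\max} = h\nu - \phi
% Electrons are ejected only when h\nu > \phi, however intense the light;
% intensity sets how many photons arrive, and hence the size of the current
% once the frequency threshold is exceeded.
\[ E = h\nu, \qquad K_{\max} = h\nu - \phi, \qquad h\nu > \phi \ \text{for emission} \]
```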

Thomas Kuhn (1962) wrote about how scientific revolutions replace one way of thinking with another. Kuhn distinguished between what he called “normal science” and “revolutionary science.” Normal science is incremental; it may advance what we know about the world, but revolutionary science changes the way we conceptualize it. Most scientific activity is solidly in the normal science category. Most AI publications also fall under normal science. Can a tweak to this or that catapult the author to the top of a leaderboard?

Do not confuse the revolutionary notion of genius with big science. Reconceptualization happens at many scales. When a man realizes that his dates never work out because he chooses potential partners according to the wrong criteria, and he resolves to change those criteria, that is also an act of revolutionary genius.

Here is a sequence of integers. What integers would logically follow? 8 5 4 9 1 7 … Hint: it involves a reconceptualization of the series. I will provide the answer and an explanation later. The answer can be found using Google or another search engine, so ChatGPT could probably answer it, because the pattern is already available. Can you answer it without looking it up?

Until now, these acts of reconceptualization have been limited to humans. Every significant improvement in computational intelligence started with some human reconceptualization of the nature of intelligent computation. So far, “intelligent” machines have been able to solve the equations that they have been given, but not to create new ones. It was somewhat revolutionary to conceive of deep learning, and maybe almost revolutionary to think that one could train a model on virtually everything that was publicly available in digital form. These were inventions of human intelligence.

Another consequence of language modeling is its limited ontology. Language models have exactly two kinds of entities: tokens and connections. We don’t have a good vocabulary for talking about these things. Whatever words I could use to describe the content of a model are overloaded with meanings derived from human experience. The scare quotes in the next paragraphs are intended to signal that the terms are being used as a kind of metaphor.

The only things that language models can “think” about are tokens and their connections, and the only way that they can “think” about them is with tokens and their connections. They “know” nothing about anything else.

GenAI models do not reason; they mimic reasoning. “Truth” is just another word, not a state of the model. Tokens do not have meaning because there is no way to express their meaning within the language model. Humans can see that words with similar meanings have similar representations (embeddings) within the model, but the model only “knows” that the embeddings are similar; it does not “know” what they mean.
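As a minimal sketch of that point, with made-up three-dimensional vectors standing in for real embeddings, similarity inside a model is nothing more than geometry:

```python
# Cosine similarity over invented "embeddings": the computation can report
# that two vectors are close, but closeness is all there is inside the
# model; nothing ties the numbers to cats, dogs, or carburetors.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

embeddings = {                       # hypothetical vectors, for illustration only
    "cat": [0.90, 0.80, 0.10],
    "dog": [0.85, 0.75, 0.20],
    "carburetor": [0.10, 0.20, 0.95],
}

print(cosine(embeddings["cat"], embeddings["dog"]))         # ~0.99, "similar"
print(cosine(embeddings["cat"], embeddings["carburetor"]))  # ~0.29, "dissimilar"
```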

The meaning or reasoning that some people attribute to these models is contributed by humans, as the training text was written and as the output of the models is read. The machine paraphrases word patterns contained in the training set. That is all it does and all it can do. Nothing else exists “for the model.”

Some humans may find it challenging to understand how limited a language model can be. Three analogies might help. A scanned image of a document contains only the positions of pixels and their brightness (or three brightnesses for a color image). The scanner cannot “think” of anything else. An OCR program can determine the letters that best match that pattern of pixels. It can then “think” about the letters on the page.

In the 1884 novella “Flatland,” Edwin Abbott Abbott imagines a world that consists of only two dimensions. The “people” in this world are lines and polygons. They cannot imagine the possibility that there might be a third dimension. When visited by a sphere from a three-dimensional world, they see it only as a disc. The sphere, of course, can see the limited dimensions of Flatland, but the Flatlanders cannot conceive of the possibility of three-dimensional objects such as spheres, cubes, or pyramids.

In the 1971 novel “Being There,” Jerzy Kosinski imagines a man, Chance the gardener, who has spent his entire life sheltered in the garden of a house owned by an “Old Man.” When the Old Man dies, Chance is forced out into the world and, through a series of mishaps, ends up in the home of another rich man, where his simple-minded words are interpreted by the US President, the press, and others as being profound.

The scanner “knows” nothing of letters, the Flatlanders “know” nothing of a third dimension, and Chance the gardener knows nothing of economics. In all three examples, an outside entity, and the humans reading about them, know that there is more and can relate the limited knowledge of the scanner, the Flatlanders, and Chance to this richer information. In all three cases, human observers can attribute this extra knowledge to the limited system, but that attribution is in the observer, not in the system.

So where does that leave us in the quest for general intelligence?

General intelligence includes forward-looking capabilities. Intelligence is not just reactive, waiting for others to set problems for it to solve. Intelligence is also active in shaping and selecting its situations. Intelligent individuals do not simply respond to puzzles and problems; they actively seek them. They may work to structure their environments to make it easier to address their issues, or work to restructure problems. Intelligence includes the ability to set and accomplish meaningful goals. Intelligent people can recognize the existence of a problem, define its nature, and represent it. They can recognize where knowledge is lacking and work to obtain that knowledge. Although intelligent people benefit from structured instructions, they are also capable of seeking out their own sources of information.

Intelligent people find new relations among things and, particularly, new useful relations. The mathematician and philosopher Henri Poincaré wrote about the process of mathematical discovery.

“In fact, what is mathematical creation? It does not consist in making new combinations with mathematical entities already known. Any one could do that, but the combinations so made would be infinite in number and most of them absolutely without interest.”

The interesting combinations “are those which reveal to us unsuspected kinship between other facts, long known, but wrongly believed to be strangers to one another.”

“For fifteen days I strove to prove that there could not be any functions like those I have since called Fuchsian functions. I was then very ignorant; every day I seated myself at my work table, stayed an hour or two, tried a great number of combinations and reached no results. One evening, contrary to my custom, I drank black coffee and could not sleep. Ideas rose in crowds; I felt them collide until pairs interlocked, so to speak, making a stable combination. By the next morning I had established the existence of a class of Fuchsian functions, those which come from the hypergeometric series; I had only to write out the results, which took but a few hours.” (Poincaré, 2013; Wagenmakers, 2022).

Poincaré makes several points relevant to general intelligence. Random selection of well-known entities would require the consideration of too many possibilities. The useful combinations are those that reveal novel relations between entities whose relations had not previously been known. As an intelligent person, Poincaré was able to select candidate entities for consideration in “crowds” and then compare the members of the crowd in more detail.

Poincaré's description sounds a lot like approximate nearest neighbor search. A crowd of candidates is retrieved and then filtered. Computational models of approximate nearest neighbor search work because the designer of such systems predefines the complete set of features and values by which the objects can be compared. But without a designer, items have a potentially infinite number of features. Animals can be compared by how they look, by their relation to humans, by where they live, by the fact that they have skin or that they have more than one hair, by whether they have been written about by Charles Darwin, and so on. Although the set of all possible features is infinite, humans tend to consider some more frequently than others. They rarely, for example, organize animals by whether they have a left aortic arch, but frequently organize animals by whether they have fur or feathers.
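A minimal sketch of that retrieve-then-filter pattern, with an invented feature table, shows how much the designer contributes: the features on which the “crowd” is gathered and then filtered are all fixed in advance.

```python
# Two-stage (approximate) nearest neighbor search over designer-chosen
# features: a cheap screen retrieves a crowd of candidates, and an exact
# distance then ranks them.
import math

animals = {                # features: has_fur, lives_with_humans, lays_eggs, can_fly
    "dog":     [1, 1, 0, 0],
    "cat":     [1, 1, 0, 0],
    "sparrow": [0, 0, 1, 1],
    "chicken": [0, 1, 1, 0],
    "bat":     [1, 0, 0, 1],
}

def coarse_candidates(query, k=3):
    """Cheap screen: keep the k animals sharing the most 1-valued features with the query."""
    overlap = lambda vec: sum(a & b for a, b in zip(query, vec))
    return sorted(animals, key=lambda name: -overlap(animals[name]))[:k]

def refine(query, candidates):
    """Exact filter: rank the crowd by Euclidean distance to the query."""
    return sorted(candidates, key=lambda name: math.dist(query, animals[name]))

query = [1, 1, 0, 0]                 # something furry that lives with humans
crowd = coarse_candidates(query)     # the "crowd" of rough matches
print(refine(query, crowd))          # ['dog', 'cat', 'chicken']
```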

Even limiting the search space to mathematical functions, as Poincaré did, would still lead to too many possibilities to consider in a lifetime. Instead of considering just any features of these functions, his search focused on selected ones that were apparently less typically associated with these functions. The unsolved problem, then, would be how to pick those features that should be the target of this comparison. One possibility is suggested by a theory described by William Estes, called stimulus sampling theory. The basic idea is that a stochastic process might choose a sample of dimensions for comparison.
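A minimal sketch of that idea, with invented features and salience weights, would stochastically draw a few dimensions to compare on each encounter; everything here beyond the bare notion of sampling dimensions is an illustrative assumption, not a claim about Estes’ theory.

```python
# Stochastic sampling of comparison dimensions: on each encounter, only a
# weighted random sample of the many possible features is active.
import random

all_features = [
    "has_fur", "has_feathers", "lives_with_humans", "written_about_by_Darwin",
    "has_left_aortic_arch", "lays_eggs", "can_fly", "has_skin",
]

salience = {f: 1.0 for f in all_features}   # how often a dimension tends to be sampled
salience["has_left_aortic_arch"] = 0.01     # rarely considered
salience["has_fur"] = 5.0                   # frequently considered

def sample_dimensions(k=3):
    """Draw k distinct feature dimensions, weighted by their salience."""
    feats = list(all_features)
    weights = [salience[f] for f in feats]
    sample = []
    for _ in range(k):                       # weighted sampling without replacement
        chosen = random.choices(feats, weights=weights)[0]
        i = feats.index(chosen)
        feats.pop(i)
        weights.pop(i)
        sample.append(chosen)
    return sample

print(sample_dimensions())   # e.g. ['has_fur', 'can_fly', 'lays_eggs']
```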

Traditional programming specifies directly the relationship between the inputs and the desired outputs. Narrow AI is more flexible, in that it sets the space of potential solutions and the space of available tools to achieve them. In neural network systems, the space of potential solutions is the set of parameters and how they are organized (for example, in layers with specific activation functions). The tool used to solve the problem is usually some form of gradient descent, where it is possible to determine whether any particular potential change results in a closer approximation to the solution. By analogy, they learn by playing “hotter,” “colder.” They can learn to solve problems where it is possible to measure being closer to the solution, but not every problem can be solved by successive approximation. Consider the integer sequence problem described earlier. The digits are arranged alphabetically by their English names (8 5 4 9 1 7 6 3 2 0). Once one realizes that the digits are to be ordered by their English names, the solution to the problem is immediately obvious. No successive approximation is needed, and it may not even be possible. General intelligence requires the ability to create its own solution space, to change how it represents a problem, and to change the way that it solves it.
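Once the reconceptualization is in hand, the “solution” is a single line. A minimal sketch:

```python
# The digit puzzle falls out at once when digits are compared by the
# spelling of their English names rather than by their numeric value.
names = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]

sequence = sorted(range(10), key=lambda d: names[d])
print(sequence)   # [8, 5, 4, 9, 1, 7, 6, 3, 2, 0]
```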

General intelligence requires the ability to specify its own problems. When does a situation require a solution? What would that solution look like?

Researchers and developers have become very successful at building narrow artificial intelligence solutions that can solve certain problems at superhuman levels, at least as measured by benchmarks. Many of them have been less successful in use beyond the benchmarks (e.g., radiology). The bigger point is that this work has investigated the performance of these systems in relatively constrained, well-structured situations where the researchers and developers have specified the problems, the representations, the desired outputs, and so on. The people solved all of the difficult parts of the problem, leaving only the easy parts for the AI system to solve. A general intelligence will need to be able to supply all of these parts itself, or it cannot be considered autonomous.

A theory of artificial general intelligence will need to address all parts of the problem-solving process, including problem selection and solution planning, as well as a means of achieving these parts. This work will require research along lines that have not yet been explored, but I think that we are beginning to see what such a theory would look like.

Roumen Popov

DSP Software Engineer


"We need a theory of intelligence and a plan for achieving it." - exactly! The problem is that this is very difficult to achieve and so people have opted instead for beating benchmarks because that's easier. We are basically in the situation where we continue looking for our keys under the lamp post even though we very well know we lost them in grass.

Transformers and LLMs do not think. OpenAI has said this outright and considers building AI that can reason to be its current problem. What we have is something that can apply attention to a sentence, use an encoder to deduce meaning, and supply an output through the decoder. Even if it has memory, it is not a continuous, recursive spark of AGI; it is just sparks of intelligence. Yes, one day we might use those sparks to set the whole bonfire alight, but today it’s still just a smarter search engine.

Very interesting. My view is that we are not going to be able to develop (in the sense of coding) Artificial Intelligence. What we could try, though, is creating the conditions for the evolution of artificial life, which would eventually become intelligent: https://www.dhirubhai.net/posts/ed-moman-phd-3b4632294_google-researchers-say-they-simulated-the-activity-7219308212712345600-KnIW?utm_source=share&utm_medium=member_desktop Some sort of algorithmic protozoa that evolve in changing environments by mutations, chimerisms, etc. Such has been my proposal for the last eight years, and it seems that, finally, it is being put to the test.

Richard Pferdner

VP Architect at Citi focused on building excellence at scale


Great article. This inspired me to write an article about some points from your article. Thank you for your thought leadership in this field. https://www.dhirubhai.net/pulse/gen-ai-theory-practice-richard-pferdner-qi14c

Woodley B. Preucil, CFA

Senior Managing Director


Herbert Roitblat Very insightful. Thank you for sharing
