Are we on the way to Artificial General Intelligence (AGI)?
Arun Krishnan
A recent 157-page paper titled "Sparks of Artificial General Intelligence: Early experiments with GPT-4", published by Microsoft Research, makes for very interesting reading. I have gone through the entire report, and it most certainly is TL;DR material, so I shall try to summarize the findings here for you.
But before that, let's take a brief detour into what intelligence actually means. One of the definitions is that intelligence is a multifaceted and complex cognitive ability that involves the capacity to understand, learn, reason, solve problems, adapt to new situations, think critically, and apply knowledge to different domains.
But what does it mean to say that an artificial intelligence system is intelligent?
From the paper:
In the late 1990s and into the 2000s, there were increasing calls for developing more general AI systems, and scholarship in the field has sought to identify principles that might underlie more generally intelligent systems. The phrase "artificial general intelligence" (AGI) was popularized in the early 2000s to emphasize the aspiration of moving from the "narrow AI", as demonstrated in the focused, real-world applications being developed, to broader notions of intelligence, harkening back to the long-term aspirations and dreams of earlier AI research. We use AGI to refer to systems that demonstrate broad capabilities of intelligence, including reasoning, planning, and the ability to learn from experience, and with these capabilities at or above human-level.
There are multiple other definitions of AGI as well.
In the paper, the authors propose an approach to studying whether GPT-4 is making progress towards AGI by sticking closer to traditional psychology. With that aim, they generated tasks and questions meant to push GPT-4 beyond mere memorization, testing it across areas such as multimodal and interdisciplinary composition, coding, mathematical abilities, interaction with the world, and interaction with humans.
Having done this, as the authors say:
Can one reasonably say that a system that passes exams for software engineering candidates is not really intelligent? Perhaps the only real test of understanding is whether one can produce new knowledge, such as proving new mathematical theorems, a feat that currently remains out of reach for LLMs.
Some interesting findings
A key measure of intelligence is the ability to synthesize information from different domains or modalities and the capacity to apply knowledge and skills across different contexts or disciplines.
GPT-4 did some remarkable things, such as writing a proof of the infinitude of primes in the form of a rhyming poem and producing a drawing of a unicorn in TiKZ code.
There are numerous other examples of the model combining diverse disciplines to produce outputs of seemingly bewildering complexity.
So were there areas where it faltered?
Yes, most certainly. For example, with music. Read for yourself from the paper:
When instructed to generate a short tune (Figure 2.9), the model was able to produce valid ABC notation. The tune had a clear structure, the time signature was consistent between bars and the notes followed increasing and decreasing patterns. The tune also used a consistent set of notes within the melody, and the rhythm had a repetitive pattern. However, the model did not seem to obtain the skill of understanding harmony. In fact, consecutive notes in the generated tunes are almost always adjacent to each other (namely, the note following C will typically be either B or D), and testing on 10 generated tunes, we were not able to extract any clear chords or arpeggios.
We then asked the model to describe the tune in musical terms. It was able to successfully give a technical description of the structure in terms of repetitions, the rising or descending parts of the melody and to some extent the rhythm. However, it seems that the descriptions of the harmony and chords are not consistent with the notes (in fact, it refers to sequences of adjacent notes, which do not form valid chords, as arpeggios).
In short, the model failed at any non-trivial form of harmony. As an amateur musician, I find this VERY interesting. Does harmony actually require a higher order of intelligence?
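The adjacency observation is easy to state concretely. Below is a minimal sketch, entirely my own illustration and not code from the paper, that maps note letters to C-major scale degrees and counts how often consecutive notes in a melody are one step apart, the pattern the authors found in GPT-4's tunes:

```python
# Minimal sketch (my own illustration, not from the paper): measure how
# often consecutive notes in a melody are one scale step apart.

# Scale degrees of the C major scale, ignoring octaves and accidentals.
SCALE = {"C": 0, "D": 1, "E": 2, "F": 3, "G": 4, "A": 5, "B": 6}

def adjacency_ratio(notes):
    """Fraction of consecutive note pairs that are exactly one scale step apart."""
    degrees = [SCALE[n] for n in notes if n in SCALE]
    pairs = list(zip(degrees, degrees[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if abs(a - b) == 1) / len(pairs)

# A made-up melody in the stepwise style the paper describes:
melody = list("CDEDCDEFEDC")
print(f"{adjacency_ratio(melody):.0%} of steps are adjacent")  # prints 100%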
GPT-4 was able to carry out coding tasks fairly well, not just coming up with new code, but also understanding existing pieces of code and even reverse-engineering assembly code.
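To give a flavour of what a code-comprehension probe of this kind might look like, here is a made-up example of my own (not a test from the paper): the model is shown an uncommented function and asked to explain it.

```python
# A made-up example (mine, not from the paper) of a code-comprehension
# probe: given an uncommented function, explain what it does.

def f(s):
    return s == s[::-1]

# A GPT-4-style answer: "f returns True when the string s reads the same
# forwards and backwards, i.e. when s is a palindrome."
print(f("level"), f("hello"))  # True False
```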
However, the model failed at more complicated mathematical problems since, being a language model, it is context-dependent. The authors look at three aspects of mathematical understanding, viz., creative reasoning, technical proficiency, and critical reasoning.
What explains the limitations of GPT-4?
The authors attribute it to two main issues:
Since LLMs are based on the next-character or next-word prediction paradigm, their inherent limitations also affect their generalizability towards AGI.
From the paper:
These manifest as the model's lack of planning, working memory, ability to backtrack and reasoning abilities. The model relies on a local and greedy process of generating the next word, without any global or deep understanding of the task or the output. Thus, the model is good at producing fluent and coherent texts, but has limitations with regards to solving complex or creative problems which cannot be approached in a sequential manner. This points to the distinction between two types of intellectual tasks:
Incremental tasks. These are tasks which can be solved in a gradual or continuous way, by adding one word or sentence at a time that constitutes progress in the direction of the solution. Those tasks can be solved via content generation which does not require any major conceptual shifts or insights, but rather relies on applying existing knowledge and skills to the given topic or problem. Examples of incremental tasks are writing a summary of a text, answering factual questions, composing a poem based on a given rhyme scheme, or solving a math problem that follows a standard procedure.
Discontinuous tasks. These are tasks where the content generation cannot be done in a gradual or continuous way, but instead requires a certain "Eureka" idea that accounts for a discontinuous leap in the progress towards the solution of the task. The content generation involves discovering or inventing a new way of looking at or framing the problem, that enables the generation of the rest of the content. Examples of discontinuous tasks are solving a math problem that requires a novel or creative application of a formula, writing a joke or a riddle, coming up with a scientific hypothesis or a philosophical argument, or creating a new genre or style of writing.
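To make the "local and greedy" process described above concrete, here is a toy sketch of my own. The bigram probability table is made up and simply stands in for a real language model; the point is that the decoder commits to the single most probable next word at every step and never revisits a choice.

```python
# Toy sketch of greedy next-word decoding (my own illustration; the
# bigram table is made up and stands in for a real language model).

BIGRAM_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.8, "<eos>": 0.2},
    "down": {"<eos>": 1.0},
}

def greedy_decode(prompt, max_steps=10):
    """Repeatedly take the argmax next word: no planning, no backtracking."""
    tokens = [prompt]
    for _ in range(max_steps):
        dist = BIGRAM_PROBS.get(tokens[-1], {})
        if not dist:
            break
        next_token = max(dist, key=dist.get)  # local, greedy choice
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(greedy_decode("the"))  # -> "the cat sat down"
```

Each step is locally optimal, but nothing in the loop can plan toward a distant goal or undo an early commitment, which is exactly the authors' point about discontinuous tasks.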
Interestingly, the authors borrow from Nobel laureate Daniel Kahneman's seminal work on the fast and slow modes of thinking. Read for yourself what they say:
One possible way to interpret these limitations is to draw an analogy between the model and the concepts of fast and slow thinking, as proposed by Kahneman. Fast thinking is a mode of thinking that is automatic, intuitive, and effortless, but also prone to errors and biases. Slow thinking is a mode of thinking that is controlled, rational, and effortful, but also more accurate and reliable. Kahneman argues that human cognition is a mixture of these two modes of thinking, and that we often rely on fast thinking when we should use slow thinking, or vice versa. The model can be seen as able to perform “fast thinking” operations to a very impressive extent, but is missing the “slow thinking” component which oversees the thought process, uses the fast-thinking component as a subroutine together with working memory and an organized thinking scheme.
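One way to picture what such a "slow thinking" overseer might look like is an outer loop that calls the fast generator as a subroutine, keeps the partial solution in working memory, verifies each candidate, and retries on failure. The sketch below is purely my own hypothetical illustration, not an architecture proposed in the paper:

```python
import random

# Hypothetical sketch (mine, not from the paper) of "slow thinking" as an
# outer loop: a deliberate controller calls a fast, intuitive generator as
# a subroutine, verifies each candidate, and backtracks on failure.

def fast_generator(options):
    """Fast thinking: an effortless, intuitive guess -- here just random."""
    return random.choice(options)

def slow_solver(goal_check, options, length, attempts=10000):
    """Slow thinking: generate, verify deliberately, and retry until done."""
    for _ in range(attempts):
        candidate = [fast_generator(options) for _ in range(length)]
        if goal_check(candidate):  # the deliberate verification step
            return candidate
    return None  # give up after exhausting the attempt budget

# Toy goal: find three digits that sum to 10.
print(slow_solver(lambda xs: sum(xs) == 10, options=range(10), length=3))
```

In the authors' reading, GPT-4 supplies only the inner fast_generator; the verifying, backtracking outer loop is what today's models lack.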
The paper has many more examples, but suffice it to say that while we are still far from AGI, GPT-4 does show that there is a concerted effort to move rapidly towards it. Having pointed out the reasons for the model's less-than-optimal performance on specific classes of problems, the authors propose directions for future research. In my opinion, it is a matter of when, not if, we get to Artificial General Intelligence.
Total Rewards Professional
For tasks where LLMs are inherently at a disadvantage (e.g., making music, scientific discoveries, complex maths), do you think integrating narrow AI, such as AlphaFold, with LLMs may be a path ahead? Meaning LLMs using narrow AI to do specialised tasks.