Are we on the way to Artificial General Intelligence (AGI)?

A recent 157-page paper titled "Sparks of Artificial General Intelligence: Early experiments with GPT-4", published by Microsoft Research, makes for very interesting reading. I have gone through the entire report, and it definitely qualifies as TL;DR, so I shall try to summarize the findings here for you.

But before that, let's take a brief detour into what intelligence actually means. One of the definitions is that intelligence is a multifaceted and complex cognitive ability that involves the capacity to understand, learn, reason, solve problems, adapt to new situations, think critically, and apply knowledge to different domains.

But what does it mean to say that an artificial intelligence system is intelligent?

From the paper:

In the late-1990s and into the 2000s, there were increasing calls for developing more general AI systems and scholarship in the field has sought to identify principles that might underlie more generally intelligent systems. The phrase, “artificial general intelligence” (AGI), was popularized in the early-2000s to emphasize the aspiration of moving from the “narrow AI”, as demonstrated in the focused, real-world applications being developed, to broader notions of intelligence, harkening back to the long-term aspirations and dreams of earlier AI research. We use AGI to refer to systems that demonstrate broad capabilities of intelligence, including reasoning, planning, and the ability to learn from experience, and with these capabilities at or above human-level.

There are multiple other definitions of AGI:

  1. Goal-oriented: measures an agent's ability to achieve goals in a wide range of environments. However, it does not take into account systems that can carry out complex tasks without being motivated by a goal.
  2. Skill-acquisition efficiency: emphasises learning from experience.
  3. Do anything a human can: however, there is no single, standard definition of human intelligence, given the wide diversity of human abilities.

In the paper, the authors propose to study whether GPT-4 is making progress towards AGI using an approach closer to traditional psychology. With that aim, they generated tasks and questions meant to push GPT-4 beyond mere memorization, and tested it across the following areas:

  1. Its mastery of natural language: they asked it to translate not just between languages but across tone, content, style and domain, and observed whether it can manipulate complex concepts. Spoiler alert: it can!
  2. Its ability at coding and mathematics, across a range of tests.
  3. Its ability to plan, as well as to learn from experience, tested by having it play games and interact with tools.
  4. Whether it can understand humans and make itself understandable to humans, i.e., addressing the problem of explainability.

Having done all this, as the authors put it:

Can one reasonably say that a system that passes exams for software engineering candidates is not really intelligent? Perhaps the only real test of understanding is whether one can produce new knowledge, such as proving new mathematical theorems, a feat that currently remains out of reach for LLMs.

Some interesting findings

A key measure of intelligence is the ability to synthesize information from different domains or modalities and the capacity to apply knowledge and skills across different contexts or disciplines.

GPT-4 did some remarkable things like:

  • Producing JavaScript code that generates random images in the style of the painter Kandinsky
  • Finding a proof that there are infinitely many prime numbers, written in the literary style of Shakespeare
  • Combining knowledge of history and physics by writing a supporting letter for Electron as a US presidential candidate, authored by Mahatma Gandhi and addressed to his wife
  • Producing Python code for a program that takes as input a patient's age, sex, weight, height and blood-test results, and indicates whether the person is at increased risk for diabetes (a minimal sketch of what such a program might look like appears below)
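To give a flavour of that last task, here is a minimal, hypothetical sketch of the kind of program being described. This is my own illustration, not the paper's actual output: the feature names, thresholds and the rule itself are illustrative assumptions, not clinical values.

```python
# Hypothetical sketch of the diabetes-risk program described above.
# Illustrative only: the features, thresholds and rule are assumptions
# of mine, not clinical values and not the paper's actual output.
from dataclasses import dataclass
from typing import List

@dataclass
class Patient:
    age: int                  # years
    sex: str                  # "M" or "F"
    weight_kg: float
    height_m: float
    blood_tests: List[float]  # assume index 0 = fasting glucose (mg/dL)

def bmi(p: Patient) -> float:
    return p.weight_kg / (p.height_m ** 2)

def at_increased_risk(p: Patient) -> bool:
    """Crude rule-based screen: flag if any illustrative risk factor holds."""
    fasting_glucose = p.blood_tests[0]
    return bmi(p) >= 30 or p.age >= 45 or fasting_glucose >= 100

patient = Patient(age=52, sex="M", weight_kg=95.0, height_m=1.75,
                  blood_tests=[108.0, 5.9])
print("Increased risk" if at_increased_risk(patient) else "No increased risk")
```

A real answer from the model would typically encode published screening guidelines or a trained classifier; the point here is only the shape of the task: structured patient inputs in, a risk flag out.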

There are numerous other examples of the model combining diverse disciplines and handling a seemingly bewildering welter of complexity.

So were there areas where it faltered?

Yes, most certainly. For example, with music. Read for yourself from the paper:

When instructed to generate a short tune (Figure 2.9), the model was able to produce valid ABC notation. The tune had a clear structure, the time signature was consistent between bars and the notes followed increasing and decreasing patterns. The tune also used a consistent set of notes within the melody, and the rhythm had a repetitive pattern. However, the model did not seem to obtain the skill of understanding harmony. In fact, consecutive notes in the generated tunes are almost always adjacent to each other (namely, the note following C will almost typically be either B or D), and testing on 10 generated tunes, we were not able to extract any clear chords or arpeggios.
We then asked the model to describe the tune in musical terms. It was able to successfully give a technical description of the structure in terms of repetitions, the rising or descending parts of the melody and to some extent the rhythm. However, it seems that the descriptions of the harmony and chords are not consistent with the notes (in fact, it refers to sequences of adjacent notes, which do not form valid chords, as arpeggios).

In short, the model failed at any non-trivial form of harmony. As an amateur musician, I find this VERY interesting. Does harmony actually require a higher order of intelligence?
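Incidentally, the adjacency observation is easy to test mechanically. Below is a small sketch of my own (not code from the paper) that extracts the note letters from a simplified ABC melody and reports what fraction of consecutive note pairs are a single scale step apart; it ignores ABC headers, accidentals, rests and note lengths for brevity.

```python
# Minimal check of the "consecutive notes are almost always adjacent"
# observation, on a simplified ABC melody. My own sketch, not the paper's.
import re

SCALE = {n: i for i, n in enumerate("CDEFGAB")}  # scale degrees 0..6

def adjacency_ratio(abc_melody: str) -> float:
    """Fraction of consecutive note pairs that are one scale step apart."""
    notes = [SCALE[c.upper()] for c in re.findall(r"[A-Ga-g]", abc_melody)]
    pairs = list(zip(notes, notes[1:]))
    if not pairs:
        return 0.0
    def adjacent(a: int, b: int) -> bool:
        d = abs(a - b) % 7
        return min(d, 7 - d) == 1  # B -> C wraps around the scale
    return sum(adjacent(a, b) for a, b in pairs) / len(pairs)

print(adjacency_ratio("C D E F G F E D C"))  # pure stepwise motion -> 1.0
print(adjacency_ratio("C E G c G E C"))      # arpeggiated thirds   -> 0.0
```

On melodies like the ones the authors describe, such a checker would score close to 1.0, which is exactly the stepwise, chord-free behaviour they report.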

GPT-4 was able to carry out coding tasks fairly well, not just at coming up with code but also at understanding existing pieces of code and even reverse-engineering assembly code.

However, the model faltered at more complicated mathematical problems since, being a language model, it generates each step from local context. The authors look at three aspects of mathematical understanding, illustrated with a small worked example after the list, viz.:

  • Creative reasoning: the ability to identify which arguments, intermediate steps, calculations or algebraic manipulations are likely to be relevant at each stage, in order to chart a path towards the solution. Here, the model demonstrates a high level of ability in choosing the right argument or path towards the solution.
  • Technical proficiency: the ability to perform routine calculations or manipulations that follow a prescribed set of steps. The model falters quite often here, making simple arithmetic mistakes.
  • Critical reasoning: the ability to critically examine each step of the argument, break it down into its sub-components, explain what it entails, how it is related to the rest of the argument and why it is correct. The model demonstrates a significant deficiency in this aspect.
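To make the distinction concrete, here is a small worked example of my own (not one from the paper): solving a quadratic by completing the square exercises all three aspects.

```latex
% Solve x^2 + 6x + 2 = 0 by completing the square (illustrative example).
\begin{align*}
x^2 + 6x + 2 &= 0 \\
(x + 3)^2 - 9 + 2 &= 0 && \text{creative reasoning: choose to complete the square} \\
(x + 3)^2 &= 7 && \text{technical proficiency: routine algebra} \\
x &= -3 \pm \sqrt{7} && \text{critical reasoning: check each root in the original equation}
\end{align*}
```

On the paper's account, GPT-4 typically nails the first, creative step but may then slip on the routine arithmetic (writing, say, $(x+3)^2 = 11$) and, crucially, fails to catch the slip on review.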

What explains the limitations of GPT-4?

The authors attribute them to two main issues:

  1. "Naive" attention mistakes (remember that attention, that wonderful element of the GPT models that reinforces the learning on a particular aspect in its context) but which are also now acting as limitations and
  2. Its "linear thinking" as a next-token prediction machine

Since LLMs are based on the next-character or next-word prediction paradigm, the limitations inherent in that paradigm also constrain their generalizability towards AGI.
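To see what that means concretely, here is a toy sketch of my own of greedy next-token decoding over a hand-made bigram table. Real LLMs replace the table with a neural network over subword tokens, but the control flow, one locally-chosen token at a time with no global plan or backtracking, is the same.

```python
# Toy greedy decoder over a hand-made bigram table (illustrative only).
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"on": 0.9, "down": 0.1},
    "on":  {"the": 0.8, "a": 0.2},
}

def greedy_generate(prompt: str, max_tokens: int = 6) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        dist = BIGRAMS.get(tokens[-1])
        if dist is None:  # no known continuation: stop
            break
        # Greedy, local choice: the single most probable next token,
        # with no lookahead, no backtracking, no global plan.
        tokens.append(max(dist, key=dist.get))
    return " ".join(tokens)

print(greedy_generate("the"))  # -> "the cat sat on the cat sat"
```

Note how the loop happily walks into a repetitive cycle: each step is locally optimal, but nothing oversees the output as a whole.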

From the paper:

These manifest as the model's lack of planning, working memory, ability to backtrack and reasoning abilities. The model relies on a local and greedy process of generating the next word, without any global or deep understanding of the task or the output. Thus, the model is good at producing fluent and coherent texts, but has limitations with regards to solving complex or creative problems which cannot be approached in a sequential manner. This points to the distinction between two types of intellectual tasks:
Incremental tasks. These are tasks which can be solved in a gradual or continuous way, by adding one word or sentence at a time that constitutes progress in the direction of the solution. Those tasks can be solved via content generation which does not require any major conceptual shifts or insights, but rather relies on applying existing knowledge and skills to the given topic or problem. Examples of incremental tasks are writing a summary of a text, answering factual questions, composing a poem based on a given rhyme scheme, or solving a math problem that follows a standard procedure.
Discontinuous tasks. These are tasks where the content generation cannot be done in a gradual or continuous way, but instead requires a certain "Eureka" idea that accounts for a discontinuous leap in the progress towards the solution of the task. The content generation involves discovering or inventing a new way of looking at or framing the problem, that enables the generation of the rest of the content. Examples of discontinuous tasks are solving a math problem that requires a novel or creative application of a formula, writing a joke or a riddle, coming up with a scientific hypothesis or a philosophical argument, or creating a new genre or style of writing.

Interestingly, the authors borrow from Nobel Laureate Daniel Kahneman's seminal work on the fast and slow modes of thinking. Read for yourself what they say:

One possible way to interpret these limitations is to draw an analogy between the model and the concepts of fast and slow thinking, as proposed by Kahneman. Fast thinking is a mode of thinking that is automatic, intuitive, and effortless, but also prone to errors and biases. Slow thinking is a mode of thinking that is controlled, rational, and effortful, but also more accurate and reliable. Kahneman argues that human cognition is a mixture of these two modes of thinking, and that we often rely on fast thinking when we should use slow thinking, or vice versa. The model can be seen as able to perform “fast thinking” operations to a very impressive extent, but is missing the “slow thinking” component which oversees the thought process, uses the fast-thinking component as a subroutine together with working memory and an organized thinking scheme.
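One crude way to bolt a "slow thinking" layer onto a fast-thinking generator, suggested by this analogy, is a generate-and-verify loop: the fast component proposes, an overseer checks the proposal against the task, and failed attempts are kept in working memory for the retry. Here is a minimal sketch of mine, where `fast_think` and `verify` are hypothetical stand-ins for the model and a task-specific checker; this illustrates the analogy, it is not something the paper implements.

```python
# Generate-and-verify sketch of the fast/slow analogy (illustrative only).
import random
from typing import Callable, List, Optional

def slow_think(task: str,
               fast_think: Callable[[str, List[str]], str],
               verify: Callable[[str, str], bool],
               max_attempts: int = 50) -> Optional[str]:
    memory: List[str] = []  # working memory of failed attempts
    for _ in range(max_attempts):
        candidate = fast_think(task, memory)  # fast: automatic, intuitive proposal
        if verify(task, candidate):           # slow: controlled, effortful check
            return candidate
        memory.append(candidate)              # remember the failure and retry
    return None  # budget exhausted

# Toy demo: blindly guess until the overseer accepts a multiple of 7.
fast = lambda task, mem: str(random.randint(1, 100))
check = lambda task, ans: int(ans) % 7 == 0
print(slow_think("pick a multiple of 7", fast, check))
```

Chain-of-thought prompting and tool use are, in effect, attempts to graft exactly this kind of oversight onto the fast-thinking core.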

The paper has many more examples, but suffice it to say that while we are still far from AGI, GPT-4 shows that we are moving rapidly in that direction. Having pointed out the reasons for the model's less-than-optimal performance on specific classes of problems, the authors also propose directions for future research. In my opinion, it is a matter of when, not if, we reach Artificial General Intelligence.
