Are we on the way to Artificial General Intelligence (AGI)?

A recent 157-page paper titled "Sparks of Artificial General Intelligence: Early experiments with GPT-4", published by Microsoft Research, makes for very interesting reading. I have gone through the entire report, and it definitely qualifies as TL;DR, so I shall try to summarize the findings here for you.

But before that, let's take a brief detour into what intelligence actually means. One of the definitions is that intelligence is a multifaceted and complex cognitive ability that involves the capacity to understand, learn, reason, solve problems, adapt to new situations, think critically, and apply knowledge to different domains.

But what does it mean to say that an artificial intelligence system is intelligent?

From the paper:

In the late-1990s and into the 2000s, there were increasing calls for developing more general AI systems and scholarship in the field has sought to identify principles that might underlie more generally intelligent systems. The phrase, “artificial general intelligence” (AGI), was popularized in the early-2000s to emphasize the aspiration of moving from the “narrow AI”, as demonstrated in the focused, real-world applications being developed, to broader notions of intelligence, harkening back to the long-term aspirations and dreams of earlier AI research. We use AGI to refer to systems that demonstrate broad capabilities of intelligence, including reasoning, planning, and the ability to learn from experience, and with these capabilities at or above human-level.

There are multiple other definitions of AGI:

  1. Goal-oriented: measures an agent's ability to achieve goals in a wide range of environments. However, it does not take into account systems that can carry out complex tasks without being motivated by a goal.
  2. Skill-acquisition efficiency: emphasises learning from experience.
  3. Do anything a human can: however, there is no single, standard definition of human intelligence, given the wide diversity of human abilities.

In the paper, the authors propose to study whether GPT-4 is making progress towards AGI using an approach closer to traditional psychology. With that aim, they generated tasks and questions meant to push GPT-4 beyond mere memorization, and tested it across the following areas:

  1. Its mastery of natural language: they asked it to translate not just between languages but across tone, content, style and domain, and observed whether it can manipulate complex concepts. Spoiler alert: it can!
  2. Its ability at coding and mathematics, across a range of tests.
  3. Its ability to plan, as well as to learn from experience, tested by having it play games and interact with tools.
  4. Whether it can understand humans and make itself understandable to humans, i.e., addressing the problem of explainability.

Having done all this, as the authors put it:

Can one reasonably say that a system that passes exams for software engineering candidates is not really intelligent? Perhaps the only real test of understanding is whether one can produce new knowledge, such as proving new mathematical theorems, a feat that currently remains out of reach for LLMs.

Some interesting findings

A key measure of intelligence is the ability to synthesize information from different domains or modalities and the capacity to apply knowledge and skills across different contexts or disciplines.

GPT-4 did some remarkable things like:

  • Producing JavaScript code that generates random images in the style of the painter Kandinsky
  • Finding a proof that there are infinitely many prime numbers, written in the literary style of Shakespeare
  • Combining knowledge of history and physics by writing a supporting letter for Electron as a US presidential candidate, authored by Mahatma Gandhi and addressed to his wife
  • Producing Python code for a program that takes as input a patient's age, sex, weight, height and blood-test results, and indicates whether the person is at increased risk for diabetes (a minimal sketch of what such a program might look like appears below)
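To give a flavour of that last task, here is a minimal, hypothetical sketch of the kind of program being described. This is my own illustration, not the paper's actual output: the feature names, thresholds and the rule itself are illustrative assumptions, not clinical values.

```python
# Hypothetical sketch of the diabetes-risk program described above.
# Illustrative only: the features, thresholds and rule are assumptions
# of mine, not clinical values and not the paper's actual output.
from dataclasses import dataclass
from typing import List

@dataclass
class Patient:
    age: int                  # years
    sex: str                  # "M" or "F"
    weight_kg: float
    height_m: float
    blood_tests: List[float]  # assume index 0 = fasting glucose (mg/dL)

def bmi(p: Patient) -> float:
    return p.weight_kg / (p.height_m ** 2)

def at_increased_risk(p: Patient) -> bool:
    """Crude rule-based screen: flag if any illustrative risk factor holds."""
    fasting_glucose = p.blood_tests[0]
    return bmi(p) >= 30 or p.age >= 45 or fasting_glucose >= 100

patient = Patient(age=52, sex="M", weight_kg=95.0, height_m=1.75,
                  blood_tests=[108.0, 5.9])
print("Increased risk" if at_increased_risk(patient) else "No increased risk")
```

A real answer from the model would typically encode published screening guidelines or a trained classifier; the point here is only the shape of the task: structured patient inputs in, a risk flag out.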

There are numerous other examples of the model combining diverse disciplines and handling a seemingly bewildering welter of complexity.

So were there areas where it faltered?

Yes, most certainly. For example, with music. Read for yourself from the paper:

When instructed to generate a short tune (Figure 2.9), the model was able to produce valid ABC notation. The tune had a clear structure, the time signature was consistent between bars and the notes followed increasing and decreasing patterns. The tune also used a consistent set of notes within the melody, and the rhythm had a repetitive pattern. However, the model did not seem to obtain the skill of understanding harmony. In fact, consecutive notes in the generated tunes are almost always adjacent to each other (namely, the note following C will almost typically be either B or D), and testing on 10 generated tunes, we were not able to extract any clear chords or arpeggios.
We then asked the model to describe the tune in musical terms. It was able to successfully give a technical description of the structure in terms of repetitions, the rising or descending parts of the melody and to some extent the rhythm. However, it seems that the descriptions of the harmony and chords are not consistent with the notes (in fact, it refers to sequences of adjacent notes, which do not form valid chords, as arpeggios).

In short, the model failed at any non-trivial form of harmony. As an amateur musician, I find this VERY interesting. Does harmony actually require a higher order of intelligence?
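Incidentally, the adjacency observation is easy to test mechanically. Below is a small sketch of my own (not code from the paper) that extracts the note letters from a simplified ABC melody and reports what fraction of consecutive note pairs are a single scale step apart; it ignores ABC headers, accidentals, rests and note lengths for brevity.

```python
# Minimal check of the "consecutive notes are almost always adjacent"
# observation, on a simplified ABC melody. My own sketch, not the paper's.
import re

SCALE = {n: i for i, n in enumerate("CDEFGAB")}  # scale degrees 0..6

def adjacency_ratio(abc_melody: str) -> float:
    """Fraction of consecutive note pairs that are one scale step apart."""
    notes = [SCALE[c.upper()] for c in re.findall(r"[A-Ga-g]", abc_melody)]
    pairs = list(zip(notes, notes[1:]))
    if not pairs:
        return 0.0
    def adjacent(a: int, b: int) -> bool:
        d = abs(a - b) % 7
        return min(d, 7 - d) == 1  # B -> C wraps around the scale
    return sum(adjacent(a, b) for a, b in pairs) / len(pairs)

print(adjacency_ratio("C D E F G F E D C"))  # pure stepwise motion -> 1.0
print(adjacency_ratio("C E G c G E C"))      # arpeggiated thirds   -> 0.0
```

On melodies like the ones the authors describe, such a checker would score close to 1.0, which is exactly the stepwise, chord-free behaviour they report.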

GPT-4 was able to carry out coding tasks fairly well, not just at coming up with code but also at understanding existing pieces of code and even reverse-engineering assembly code.

However, the model faltered at more complicated mathematical problems since, being a language model, it generates each step from local context. The authors look at three aspects of mathematical understanding, illustrated with a small worked example after the list, viz.:

  • Creative reasoning: the ability to identify which arguments, intermediate steps, calculations or algebraic manipulations are likely to be relevant at each stage, in order to chart a path towards the solution. Here, the model demonstrates a high level of ability in choosing the right argument or path towards the solution.
  • Technical proficiency: the ability to perform routine calculations or manipulations that follow a prescribed set of steps. The model falters quite often here, making simple arithmetic mistakes.
  • Critical reasoning: the ability to critically examine each step of the argument, break it down into its sub-components, explain what it entails, how it is related to the rest of the argument and why it is correct. The model demonstrates a significant deficiency in this aspect.
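To make the distinction concrete, here is a small worked example of my own (not one from the paper): solving a quadratic by completing the square exercises all three aspects.

```latex
% Solve x^2 + 6x + 2 = 0 by completing the square (illustrative example).
\begin{align*}
x^2 + 6x + 2 &= 0 \\
(x + 3)^2 - 9 + 2 &= 0 && \text{creative reasoning: choose to complete the square} \\
(x + 3)^2 &= 7 && \text{technical proficiency: routine algebra} \\
x &= -3 \pm \sqrt{7} && \text{critical reasoning: check each root in the original equation}
\end{align*}
```

On the paper's account, GPT-4 typically nails the first, creative step but may then slip on the routine arithmetic (writing, say, $(x+3)^2 = 11$) and, crucially, fails to catch the slip on review.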

What explains the limitations of GPT-4?

The authors attribute them to two main issues:

  1. "Naive" attention mistakes (remember that attention, that wonderful element of the GPT models that reinforces the learning on a particular aspect in its context) but which are also now acting as limitations and
  2. Its "linear thinking" as a next-token prediction machine

Since LLMs are based on the next-character or next-word prediction paradigm, the limitations inherent in that paradigm also constrain their generalizability towards AGI.
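To see what that means concretely, here is a toy sketch of my own of greedy next-token decoding over a hand-made bigram table. Real LLMs replace the table with a neural network over subword tokens, but the control flow, one locally-chosen token at a time with no global plan or backtracking, is the same.

```python
# Toy greedy decoder over a hand-made bigram table (illustrative only).
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"on": 0.9, "down": 0.1},
    "on":  {"the": 0.8, "a": 0.2},
}

def greedy_generate(prompt: str, max_tokens: int = 6) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        dist = BIGRAMS.get(tokens[-1])
        if dist is None:  # no known continuation: stop
            break
        # Greedy, local choice: the single most probable next token,
        # with no lookahead, no backtracking, no global plan.
        tokens.append(max(dist, key=dist.get))
    return " ".join(tokens)

print(greedy_generate("the"))  # -> "the cat sat on the cat sat"
```

Note how the loop happily walks into a repetitive cycle: each step is locally optimal, but nothing oversees the output as a whole.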

From the paper:

These manifest as the model's lack of planning, working memory, ability to backtrack and reasoning abilities. The model relies on a local and greedy process of generating the next word, without any global or deep understanding of the task or the output. Thus, the model is good at producing fluent and coherent texts, but has limitations with regards to solving complex or creative problems which cannot be approached in a sequential manner. This points to the distinction between two types of intellectual tasks:
Incremental tasks. These are tasks which can be solved in a gradual or continuous way, by adding one word or sentence at a time that constitutes progress in the direction of the solution. Those tasks can be solved via content generation which does not require any major conceptual shifts or insights, but rather relies on applying existing knowledge and skills to the given topic or problem. Examples of incremental tasks are writing a summary of a text, answering factual questions, composing a poem based on a given rhyme scheme, or solving a math problem that follows a standard procedure.
Discontinuous tasks. These are tasks where the content generation cannot be done in a gradual or continuous way, but instead requires a certain "Eureka" idea that accounts for a discontinuous leap in the progress towards the solution of the task. The content generation involves discovering or inventing a new way of looking at or framing the problem, that enables the generation of the rest of the content. Examples of discontinuous tasks are solving a math problem that requires a novel or creative application of a formula, writing a joke or a riddle, coming up with a scientific hypothesis or a philosophical argument, or creating a new genre or style of writing.

Interestingly, the authors borrow from Nobel Laureate Daniel Kahneman's seminal work on the fast and slow modes of thinking. Read for yourself what they say:

One possible way to interpret these limitations is to draw an analogy between the model and the concepts of fast and slow thinking, as proposed by Kahneman. Fast thinking is a mode of thinking that is automatic, intuitive, and effortless, but also prone to errors and biases. Slow thinking is a mode of thinking that is controlled, rational, and effortful, but also more accurate and reliable. Kahneman argues that human cognition is a mixture of these two modes of thinking, and that we often rely on fast thinking when we should use slow thinking, or vice versa. The model can be seen as able to perform “fast thinking” operations to a very impressive extent, but is missing the “slow thinking” component which oversees the thought process, uses the fast-thinking component as a subroutine together with working memory and an organized thinking scheme.
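One crude way to bolt a "slow thinking" layer onto a fast-thinking generator, suggested by this analogy, is a generate-and-verify loop: the fast component proposes, an overseer checks the proposal against the task, and failed attempts are kept in working memory for the retry. Here is a minimal sketch of mine, where `fast_think` and `verify` are hypothetical stand-ins for the model and a task-specific checker; this illustrates the analogy, it is not something the paper implements.

```python
# Generate-and-verify sketch of the fast/slow analogy (illustrative only).
import random
from typing import Callable, List, Optional

def slow_think(task: str,
               fast_think: Callable[[str, List[str]], str],
               verify: Callable[[str, str], bool],
               max_attempts: int = 50) -> Optional[str]:
    memory: List[str] = []  # working memory of failed attempts
    for _ in range(max_attempts):
        candidate = fast_think(task, memory)  # fast: automatic, intuitive proposal
        if verify(task, candidate):           # slow: controlled, effortful check
            return candidate
        memory.append(candidate)              # remember the failure and retry
    return None  # budget exhausted

# Toy demo: blindly guess until the overseer accepts a multiple of 7.
fast = lambda task, mem: str(random.randint(1, 100))
check = lambda task, ans: int(ans) % 7 == 0
print(slow_think("pick a multiple of 7", fast, check))
```

Chain-of-thought prompting and tool use are, in effect, attempts to graft exactly this kind of oversight onto the fast-thinking core.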

The paper has many more examples, but suffice it to say that while we are still far from AGI, GPT-4 shows that we are moving rapidly in that direction. Having pointed out the reasons for the model's less-than-optimal performance on specific classes of problems, the authors also propose directions for future research. In my opinion, it is a matter of when, not if, we reach Artificial General Intelligence.
