Artificial general intelligence, are we there yet?

Michael Azoff

Published: May 15, 2024

Introduction

The current state of the art in artificial intelligence (AI) is generative AI and large language models (LLMs). The emergent capabilities of these models have been surprising: they are able to perform logical reasoning, complete mathematical proofs, generate code for software developers, and not least converse with humans in human-like ways. A natural question is how close these models are to artificial general intelligence (AGI), the term used to describe human-level intelligent capabilities. This article explores the answer and the implications of current and future AI capabilities for IT professionals.

Understanding in LLM

The first impression of LLMs was that they were grand statistical models that used finely tuned probabilities to produce the next word in a sequence. The experts building LLMs create novel architectures and refine performance with advanced training algorithms, but under the hood the model is a black box: artificial neurons connected to each other, with each connection scaled by a weight (its strength). What exactly goes on between the neurons is largely unknown.
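The "next word from probabilities" view can be made concrete with a minimal sketch. The vocabulary and scores below are hypothetical, chosen only for illustration; a real LLM computes such scores over tens of thousands of tokens using its learned weights.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {word: math.exp(score - peak) for word, score in logits.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}

def sample_next_word(logits, rng):
    """Draw one word at random, weighted by its probability."""
    probs = softmax(logits)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

# Hypothetical scores for the word following "The cat sat on the"
toy_logits = {"mat": 3.0, "sofa": 2.0, "moon": 0.1}
toy_probs = softmax(toy_logits)
most_likely = max(toy_probs, key=toy_probs.get)
sampled = sample_next_word(toy_logits, random.Random(0))
```

The highest-scoring word is the most probable, but sampling means a less likely word can still be chosen, which is one reason the same prompt can yield different responses.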

However, we do understand that as signals pass from one layer to another in an LLM, an abstraction process takes place that leads to higher-level concepts being captured. This suggests that LLMs make sense of language conceptually, and concepts contain meaning. The understanding an LLM possesses is shallow, as it does not have the apparatus of the brain to develop deeper understanding, but it is sufficient to perform simple reasoning.

Omdia is seeing AI researchers treating LLMs as experimental subjects and running various benchmarks and tests on these models to see how well they can perform. To test the logical reasoning of OpenAI’s ChatGPT I ran the following query: "The father said it was the mother who gave birth to the son. The son said it was the doctor who gave birth to him. Can this be true?" The correct answer, as I’m sure you worked out, is: yes, it can be true, because the doctor and mother could be the same person.

In what follows I give shortened versions of ChatGPT’s responses (in quotation marks); the actual wording was quite long-winded. The free version of ChatGPT is based on GPT-3.5, and its initial response was: "In a figurative or metaphorical sense, yes, it can be true." It then went on to say the son could be "expressing gratitude…to the doctor…provided medical care and while not literally true."

ChatGPT using the latest GPT-4 requires a small monthly subscription, which I paid in the interests of science. This was the response: "The statement presents a mix of literal and metaphorical interpretations of 'giving birth.'" And: "both statements can be true, depending on how the phrase 'gave birth' is understood."

There is clearly an issue of metaphors here, so I added an initial prompt to the query: "Treat the following statements in purely logical terms and not metaphor. The father said it was the mother who gave birth to the son. The son said it was the doctor who gave birth to him. Can this be true?"

The response from ChatGPT (based on GPT-4) was: "they cannot both be true simultaneously because they contradict each other regarding who actually gave birth to the son." Not a good response.

I added one more prompt at the end of the query to help guide the answer: "Treat the following statements in purely logical terms and not metaphor. The father said it was the mother who gave birth to the son. The son said it was the doctor who gave birth to him. Can this be true? In answering consider who the doctor could in theory be."

ChatGPT (GPT-4) finally gave the correct answer: "…if the mother of the son is herself a doctor … then both statements could technically be true." However, ChatGPT (GPT-3.5) was still stuck: "In purely logical terms, the statements given are contradictory."

To conclude on this exercise, ChatGPT (GPT-4) can perform logical reasoning but needs prompts to guide it. It will be interesting to see how GPT-5 performs when it is launched in mid-2024. My guess is that at some point in the evolution of GPT it will be able to answer this query correctly without the second prompt, whereas the first prompt is a reasonable one to give to ensure the machine understands the nature of the query.
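The logic of the puzzle itself can be checked mechanically. The sketch below is my own illustrative encoding, not something from the experiment: it assumes two hypothetical individuals, enumerates who could fill the roles of mother, doctor and birth-giver, and keeps the cases where both statements hold.

```python
from itertools import product

PEOPLE = ["person_a", "person_b"]  # hypothetical individuals

def consistent_worlds():
    """Return every role assignment in which both statements hold."""
    worlds = []
    for mother, doctor, birth_giver in product(PEOPLE, repeat=3):
        father_statement = birth_giver == mother  # "the mother gave birth"
        son_statement = birth_giver == doctor     # "the doctor gave birth"
        if father_statement and son_statement:
            worlds.append(
                {"mother": mother, "doctor": doctor, "birth_giver": birth_giver}
            )
    return worlds

worlds = consistent_worlds()
can_both_be_true = len(worlds) > 0
same_person_in_every_world = all(w["mother"] == w["doctor"] for w in worlds)
```

Both statements can be true together, and in every consistent case the mother and the doctor are the same person. The statements only look contradictory if one silently assumes the two roles belong to different people.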

Whichever way you analyze this exercise, what impresses me is that GPT was not trained to perform logical reasoning; it was trained to process language.

LLM: hype or substance

If you read the press there is a sense, at least among some commentators, that we’re in a bubble. Omdia’s view is that any bubble relates to the stock market valuations of certain players that make current LLMs possible. Companies come and go, and this is not the place to give stock-picking recommendations. There will probably be churn in which players sit at the top, but what will endure is a thread of continually improving generative AI technology. This has substance and will have a lasting impact, not least in our everyday work experience, as intelligent machines augment and assist people in their jobs. There will no doubt be some job displacement: some jobs will disappear through automation, while others will open that require a human in the loop. A major step change in how we use this technology will be LLMs on the edge.

LLM on the edge

LLMs tend to be rather large, with billions of parameters, and need significant GPU processing capability to train. The parameters are mostly variables known as weights, which connect artificial neurons in the model and scale the strength of the connection between them. Each neuron also has a ‘bias’ parameter. A useful way to think about the parameter count is as a proxy for the number of artificial neurons in the model: the more parameters, the bigger the artificial brain.
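To illustrate how weights and biases add up, here is a minimal sketch that counts the parameters of a small fully connected network. The layer sizes are hypothetical, and real LLMs use transformer architectures with additional parameter types, so this shows only the basic bookkeeping.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of neurons per layer, input layer first.
    """
    # One weight for every connection between adjacent layers
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    # One bias for every neuron outside the input layer
    biases = sum(layer_sizes[1:])
    return weights, biases

weights, biases = count_parameters([4, 8, 2])
total = weights + biases
```

For layers of 4, 8 and 2 neurons this gives 48 weights and 10 biases, 58 parameters in total. Weights dominate the count because they grow with the product of adjacent layer sizes, so the parameter count tracks the number of neurons only loosely.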

There is a trend that the larger the model, the better its performance on various benchmarks. This is true of OpenAI’s GPT models. However, some players in the market keep the size of the model stable and look for algorithmic techniques to increase performance. Exploiting sparsity is one approach. For example, many neurons carry very small values (near zero) in any given calculation and contribute little to the outcome. Dynamic sparsity is a technique that ignores such neurons, so that only a subset of neurons takes part in any given calculation, which reduces the effective size of the model being computed. An example of this technique is used by ThirdAI on its Bolt2.5B LLM.
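The idea of skipping near-zero neurons can be sketched in a few lines. This is a simple threshold-based illustration with made-up values, not ThirdAI's implementation; the function names and the threshold of 0.01 are assumptions for the example.

```python
def dense_output(activations, weights):
    """Weighted sum over every input neuron."""
    return sum(a * w for a, w in zip(activations, weights))

def sparse_output(activations, weights, threshold=0.01):
    """Weighted sum that skips neurons whose activation is near zero.

    Returns the approximate output and the number of neurons actually used.
    """
    active = [(a, w) for a, w in zip(activations, weights) if abs(a) >= threshold]
    return sum(a * w for a, w in active), len(active)

# Hypothetical activations: three are significant, three are near zero
activations = [0.9, 0.001, -0.7, 0.0005, 0.002, 0.4]
weights = [0.5, 0.3, -0.2, 0.8, -0.6, 0.1]

full = dense_output(activations, weights)
approx, used = sparse_output(activations, weights)
```

Here only three of the six neurons take part in the sparse calculation, yet the result differs from the dense one by well under 0.01. Which neurons are skipped depends on the input, which is what makes the sparsity dynamic.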

The key benefit of a smaller LLM is the ability to put it on the edge: in your smartphone, in an automobile, on the factory floor, etc. There are clear benefits for LLMs on the edge:

· Lower cost of training smaller models.

· Reduced round-trip latency when querying the LLM.

· Privacy of data, which stays local.

The following players are working on small LLM models and have published their Massive Multitask Language Understanding (MMLU) benchmark score – see Figure 1.

· Alibaba: Qwen, open source models.

· Google DeepMind: recently released Gemma lightweight LLM models based on Gemini.

· Meta: Llama 3 is the latest model, available in different sizes.

· Microsoft: Phi-3 series, the latest in the Phi models.

· Mistral: French-based startup.

· OpenAI: GPT, huge LLMs, included here for reference.

Figure 1: Range of relatively small LLM models with their MMLU benchmark scores. OpenAI GPT included for reference. MMLU above 75% are highlighted. Source: Omdia.

The data in the table is plotted on a log scale in Figure 2. Clearly, the larger the model, the better the MMLU score. Of the small models, Microsoft Phi-3-medium (14bn parameters) is noteworthy, scoring a high MMLU of 78. Microsoft is aiming to put a model like this in a smartphone. The day is nearing when phone users will be able to converse with a personal assistant to make appointments, make calls and take notes, as well as converse on personal matters such as health, while keeping all the data confidential and private. This will open a huge market. The emergent capability of logical reasoning will make a significant difference in autonomous driving, where the machine will be able to reason in scenarios it has not been trained on.

Figure 2: Range of relatively small LLM models with their MMLU benchmark score. OpenAI GPT included for reference. Note the log scale on y-axis. Source: Omdia.
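To put the 14bn parameters of Phi-3-medium in the context of a smartphone, a rough estimate of the memory needed for the weights alone is the parameter count multiplied by the bytes stored per parameter. The 16-bit and 4-bit precisions below are assumptions chosen for illustration, not figures from Microsoft, and the estimate ignores the working memory needed at run time.

```python
def weight_memory_gb(num_parameters, bits_per_parameter):
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes)."""
    return num_parameters * bits_per_parameter / 8 / 1e9

PHI3_MEDIUM_PARAMS = 14e9  # parameter count quoted in the article

fp16_gb = weight_memory_gb(PHI3_MEDIUM_PARAMS, 16)  # assumed 16-bit weights
int4_gb = weight_memory_gb(PHI3_MEDIUM_PARAMS, 4)   # assumed 4-bit quantization
```

Under these assumptions the weights need about 28 GB at 16 bits and about 7 GB at 4 bits per parameter, which suggests why reduced precision and smaller models matter for edge devices.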

AI implications for IT professionals

Emergent reasoning capabilities are the most powerful features that make generative AI models valuable in everyday work. There are multiple types of reasoning:

· Logical

· Analogical

· Social

· Visual

· Implicit

· Causal

· Common sense

We would also want AI models to perform deductive (reasoning from given facts), inductive (generalizing from examples) and abductive (identifying the best explanation) reasoning. When LLMs can perform the above types of reasoning reliably, we will have reached an important milestone on the path to AGI.

With their current capabilities, LLMs can augment people in their work and improve their productivity. Take generating test cases from a set of requirements: that could be a three-hour job for a developer, whereas it would take an LLM about three minutes. The result would likely be incomplete and may contain some poor choices, but it would also include tests the developer would not have thought of. It would kick-start the process and save the developer time.

LLMs can be fine-tuned with private data; for example, the infrastructure details of an organization are unique to that organization. An LLM fine-tuned on such data could be queried on internal IT matters and provide custom, reliable information relevant to that organization.

AI-based machine assistants will become normal in the workplace. Fine-tuned models can act as a source of knowledge, which is especially helpful for new workers. In future, AI machines will be able to perform triage rapidly and be reliable enough to take remediation action. My view is that IT professionals will embrace this technology as a reliable assistant to improve their productivity.

Appendix
