Artificial general intelligence, are we there yet?
Introduction
The current state of the art in artificial intelligence (AI) is generative AI and large language models (LLMs). The emergent capabilities of these models have been surprising: they can perform logical reasoning, complete mathematical proofs, generate code for software developers and, not least, converse with humans in human-like ways. A natural question is how close these models are to artificial general intelligence (AGI), the term used to describe human-level intelligent capabilities. This article explores that question and the implications of current and future AI capabilities for IT professionals.
Understanding in LLMs
The first impression of LLMs was that they were large statistical models that used fine-grained probabilities to predict the next word in a sequence. The experts building LLMs create novel architectures and refine performance with advanced training algorithms, but under the hood the model is a black box: artificial neurons connected to each other, with each connection scaled by a strength (weight). What exactly goes on between the neurons is not well understood.
However, we do understand that as signals pass from one layer of an LLM to the next, a process of abstraction takes place that captures higher-level concepts. This suggests that LLMs make sense of language conceptually, and concepts carry meaning. This is a shallow level of understanding, since an LLM lacks the apparatus of the brain needed to develop deeper understanding, but it is sufficient to perform simple reasoning.
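The "next word" view of LLMs can be made concrete with a toy sketch. A real LLM computes a score (logit) for every token in its vocabulary using billions of weights; here the four scores are simply invented, and only the final step is shown: a softmax turns the scores into probabilities and the most probable word is chosen (greedy decoding). The example sentence and numbers are hypothetical, and real systems often sample from the distribution rather than always taking the top word.

```python
import math

def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    top = max(scores.values())  # subtracting the max avoids overflow; result unchanged
    exps = {word: math.exp(s - top) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

def next_word(scores):
    """Greedy decoding: return the most probable next word and the full distribution."""
    probs = softmax(scores)
    return max(probs, key=probs.get), probs

# Hypothetical scores for words that might follow "The cat sat on the";
# the numbers are invented for illustration, not taken from any real model.
scores = {"mat": 3.2, "sofa": 2.5, "roof": 1.1, "idea": -0.5}
word, probs = next_word(scores)
print(word)  # mat
for w, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{w:>5}: {p:.3f}")
```

What the sketch cannot show is the part the article calls a black box: how the layers of weights arrive at those scores in the first place.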
Omdia is seeing AI researchers treat LLMs as experimental subjects, running various benchmarks and tests on these models to see how well they perform. To test the logical reasoning of OpenAI's ChatGPT, I ran the following query: The father said it was the mother who gave birth to the son. The son said it was the doctor who gave birth to him. Can this be true? The correct answer, as I'm sure you worked out, is yes: it can be true, because the doctor and the mother could be the same person.
Below I give shortened versions of ChatGPT's responses in quotation marks; the actual wording was quite long-winded. The free version of ChatGPT is based on GPT-3.5, and its initial response was: "In a figurative or metaphorical sense, yes, it can be true." It then went on to say the son could be "expressing gratitude…to the doctor…provided medical care", while noting this was "not literally true".
Using ChatGPT with the latest GPT-4 model requires a small monthly subscription, which, in the interests of science, I paid. This was the response: "The statement presents a mix of literal and metaphorical interpretations of 'giving birth.'" And: "both statements can be true, depending on how the phrase 'gave birth' is understood."
There is clearly an issue of metaphors here, so I added an initial prompt to the query: Treat the following statements in purely logical terms and not metaphor. The father said it was the mother who gave birth to the son. The son said it was the doctor who gave birth to him. Can this be true?
The response from ChatGPT (based on GPT-4) was: "they cannot both be true simultaneously because they contradict each other regarding who actually gave birth to the son." Not a good response.
I added one more instruction at the end of the query to help guide the answer: Treat the following statements in purely logical terms and not metaphor. The father said it was the mother who gave birth to the son. The son said it was the doctor who gave birth to him. Can this be true? In answering consider who the doctor could in theory be.
ChatGPT (GPT-4) finally gave the correct answer: "…if the mother of the son is herself a doctor … then both statements could technically be true." However, ChatGPT (GPT-3.5) was still stuck: "In purely logical terms, the statements given are contradictory."
To conclude this exercise: ChatGPT (GPT-4) can perform logical reasoning but needs prompts to guide it. It will be interesting to see how GPT-5 performs when it launches, which is expected in mid-2024. My guess is that at some point in the evolution of GPT it will answer this query correctly without the second prompt; the first prompt, by contrast, is a reasonable one to give to ensure the machine understands the nature of the query.
Whichever way you analyze this exercise, what impresses me is that GPT was not trained to perform logical reasoning; it was trained to process language.
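Read literally, the puzzle is a small constraint problem, and a brute-force search makes its logic explicit. The sketch below is a hypothetical encoding for illustration, not a description of how ChatGPT works internally: it treats "mother", "doctor" and "the person who gave birth" as roles that may or may not be filled by the same individual, drawn from an invented cast of three names, and keeps every assignment in which both the father's and the son's statements hold.

```python
from itertools import product

# Hypothetical cast: distinct individuals who might fill each role.
PEOPLE = ["Alice", "Bob", "Carol"]

def consistent_worlds():
    """Try every assignment of people to the roles and keep those
    in which both statements are literally true."""
    worlds = []
    for mother, doctor, birth_giver in product(PEOPLE, repeat=3):
        father_says = birth_giver == mother  # "the mother gave birth to the son"
        son_says = birth_giver == doctor     # "the doctor gave birth to him"
        if father_says and son_says:
            worlds.append({"mother": mother, "doctor": doctor, "birth_giver": birth_giver})
    return worlds

worlds = consistent_worlds()
print(len(worlds) > 0)                                  # True: both statements can hold
print(all(w["mother"] == w["doctor"] for w in worlds))  # True: only if mother and doctor coincide
```

The search confirms that the two statements are compatible, and that in every consistent assignment the mother and the doctor are the same person, which is the answer GPT-4 reached only after the extra hint.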
LLMs: hype or substance?
If you read the press, there is a sense, at least among some commentators, that we are in a bubble. Omdia's view, however, is that any bubble relates to the stock market valuations of certain companies that make current LLMs possible. Companies come and go, and this is not the place to give stock-picking recommendations. There will probably be churn among the players at the top, but what will endure is a thread of continually improving generative AI technology. This has substance and will have a lasting impact, not least on our everyday work, as intelligent machines augment and assist people in their jobs. There will no doubt be some job displacement: some jobs will disappear through automation, while others that require a human in the loop will open up. A major step change in how we use this technology will come from running LLMs on the edge.
LLM on the edge
LLMs tend to be large, with billions of parameters, and need significant GPU processing capability to train. The parameters are mostly weights: variables on the connections between artificial neurons that set the strength of each connection. Each neuron also has a 'bias' parameter. A useful way to think about the parameter count is as a rough proxy for the size of the network (strictly, it counts connections rather than neurons): the more parameters, the bigger the artificial brain.
In general, the larger the model, the better its performance on various benchmarks; this is true of OpenAI's GPT models. However, some players in the market keep model size stable and instead use algorithmic techniques to increase performance. Exploiting sparsity is one approach. For any given input, many neurons output values very close to zero and contribute little to the result. Dynamic sparsity ignores such neurons, so that only a subset of neurons takes part in each computation, which reduces the part of the model that is active at any one time. ThirdAI uses this technique in its Bolt2.5B LLM.
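The weight and bias bookkeeping can be sketched for a simple fully connected network. This is an assumption made for illustration: LLMs are transformers with attention and embedding layers whose parameter counts follow different formulas, although they too are dominated by weight matrices. In a fully connected layer, each connection carries one weight and each output neuron carries one bias.

```python
def dense_layer_params(n_in, n_out):
    """Parameters of one fully connected layer: one weight per connection
    plus one bias per output neuron."""
    weights = n_in * n_out
    biases = n_out
    return weights + biases

def mlp_params(layer_sizes):
    """Total parameters of a stack of fully connected layers, e.g.
    [784, 128, 10] means 784 inputs -> 128 neurons -> 10 neurons."""
    return sum(dense_layer_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

print(dense_layer_params(4, 3))    # 15  (12 weights + 3 biases)
print(mlp_params([784, 128, 10]))  # 101770
```

The second example has only 138 neurons but over 100,000 parameters, because weights grow with the product of layer widths. That is why the parameter count is best read as a rough measure of overall network size rather than a literal neuron count.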
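A toy sketch of the idea behind dynamic sparsity is shown below. It is a simplified stand-in, not ThirdAI's method: weights are random, sizes are hypothetical, and for each input only the hidden neurons whose activation exceeds a threshold are passed to the next layer. Because the selection depends on the input, the active set changes from one input to the next. This toy computes every hidden activation before choosing, so it only saves work in the next layer; practical systems aim to pick the active neurons cheaply in advance.

```python
import random

def relu_layer(inputs, weights, biases):
    """Fully connected layer with ReLU; weights[i][j] links input i to neuron j."""
    return [max(0.0, b + sum(x * weights[i][j] for i, x in enumerate(inputs)))
            for j, b in enumerate(biases)]

def linear_from(inputs, weights, biases, active):
    """Next-layer outputs computed from the `active` input neurons only.
    Neurons outside `active` are skipped, saving their multiply-adds."""
    return [b + sum(inputs[i] * weights[i][j] for i in active)
            for j, b in enumerate(biases)]

def active_neurons(values, threshold):
    """Per-input selection: indices whose activation magnitude exceeds `threshold`."""
    return [i for i, v in enumerate(values) if abs(v) > threshold]

# Hypothetical tiny network with random weights, for illustration only.
random.seed(0)
n_in, n_hidden, n_out = 8, 64, 4
W1 = [[random.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_in)]
W2 = [[random.gauss(0, 1) for _ in range(n_out)] for _ in range(n_hidden)]
b1, b2 = [0.0] * n_hidden, [0.0] * n_out

x = [random.gauss(0, 1) for _ in range(n_in)]
h = relu_layer(x, W1, b1)                        # hidden activations for this input

dense = linear_from(h, W2, b2, range(n_hidden))  # every hidden neuron takes part
for threshold in (0.0, 0.5):
    active = active_neurons(h, threshold)        # the set changes with each input x
    out = linear_from(h, W2, b2, active)
    err = max(abs(a - b) for a, b in zip(out, dense))
    print(f"threshold={threshold}: {len(active)}/{n_hidden} neurons used, max error {err:.3f}")
```

With a threshold of 0 the skipped neurons were exactly zero after ReLU, so the result matches the dense computation; a higher threshold skips more neurons and trades some accuracy for fewer operations.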
The key benefit of a smaller LLM is the ability to run it on the edge: in your smartphone, in an automobile, on the factory floor, and so on. There are clear benefits to LLMs on the edge:
· Lower cost of training smaller models.
· Reduced round-trip latency when querying the LLM.
· Data privacy, as data is kept local.
The following players are working on small LLMs and have published Massive Multitask Language Understanding (MMLU) benchmark scores (see Figure 1):
· Alibaba: Qwen, open-source models.
· Google DeepMind: recently released the Gemma lightweight LLMs, based on Gemini.
· Meta: Llama 3 is the latest model, available in different sizes.
· Microsoft: Phi-3 series, the latest in the Phi models.
· Mistral: French-based startup.
· OpenAI: GPT, very large LLMs included here for comparison.
The data in the table is plotted on a log scale in Figure 2. In general, the larger the model, the better the MMLU score. Of the small models, Microsoft's Phi-3-medium (14bn parameters) is noteworthy, with a high MMLU score of 78. Microsoft is aiming to put a model like this in a smartphone. The day is nearing when phone users will be able to converse with a personal assistant to make appointments, make calls and take notes, as well as discuss personal matters such as health, while keeping all their data confidential and private. This will open a huge market. The emergent capability of logical reasoning will also make a significant difference in autonomous driving, where the machine will be able to reason about scenarios it has not been trained on.
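A back-of-the-envelope calculation shows why parameter count matters on a phone. The sketch assumes that memory is dominated by the weights, stored at a fixed number of bits per parameter; activations, caches and runtime overhead are ignored. Storing weights at reduced precision (quantization) is a common technique for on-device models, but the precisions listed are illustrative assumptions, not a statement of how Microsoft deploys Phi-3.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory needed just to store the weights, in gigabytes (10**9 bytes).
    Ignores activations, caches and runtime overhead (a simplifying assumption)."""
    return n_params * bits_per_param / 8 / 1e9

PHI3_MEDIUM = 14e9  # parameters, as quoted in the text

# Illustrative precisions only; the bit widths are assumptions.
for label, bits in [("32-bit float", 32), ("16-bit float", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>12}: {weight_memory_gb(PHI3_MEDIUM, bits):5.1f} GB")
```

At 16 bits per parameter a 14bn-parameter model needs about 28 GB for its weights alone, more than the memory of a typical smartphone; at 4 bits this falls to about 7 GB. This is why smaller or more compressed models are the more likely near-term fit for phones.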
AI implications for IT professionals
The emergent reasoning capabilities of generative AI models are their most powerful features for making them valuable in everyday work. There are multiple types of reasoning:
· Logical
· Analogical
· Social
· Visual
· Implicit
· Causal
· Common sense
We would also want AI models to perform deductive (reasoning from given facts), inductive (generalizing) and abductive (identifying the best explanation) reasoning. When LLMs can perform all of these types of reasoning reliably, we will have reached an important milestone on the path to AGI.
With their current capabilities, LLMs can augment people in their work and improve their productivity. Generating test cases from a set of requirements, for example, could be a three-hour job for a developer; it would take an LLM three minutes. The result would likely be incomplete and might contain some poor choices, but it would also include tests the developer would not have thought of. It would kick-start the process and save the developer time.
LLMs can be fine-tuned with private data, such as the details of an organization's IT infrastructure, which are unique to that organization. An LLM fine-tuned in this way to answer queries on internal IT matters would be able to provide customized, reliable information relevant to that organization.
AI-based machine assistants will become normal in the workplace. Fine-tuned models can act as a source of knowledge, which is especially helpful for new workers. In future, AI machines will be able to perform triage rapidly and be reliable enough to take remediation action. As a reliable assistant, this technology will, in my view, be embraced by IT professionals to improve their productivity.
Appendix
Further Reading
Omdia Market Radar: AI Assisted Software Development, 2023-24, Omdia OM120509.
E M Azoff, Towards Human-Level Artificial Intelligence, CRC Press, to appear 2024.
Note: this article first appeared in IT Pro: https://www.itprotoday.com/artificial-intelligence/artificial-general-intelligence-are-we-there-yet