Large Language Models as Reasoning Engines: Decoding the Emergent Abilities and Future Prospects
Murugesan Narayanaswamy
From Finance & IT to AI Innovation: Mastering the Future | Deep Learning | NLP | Generative AI
Introduction
Today, I read an article in InfoWorld about Artificial General Intelligence (AGI) and LLMs, titled “Progress in AI requires thinking beyond LLMs” (https://www.infoworld.com/article/3715062/progress-in-ai-requires-thinking-beyond-llms.html). The author states, “At their core, LLMs are nothing more than sophisticated memorization machines, capable of reasonable-sounding statements, but unable to understand fundamental truth.” The author concludes that we are far from creating an AGI, that we are unlikely to achieve it by pushing the frontiers of research on Large Language Models, and that AGI requires thinking beyond LLMs.
Are current LLMs nothing more than sophisticated memorization machines? I don’t think so. It is quite surprising to me that many still like to think of LLMs as nothing more than natural language processing programs that use an embedding model to predict the next token by statistical means and thereby generate new content. There is an evident ‘consciousness component’ which they seem to overlook, especially since this component cannot be explained away by the transformer’s embedding model or multi-head self-attention architecture. This consciousness component may not be the same as the so-called ‘AGI’, but it is definitely the ‘AI’.
LLMs and Emergent Abilities
This phenomenon of ‘emergent behavior alluding to consciousness’ was clearly observed at an early stage, and researchers aptly termed it ‘emergent behavior’: an ability that is not present in small models but is present in large models. Jason Wei lists more than 100 emergent abilities that have been discovered by scaling language models.
Emergent abilities imply that some of the most important capabilities of LLMs like ChatGPT were achieved simply by scaling the training of existing transformer-architecture models, given the same size of dataset or knowledge model. This does not necessarily refute the possibility of achieving the same through further advanced research on LLM architectures with much smaller models. For example, the recently released Mistral 7B reportedly outperforms much bigger LLMs through the introduction of new mechanisms like Grouped-Query Attention (GQA), which allows for faster inference than standard full attention, and Sliding Window Attention (SWA), which lets Mistral 7B handle longer text sequences at low cost. By dismissing emergent abilities, we miss out on research in these areas.
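To make the sliding-window idea concrete, here is a minimal sketch in plain NumPy (illustrative only, not Mistral’s actual implementation) of the attention mask SWA implies: each token attends only to itself and the previous few tokens, rather than to the full causal history.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where position i may attend to positions j
    satisfying i - window < j <= i (causal, capped at `window` tokens)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(seq_len=6, window=3)
# Each row has at most `window` True entries, so attention cost grows
# roughly linearly with sequence length instead of quadratically.
```

Because each row of the mask has a bounded number of active positions, stacking several such layers still lets information propagate across the whole sequence while keeping per-layer cost low.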
Instruction tuning
In one of the short courses on LLMs at deeplearning.ai, the difference between base foundation models and instruction-tuned models was explained. Many believe that instruction tuning is nothing more than training a foundation LLM on input-output pairs of instructions so that it learns several other tasks. When I was experimenting, it was clear to me that instruction tuning does not really create any consciousness component or additional knowledge. The base foundation model has an ‘emergent consciousness component’ that has assimilated a knowledge model but does not know how to deliver on specific requests or instructions. Instruction tuning helps it formulate its consciousness to generate the answers. In other words, the concept of instruction tuning is properly applicable only to models that already possess ‘emergent abilities’. Reinforcement Learning from Human Feedback (RLHF) plays a similar role: it aligns the model’s existing capabilities with human preferences rather than adding new knowledge.
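A hedged sketch of what those instruction input-output pairs can look like as training text (the template below is illustrative, not the exact format used by any particular course or model):

```python
def format_example(instruction: str, response: str) -> str:
    # One training example: during instruction tuning, the model is
    # fine-tuned to continue the prompt portion with the target response.
    return (f"### Instruction:\n{instruction}\n\n"
            f"### Response:\n{response}")

sample = format_example(
    "List two emergent abilities of large language models.",
    "Chain-of-thought reasoning and multi-step arithmetic.",
)
```

The point of the template is purely to teach the already-capable base model the *shape* of a request and its answer; no new facts are introduced by the formatting itself.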
Large Language Models as Reasoning Engines
The functionalities of LLMs can be categorized into two categories:
1. A Natural Language Interface or Reasoning Engine - through this interface, humans can communicate with computer programs using natural language and receive generated content in the required format.
2. A Knowledge Store or Searchable Knowledge Base - this contains the internet-based knowledge, or the knowledge contained in the datasets the model was trained on.
Consider a company that deploys a ChatGPT-like model for internal use. A model like GPT-3 has 175 billion parameters, while GPT-4 reportedly has 1.76 trillion, most of which might be considered knowledge store (what the author of the article quoted above refers to as memorized data). Suppose the company uses a RAG setup and most of its functionality is served from intranet-based knowledge. In other words, the GPT-4 model might not use any of the internet-based knowledge stored in its parameters; all the data it processes comes from the company’s internally generated private information. If so, do we really require such a large model of 1.76 trillion parameters, given that the company does not use any of the model’s inbuilt knowledge store?
This implies the possibility of a lightweight LLM that is used only as a reasoning engine or natural language interface component, while all the information and knowledge it processes is external to the model itself.
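A minimal sketch of that pattern, assuming a toy keyword-overlap retriever in place of a real vector store: the model’s parameters only need to support the reasoning step, while every fact in the prompt comes from the company’s own documents.

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Toy retriever: rank documents by word overlap with the query.
    # A production RAG system would use embeddings and a vector index.
    def score(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # The LLM only reasons over retrieved context; none of the facts
    # need to live inside the model's parameters.
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

intranet = [
    "Policy: expenses are filed monthly.",
    "The cafeteria opens at 8am.",
    "Expense reports need manager approval.",
]
prompt = build_prompt("How are expenses filed?", intranet)
```

Everything knowledge-like here lives in `intranet`; whatever model receives `prompt` is doing pure language-interface work, which is exactly the division of labor argued for above.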
Size of a Natural Language Reasoning Engine
However, here the concept of emergent abilities, which the research community may not be sufficiently acknowledging, comes into play. How big must an LLM be for its usage to be limited to a reasoning engine, as against being trained on the entire available internet data to serve as a knowledge base?
The problem is that the two components mentioned above may not be independent and exclusive. If we use an LLM for reasoning, it is unlikely that the size of the training data or the parameter count can be small, even if none of the data it is trained on will be used as a knowledge base. Without training on a significant amount of data, and without building a model with sufficiently many parameters, we may not achieve the emergent consciousness component.
So, what is the minimal size of an LLM whose training data serves for nothing more than creating a world model that leads to emergent behavior, as against being just a knowledge store? As of now, I guess the recently released Mistral 7B is the smallest model that can serve such a purpose. But I believe it might be possible to achieve those abilities with smaller LLMs through improved architectures and by training on selective world-model data.
It might be possible to create an LLM as a lightweight reasoning engine whose parameters are not meant to be a memory store. From the data perspective, by selectively improving the nature, content, and quality of the training data, so that it is just sufficient for the LLM to build an internal ‘embedding-based reasoning model’, we can arrive at an LLM that is a lightweight reasoning engine.
Or maybe we will end up inventing completely new transformer architectures that serve as lightweight reasoning engines, as suggested in the article above!
Current Research and Applications of LLM-based AI
The current direction of research on and evolution of Large Language Models like the GPTs can be categorized into two major areas: one is to achieve Artificial General Intelligence (AGI), and the other is to use LLMs as a central hub for autonomous agents.
The first area alludes to possibilities like creating a Skynet of Terminator-movie fame, using AI to solve the problems of humanity, or simply expecting LLMs to come up with newly synthesized knowledge that is not already present in the datasets they were trained on, for example, a cure for a disease or a solution to the climate change problem.
To achieve something like Skynet, or other such possibilities, we may require AGI; but to serve the current needs of business and make our lives easier, we need not wait for AGI. The concept of autonomous agents, similar to those proposed by the LangChain framework or Microsoft’s AutoGen framework, is sufficient for current business purposes. Tremendous business applications are still possible with current GPT models without any additional AGI.
I guess that, at this level, AI agents can be safely deployed to orchestrate very complicated actions that serve several business processes more efficiently, involving program components in different languages and on different platforms, and several external APIs that fetch information and programming components from outside. While there will obviously be subjectivity involved, and it might also be possible to rig the consciousness component from outside, we can still arrive at safeguards to manage these business risks as research advances.
And for autonomous agents that orchestrate complicated business processes, we may still not require internet-sized, GPT-4-class models. It may be sufficient to deploy a lightweight LLM as the reasoning engine for the orchestrating framework.
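A minimal sketch of such an orchestration loop, with a stubbed-in policy standing in for the lightweight reasoning engine (`run_agent`, the `lookup` tool, and the step cap are illustrative assumptions, not any framework’s real API):

```python
def run_agent(task: str, tools: dict, llm_decide) -> str:
    # Minimal agent loop: the reasoning engine picks a tool, the tool
    # runs, and the observation feeds back into the next decision.
    history = [f"Task: {task}"]
    for _ in range(5):  # cap steps to avoid infinite loops
        action, arg = llm_decide(history)
        if action == "finish":
            return arg
        observation = tools[action](arg)
        history.append(f"{action}({arg}) -> {observation}")
    return "step limit reached"

# Stub policy and tool standing in for a real LLM and a real API call.
tools = {"lookup": lambda q: "42"}

def policy(history):
    last = history[-1]
    return ("finish", last) if "-> 42" in last else ("lookup", "answer")

result = run_agent("find the answer", tools, policy)
```

The point of the sketch is that the loop itself carries no knowledge: all facts arrive through tool observations, so the deciding model only needs enough capability to choose the next action, which is the lightweight-reasoning-engine role argued for above.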
Conclusion
While the journey towards AGI is filled with challenges and uncertainties, the current advancements in LLMs present huge potential. The emergent abilities observed in these models hint at a potential beyond mere memorization machines. As we continue to refine these models and explore new architectures, we inch closer to realizing their full potential, whether that is serving as a central hub for autonomous agents or orchestrating complex business processes with lightweight reasoning engines.