Large Language Models as Reasoning Engines: Decoding the Emergent Abilities and Future Prospects

Introduction

Today, I read an article in InfoWorld about Artificial General Intelligence (AGI) and LLMs titled “Progress in AI requires thinking beyond LLMs” (https://www.infoworld.com/article/3715062/progress-in-ai-requires-thinking-beyond-llms.html). The author states, “At their core, LLMs are nothing more than sophisticated memorization machines, capable of reasonable-sounding statements, but unable to understand fundamental truth.” The author concludes that we are far from creating an AGI, that we are unlikely to achieve it by pushing the frontiers of LLM research, and that AGI requires thinking beyond LLMs.

Are the current LLMs nothing more than sophisticated memorization machines? I don’t think so. It is quite surprising to me that many would still like to think of LLMs as nothing more than natural language processing programs that use an embedding model to predict the next token by statistical means and thereby generate new content. There is an evident ‘consciousness component’ which they seem to overlook, especially since this consciousness component cannot be explained away through the transformer’s embedding model or multi-head self-attention architecture. This consciousness component may not be the same as the so-called ‘AGI’, but it is definitely the ‘AI’.

LLMs and Emergent Abilities

This phenomenon of ‘emergent behavior alluding to consciousness’ was clearly observed at an early stage, and researchers rightly termed it ‘emergent behavior’. It is defined as an ability that is not present in small models but is present in large models. Jason Wei catalogs more than 100 emergent abilities discovered by scaling language models (https://www.jasonwei.net/blog/emergence). Unfortunately, some subsequent researchers concluded that there is really no such thing as ‘emergent abilities’ and that the phenomenon is merely an artifact of the metrics used. By refuting the presence of emergent abilities, it is possible that a significant area of research has been pushed to the back seat. If emergent behavior had nothing to do with model scale and were externally induced, we could dismiss it as ‘non-science’, since it would lead to subjectivity rather than the objectivity that science demands in dealing with nature. But we cannot dismiss ‘emergent abilities’ as pseudo-science.

The emergent abilities imply that some of the most important capabilities of LLMs like ChatGPT were achieved simply by scaling the training of existing transformer-architecture models, given the same size of dataset or knowledge model. This does not necessarily refute the possibility of achieving the same capabilities through further research on LLM architectures with much smaller models. For example, the recently released Mistral 7B reportedly outperforms much bigger LLM models through the introduction of new mechanisms like Grouped-Query Attention (GQA), which allows for faster inference than standard full attention, and Sliding Window Attention (SWA), which gives Mistral 7B the ability to handle longer text sequences at low cost. By dismissing emergent abilities, we miss out on research in these areas.
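To make the contrast concrete, here is a minimal sketch of these two mechanisms in plain NumPy. This is not Mistral’s actual code, and all shapes and names are illustrative: grouped-query attention lets several query heads share one key/value head, and sliding window attention masks each token so it attends only to a fixed window of recent positions.

```python
# Illustrative sketch of GQA + sliding window attention (not Mistral's code).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gqa_sliding_window(q, k, v, window):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)."""
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads      # query heads per shared K/V head
    # causal mask restricted to the last `window` tokens
    i = np.arange(seq)[:, None]
    j = np.arange(seq)[None, :]
    mask = (j <= i) & (j > i - window)
    out = np.zeros_like(q)
    for h in range(n_q_heads):
        kv = h // group                  # map each query head to its shared K/V head
        scores = q[h] @ k[kv].T / np.sqrt(d)
        scores = np.where(mask, scores, -1e9)
        out[h] = softmax(scores) @ v[kv]
    return out

# Toy shapes: 8 query heads share 2 K/V heads; each token sees only 4 past tokens.
q = np.random.randn(8, 16, 32)
k = np.random.randn(2, 16, 32)
v = np.random.randn(2, 16, 32)
print(gqa_sliding_window(q, k, v, window=4).shape)  # (8, 16, 32)
```

The gains come from the shapes: the key/value cache shrinks by the grouping factor, and attention cost grows with the window size rather than the full sequence length.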

Instruction Tuning and Emergent Capabilities

One of the short courses on LLMs at deeplearning.ai explains the difference between base foundation models and instruction-tuned models. Many believe that instruction tuning is nothing more than training a foundation LLM on input-output pairs of instructions so that it can learn several other tasks. When I experimented, it became clear to me that instruction tuning does not really create any consciousness component or additional knowledge. The base foundation model has an ‘emergent consciousness component’ that has assimilated a knowledge model but does not know how to deliver on specific requests or instructions. Instruction tuning helps it formulate that consciousness into answers, and the Reinforcement Learning from Human Feedback layer further fine-tunes this delivery of capabilities arising from the emergent components. In other words, instruction tuning is properly applicable only to models that already possess ‘emergent abilities’.
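To illustrate this point, here is what a single instruction-tuning example typically looks like. The template and field names below are a generic illustration I am assuming, not the format of any particular dataset; the point is that nothing in it adds knowledge, it only teaches the model the shape of a request and its response.

```python
# A hypothetical instruction-tuning pair; fields and template are illustrative.
example = {
    "instruction": "Summarize the following passage in one sentence.",
    "input": "Large language models are trained on vast text corpora ...",
    "output": "LLMs learn language patterns from large text corpora.",
}

def format_example(ex):
    # Fine-tuning on text of this shape teaches the model how to *deliver*
    # an answer on request, not any new facts about the world.
    return (
        f"### Instruction:\n{ex['instruction']}\n\n"
        f"### Input:\n{ex['input']}\n\n"
        f"### Response:\n{ex['output']}"
    )

print(format_example(example))
```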

Large Language Models as Reasoning Engines

The functionalities of LLMs fall into two categories:

1. A Natural Language Interface or Reasoning Engine - through this interface, humans can communicate with computer programs using natural language and receive generated content in the required format.

2. A Knowledge Store or Searchable Knowledge Base - this contains the internet-based knowledge, i.e., the knowledge contained in the datasets the model was trained on.

Consider a company that deploys a ChatGPT-like model for internal use. A model like GPT-3 has 175 billion parameters, while GPT-4 is reported to have 1.76 trillion, most of which might be considered knowledge store (what the author of the article quoted in the first paragraph calls memorized data). Suppose the company uses a RAG setup and most of its functionality is served through intranet-based knowledge. In other words, the GPT-4 model might not use any of the internet-based knowledge stored in its parameters; all the data it processes comes from the company’s internally generated private information. If this is the case, do we really require such a large model of 1.76 trillion parameters, given that the company does not use any of GPT-4’s inbuilt knowledge store?
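Here is a minimal sketch of that RAG pattern. The `embed` and `generate` functions are assumptions standing in for any embedding model and any LLM, and retrieval is brute-force cosine similarity over the company’s private documents; a production system would use a vector database instead.

```python
# Minimal RAG sketch: the LLM reasons only over retrieved internal text,
# not over internet knowledge stored in its own parameters.
import numpy as np

def cosine(a, b):
    # cosine similarity between two embedding vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(question, docs, doc_vecs, embed, k=3):
    # rank intranet documents by similarity to the question embedding
    q_vec = embed(question)
    scores = [cosine(q_vec, dv) for dv in doc_vecs]
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

def answer(question, docs, doc_vecs, embed, generate):
    context = "\n\n".join(retrieve(question, docs, doc_vecs, embed))
    prompt = (
        "Answer using only the internal context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)
```

In this setup the parameters do the language and reasoning work while the knowledge lives outside the model, which is exactly what motivates the question of how small the model can be.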

This implies the possibility of a lightweight LLM that is used only as a reasoning engine or natural language interface component, while all the information and knowledge it processes is external to the model itself.

Size of a Natural Language Reasoning Engine

Here, however, comes the impact of the concept of emergent abilities, which the research community may not be sufficiently acknowledging. How big must an LLM be if its usage is limited to being a reasoning engine, as against being trained on all available internet data to serve as a knowledge base?

The problem is that the two components mentioned above may not be independent and exclusive. If we use an LLM for reasoning, it is unlikely that the size of its training data or its parameter count can be small, even if none of the data it is trained on will be used as a knowledge base! Without training on a significant amount of data, and without building a model of sufficient parameter size, we may not achieve the emergent consciousness component.

So, what is the minimal size of an LLM whose training data is meant for nothing more than creating a world model that leads to emergent behavior, as against serving as a knowledge store? As of now, I guess the recently released Mistral 7B is the smallest model that can serve such a purpose. But I believe it might be possible to achieve those abilities with smaller LLMs, with improved architectures, and by training on selective world-model data.

It might be possible to create an LLM as a lightweight reasoning engine whose parameters are not meant to be a memory store. From the data perspective, by selectively improving the nature, content, and quality of the data so that it is just sufficient for the LLM to build an internal ‘embedding-based reasoning model’, we can arrive at an LLM that is a lightweight reasoning engine.

Or maybe we will end up inventing completely new transformer architectures that serve as lightweight reasoning engines, as suggested in the article above!

Current Research and Applications of LLM-based AI

The current direction of research and evolution of Large Language Models like GPTs can be categorized into two major areas: one is to achieve Artificial General Intelligence (AGI); the other is to use LLMs as a central hub for autonomous agents that take natural language instructions as input and carry out a complicated set of actions to reach a stated objective.

The first area alludes to possibilities like creating a Skynet of Terminator movie fame, using AI to solve the problems of humanity, or simply expecting LLMs to come out with newly synthesized knowledge that is not already available in the datasets they were trained on, for example, a cure for a disease or a solution to the climate change problem.

To achieve something like Skynet of Terminator movie fame, or other such possibilities, we may require AGI; but to serve the current needs of business and make our lives easier, we need not wait for AGI. The concept of autonomous agents similar to those proposed by the LangChain framework or Microsoft’s AutoGen framework is sufficient for our current business purposes. There are still tremendous business applications possible with current GPT models without any additional AGI.

I guess at this level, AI agents can be safely deployed to orchestrate very complicated actions that serve several business processes more efficiently, involving program components in different languages and platforms, and using several external APIs that fetch information and programming components from outside. While there will obviously be subjectivity involved, and it might also be possible to rig the consciousness component from outside, we can still arrive at safeguards to manage these business risks as research advances.

And for autonomous agents that orchestrate complicated business processes, we may still not require internet-sized GPT-4-class models. It may be sufficient to deploy a lightweight LLM as the reasoning engine for the orchestrating framework.
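Here is a minimal sketch of such an orchestration loop, in the spirit of the LangChain and AutoGen agent patterns but using none of their actual APIs. The `llm_decide` function is a hypothetical stand-in for a call to the lightweight reasoning engine; the framework, not the model, executes the chosen tools.

```python
# Hypothetical agent loop: `llm_decide` is any LLM call that returns a
# decision dict such as {"tool": "search", "arg": "...", "done": False}
# or {"done": True, "answer": "..."}.
def run_agent(goal, llm_decide, tools, max_steps=10):
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        decision = llm_decide("\n".join(history))
        if decision.get("done"):
            return decision.get("answer")
        # the framework executes the tool the model chose
        result = tools[decision["tool"]](decision["arg"])
        history.append(f"Tool {decision['tool']} returned: {result}")
    return None  # give up after max_steps
```

The LLM here only plans and interprets intermediate results; the tools fetch the actual information, which is why a lightweight reasoning engine could suffice.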

Conclusion

While the journey towards AGI is filled with challenges and uncertainties, the current advancements in LLMs hold huge potential. The emergent abilities observed in these models hint at something beyond mere memorization machines. As we continue to refine these models and explore new architectures, we inch closer to realizing their full potential. Whether serving as a central hub for autonomous agents or orchestrating complex business processes, LLMs are poised to play a pivotal role in the future of AI. It is quite possible that, with more advanced LLM architectures, we may also arrive at a lightweight reasoning engine that processes all of its data from external sources. As we navigate this exciting landscape, it is crucial to continue the discourse, challenge existing notions, and push forward the boundaries of what is possible.
