Why is relying just on LLMs not good for the future of AI?
Sandeep Reddy
Professor | Chairman | Entrepreneur | Author | Translational AI in Healthcare
Recently, the discourse around AI research and development has been heavily focused on Large Language Models (LLMs), along with generative AI variants like large vision and diffusion models. With native LLMs now evolving into multimodal models (MMLs), 2024 seems likely to continue this trend. Increasingly, the future of AI development appears to hinge on LLMs, widely dubbed "Foundation Models." This enthusiasm is evident in research papers, newsletters, and social media posts, where these models are widely acclaimed. The rising interest has even spurred discussions and initiatives for specialized regulation, ethical frameworks, and institutions dedicated to LLMs and generative AI. Amid all this, traditional AI approaches like gradient boosting machines (GBMs), ResNets, and supervised learning seem to be fading from the spotlight. This shift in focus is a cause for concern.
Despite my reservations, I must acknowledge my excitement about LLMs and MMLs, particularly their potential in the medical and healthcare domains, where I specialize. The democratization of AI development, facilitated by LLM APIs and chatbot engineering, is another positive development. It has significantly lowered the barriers to machine learning model development and operationalization, which previously required specialized skills, infrastructure, and substantial resources. Moreover, fine-tuned LLMs and MMLs are proving more effective in the medical field than their predecessors, offering broader applications and a clearer ROI.
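As one illustration of how low that barrier has become, the sketch below shows roughly what a single call to a hosted LLM looks like through a vendor API. This is a minimal sketch, assuming the OpenAI Python client (v1+) and an API key in the environment; the model name and prompt are purely illustrative, and any real clinical use would of course require far more safeguards and validation.

```python
# Minimal sketch of calling a hosted LLM through an API.
# Assumptions: the `openai` Python package (v1+) is installed and an
# OPENAI_API_KEY environment variable is set. Model name and prompt
# are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": "You are a concise clinical summariser."},
        {"role": "user", "content": "Summarise the main contraindications of metformin."},
    ],
)

print(response.choices[0].message.content)
```

A few lines like these replace what, only a few years ago, required data collection, feature engineering, model training, and serving infrastructure; that is the democratization referred to above.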
However, my concerns are twofold. Firstly, the development of LLMs imposes significant restrictions, particularly in terms of who can build and manage these models. Given the substantial computational infrastructure required, it's likely that only large tech firms or governmental agencies will lead in developing and managing LLMs in the future. This could limit the involvement of smaller firms and academic institutions, which might have to rely on pre-developed LLMs. Even in the event of a coalition forming to develop LLMs or MMLs, it's doubtful that their performance would match that of models developed by major tech companies. This could lead to a concentration of control over AI development, posing a challenge to AI democratization and innovation.
Secondly, there are significant environmental concerns associated with developing and maintaining LLMs. For example, assuming ChatGPT runs on NVIDIA A100 GPUs and considering OpenAI's location and partnership with Microsoft Azure, the carbon footprint of running and maintaining ChatGPT is substantial. One estimate puts ChatGPT's operational carbon footprint at around 6,782.4 tons of CO2e over a given period of operation, excluding the footprint of training the GPT models themselves. Even this figure captures only one aspect of the environmental impact; other factors, such as water consumption, also play a significant role in the overall sustainability of these technologies.
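To make the shape of such estimates concrete, here is a back-of-the-envelope sketch of how an operational carbon footprint is typically computed: number of GPUs × average power draw × hours of operation × data-centre overhead (PUE) × grid carbon intensity. All parameter values below are illustrative assumptions, not measured figures for ChatGPT, so the output will not match the estimate cited above.

```python
# Back-of-the-envelope operational carbon footprint for a GPU-served LLM.
# All parameters are illustrative assumptions, not measured values.

def operational_co2e_tons(
    n_gpus: int,                  # accelerators serving the model
    gpu_power_kw: float,          # average draw per GPU in kW (an A100's TDP is ~0.4 kW)
    hours: float,                 # hours of operation in the period considered
    pue: float,                   # data-centre power usage effectiveness (overhead multiplier)
    grid_kg_co2e_per_kwh: float,  # carbon intensity of the local electricity grid
) -> float:
    """Return estimated emissions in metric tons of CO2e."""
    energy_kwh = n_gpus * gpu_power_kw * hours * pue
    return energy_kwh * grid_kg_co2e_per_kwh / 1000.0  # kg -> metric tons

# Hypothetical scenario: 3,000 A100s serving traffic for 90 days.
estimate = operational_co2e_tons(
    n_gpus=3_000,
    gpu_power_kw=0.4,
    hours=24 * 90,
    pue=1.2,
    grid_kg_co2e_per_kwh=0.4,
)
print(f"Estimated operational footprint: {estimate:,.0f} t CO2e")
```

The point is not the exact number, which swings widely with the assumptions, but that the footprint scales directly with fleet size, utilization, and the carbon intensity of the hosting grid.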
A third concern within the AI industry and academia is the rapidly growing assumption that the path to future AI development, and ultimately to Artificial General Intelligence (AGI), lies exclusively through LLMs or their iterations. It's not just the dominance of LLMs in the AI field that is worrisome, but also the near-universal veneration they receive. Even seasoned AI scientists have begun to speculate about sentience and agency emerging from these models. However, it's important to remember that LLMs, and the somewhat misleadingly labelled 'neural networks' underlying them, do not function like the human brain and its cognitive processes. Numerous expert commentaries explain these differences; one particularly insightful example can be found here. In essence, LLMs are statistical autoregressive models, whereas human intelligence operates through a theory of mind and dialogic models. Predicting words and images from a vast corpus of data is not akin to human learning and cognition. LLMs lack emotional understanding and struggle with figurative language and the nuances of the physical world, as detailed in this submission.
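To ground the phrase "statistical autoregressive models", here is a deliberately tiny sketch of the underlying idea: generate text one token at a time by sampling each next word from frequencies observed in a corpus. The corpus, the bigram counting, and the sampling loop below are toy stand-ins; production LLMs replace the counting step with a deep transformer trained on vastly more data, but the generation loop has the same shape.

```python
# Toy illustration of autoregressive text generation: predict the next word
# purely from conditional frequencies observed in a (tiny) corpus.
import random
from collections import Counter, defaultdict

corpus = (
    "the patient was given the medication and the patient improved "
    "the clinician reviewed the medication and the dose was reduced"
).split()

# Count bigram continuations: which word tends to follow each word?
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def sample_next(word):
    """Sample a continuation in proportion to how often it followed `word`."""
    counts = next_counts[word]
    if not counts:  # no observed continuation for this word
        return None
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a short continuation, one token at a time (autoregression).
token = "the"
generated = [token]
for _ in range(8):
    token = sample_next(token)
    if token is None:
        break
    generated.append(token)

print(" ".join(generated))
```

The model "knows" nothing beyond co-occurrence statistics; scaling this idea up does not, by itself, confer a theory of mind, which is the distinction drawn above.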
This raises the question: are we, perhaps influenced by vested interests, abandoning the pursuit of energy-efficient, brain-like, transparent, and democratic AI models? As someone who follows AI trends critically and objectively, I believe these concerns are valid. I urge industry, academia, governments, and other stakeholders not to marginalize or overlook other areas of AI model development in favor of focusing solely on LLMs. Relying exclusively on one approach could carry significant drawbacks and costs. What we need instead is a diversity of approaches and models, tailored to specific interests and needs. Let's advocate for a future that embraces this diversity.
Portfolio Career - digital healthcare content author, course developer, professional event moderator, educator, consultant and digital health advocate
10 months ago
Thanks Sandeep Reddy, good food for thought. Regarding carbon emissions, I am not too concerned, as the firms have committed to zero carbon footprints and Microsoft is in fact doing so retrospectively. And the newer models are potentially able to run on portable mobile devices. But your bigger point about the over-infatuation with LLMs is a good one. Regarding the inability to match our theory of mind, empathy, etc.: I think there are many human "leaders" in the current world who also do not exhibit these characteristics, so I don't take that as a given even in humans. For me the key thing is that 2023 brought AI to the foreground for the general public. It is now brimming with potential good and harmful paths; 2024 will see both start to play out. The ability to interrogate "black box" systems as to why they made certain output decisions will be key to safety in medicine, as you and I have frequently discussed, and this remains a key limitation of many models to be overcome. I remain positive, but concerned…
Founder & Lead Developer @ wordup development AG | Advancing business strategy using generative AI applications
10 months ago
The autoregressive, Seq2Seq learning models that have impressed us the most have been trained on language data. Language data is abundant, perhaps more so than any other richly structured data source we have. Much more data comes out of the LHC in one second, but that data is not very rich. So, autoregression has surprised us, but mainly because we recognise ourselves in the statistics of the autoregression. I wonder if we are confusing a jump in technology (autoregression) with, simply, a technology that is able to train on the most abundant and richly structured data source we have? Thus, the LLM Seq2Seq training technology might be replaced, at any time, with a better learning architecture. And hopefully a more computationally efficient one! And I know you have already placed your bets on what that technology might be. I hope you are right!
Investor looking to purchase businesses doing at least $200k in EBITDA
11 months ago
Can't wait to read it!