Large Language Models
Large language models like ChatGPT have emerged as the ‘it’ technology of AI. They are remarkable products, distinguished by their prowess in general-purpose language generation and comprehension. These linguistic leviathans come about through a laborious odyssey of self-supervised and semi-supervised learning, in which they distill the essence of language from myriad documents containing many billions of words. The most formidable among them are built around the transformer architecture. As architects of words, large language models generate text by predicting one token after another from the input text, each prediction a step in the creation of artful prose, poetry and other creative forms.
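To make this token-by-token weaving concrete, here is a minimal sketch of autoregressive generation in Python. The `model` and `tokenizer` objects are hypothetical stand-ins for any large language model interface, not a particular library's API:

```python
import random

def generate(model, tokenizer, prompt, max_new_tokens=50):
    """Autoregressive generation: repeatedly predict the next token
    and append it to the running sequence.

    `model` and `tokenizer` are hypothetical stand-ins for any LLM
    that maps a token sequence to a probability distribution over
    its vocabulary."""
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        probs = model.next_token_probs(tokens)  # P(next token | tokens so far)
        # Sample from the distribution; greedy decoding would take argmax instead.
        next_token = random.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(next_token)
        if next_token == tokenizer.eos_id:      # stop at end-of-sequence
            break
    return tokenizer.decode(tokens)
```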
Until 2020, fine-tuning was the sole technique for molding these models to specific tasks. Yet the advent of colossal constructs like GPT-3 heralded a new era in which prompt engineering could coax similar feats from these digital oracles. They are believed by some to harbor a deep understanding of the syntax, semantics, and ontologies that underpin human language, albeit tinged with the imperfections and biases of their textual wellsprings. Notable large language models include OpenAI's GPT series, Google's PaLM and Gemini, Meta's LLaMA family of open models, and Anthropic's Claude models, each a testament to the ever-evolving landscape of language and artificial intelligence.
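To see what prompt engineering means in practice, here is the flavor of a few-shot prompt, echoing the style of examples from the GPT-3 literature. The task and expected continuation are illustrative only:

```python
# A few-shot prompt: the model is steered by in-context examples alone,
# with no gradient updates to its weights (contrast with fine-tuning).
prompt = """Translate English to French.

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""
# Fed to a sufficiently large model, this typically yields "fromage"
# as the continuation, even though the model was never fine-tuned
# for translation.
```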
There is a question of whether large language models possess agency, that is, the capacity to make autonomous decisions and engage in independent thought. These questions are inspired by the dark imagery of Cameron’s Terminator mythology. Generally we believe that large language models do not exhibit agency, in the sense that they cannot act independently of the context and information provided by their users and of their extensive training datasets, which may encompass hundreds of millions of records.
A large language model is a language model, not an agent, as it lacks intrinsic goals; but it can be a component in the construction of an intelligent agent. One method is the ‘Reason + Act’ (ReAct) approach, which imbues a large language model with the capacity to plan: the model is encouraged to ‘think aloud,’ generating a sequence of thoughts that culminates in an actionable outcome. The ‘Describe, Explain, Plan and Select’ (DEPS) method extends this concept by connecting the large language model to the visual domain through descriptive imagery, allowing it to plan complex tasks and behaviors by drawing on its pre-trained knowledge base and the nuances of environmental feedback. Another approach is ‘Reflexion,’ in which an agent learns across multiple episodes in a narrative-like setting. At the conclusion of each episode, the large language model is presented with a summary from which it extracts ‘lessons learned.’ These insights are then bestowed upon the agent, guiding its actions in subsequent episodes and establishing a continuous cycle of reflection and growth. Though large language models lack agency, their progeny may possess it, and we can never truly rule out a Terminator-style rogue AI.
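As a sketch of how such an agent loop might look, here is a schematic ‘Reason + Act’ agent in Python. The `llm` and `tools` interfaces and the transcript format are hypothetical assumptions, not any particular library's API; real implementations differ in their prompting and parsing:

```python
def react_agent(llm, tools, task, max_steps=10):
    """Schematic 'Reason + Act' (ReAct) loop: the model alternates
    free-form 'thoughts' with tool-using 'actions' until it answers.

    `llm` and `tools` are hypothetical stand-ins; `tools` maps an
    action name (e.g. "search") to a callable acting on the world."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        thought = llm.complete(transcript + "Thought:")  # think aloud
        transcript += f"Thought: {thought}\n"
        action, arg = llm.propose_action(transcript)     # e.g. ("search", "...")
        if action == "finish":
            return arg                                   # final answer
        observation = tools[action](arg)                 # act, then observe
        transcript += f"Action: {action}[{arg}]\nObservation: {observation}\n"
    return None  # gave up after max_steps
```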
The mechanism known as attention emulates the nuances of cognitive focus, and has its origins in recurrent neural networks. This mechanism assigns "soft" weights to each word, or rather to its embedding, within a designated context window. These weights can be calculated either in parallel, as in transformers, or sequentially, as in recurrent neural networks. Unlike "hard" weights, which are trained and fine-tuned to a fixed state, "soft" weights possess the fluidity to adapt and evolve at each runtime, mirroring the dynamic nature of human attention. This may be loosely thought of as an analogue of human consciousness. Human consciousness is a high-bandwidth, energy-intensive process in the brain that is attuned to handling large amounts of real-time data, and to learning from new data arriving from the sensory organs, especially the eyes. Consciousness is energy-intensive because it is designed to reprogram memories in the brain's neural network, rather than merely to act instinctively on a predefined set of memories or atavistic impulses.
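To make the "soft" weights concrete, here is a minimal NumPy sketch of scaled dot-product attention. The weights below are recomputed from the inputs on every call, whereas trained ("hard") parameters stay fixed after training; the shapes and random data are illustrative only:

```python
import numpy as np

def soft_attention(query, keys, values):
    """Scaled dot-product attention: the 'soft' weights are computed
    afresh from the inputs at every forward pass, unlike trained
    ('hard') parameters, which are fixed once training ends."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)       # similarity of query to each key
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values                  # weighted mix of the values

# Toy example: 4 tokens in the context window, embedding dimension 8.
rng = np.random.default_rng(0)
keys = values = rng.normal(size=(4, 8))      # one row per token embedding
query = rng.normal(size=8)
print(soft_attention(query, keys, values))
```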
An AI’s attention mechanism can be implemented either in parallel, as in transformers, or sequentially, as in recurrent neural networks. Attention was invented to remedy the shortcomings of recurrent neural networks in leveraging information from hidden outputs: biased towards recent information, these networks tended to neglect the earlier parts of long input sequences. Attention, however, democratizes access to every segment of a sentence, allowing for a more balanced representation of information.
Initially, ‘attention’ was coupled with serial recurrent neural networks in language translation systems, amplifying the key data inputs while diminishing the less important ones. However, the advent of transformers, with their reliance on a swifter parallel attention scheme, marked a departure from the recurrent architecture. The transformer, a brainchild of the seminal 2017 paper “Attention Is All You Need”, though prefigured by earlier attention work on recurrent networks, eschewed recurrent units for a parallel multi-head attention mechanism that contextualizes every token at once, thereby reducing training time. Transformers push and replicate input information further down the hidden layers of the network, accommodating substantially larger training datasets while avoiding the vanishing-gradient problems of earlier networks, and they are now a cornerstone of natural language processing and beyond.
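For illustration, here is a compact NumPy sketch of that parallel multi-head mechanism. The projection matrices Wq, Wk, Wv, Wo are hypothetical learned parameters; real transformers wrap masking, residual connections, and layer normalization around this core:

```python
import numpy as np

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Parallel multi-head attention over a whole sequence X (one row
    per token). Every head attends to every position at once, so there
    is no recurrence and no bias toward recent tokens."""
    n, d = X.shape
    dh = d // n_heads                                # per-head dimension
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # project all tokens
    heads = []
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(dh)   # all token pairs, in parallel
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)           # row-wise softmax
        heads.append(w @ V[:, s])                    # each head's output
    return np.concatenate(heads, axis=-1) @ Wo       # recombine the heads

# Toy usage: sequence of 5 tokens, model dimension 16, 4 heads.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 16))
Wq, Wk, Wv, Wo = (rng.normal(size=(16, 16)) for _ in range(4))
print(multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads=4).shape)  # (5, 16)
```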