Large Language Models
Large language models like ChatGPT have emerged as the ‘it’ technology of AI. They are remarkable products, distinguished by their prowess in general-purpose language generation and comprehension. These linguistic leviathans come about through a laborious odyssey of self-supervised and semi-supervised learning, in which they distill the essence of language from myriad documents containing many billions of words. The most formidable among them are built around the transformer architecture. As architects of words, large language models generate text by predicting one token after another from the input text, each prediction a step in the creation of artful prose, poetry and other creative forms.
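To make this token-by-token weaving concrete, here is a minimal sketch of autoregressive generation in Python. The `model` and `tokenizer` objects are hypothetical stand-ins for any large language model interface, not a particular library's API:

```python
import random

def generate(model, tokenizer, prompt, max_new_tokens=50):
    """Autoregressive generation: repeatedly predict the next token
    and append it to the running sequence.

    `model` and `tokenizer` are hypothetical stand-ins for any LLM
    that maps a token sequence to a probability distribution over
    its vocabulary."""
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        probs = model.next_token_probs(tokens)  # P(next token | tokens so far)
        # Sample from the distribution; greedy decoding would take argmax instead.
        next_token = random.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(next_token)
        if next_token == tokenizer.eos_id:      # stop at end-of-sequence
            break
    return tokenizer.decode(tokens)
```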
Until 2020, fine-tuning was the sole technique for molding these models to specific tasks. Yet the advent of colossal constructs like GPT-3 heralded a new era in which prompt engineering could coax similar feats from these digital oracles. They are believed by some to harbor a deep understanding of the syntax, semantics, and ontologies that underpin human language, albeit tinged with the imperfections and biases of their textual wellsprings. Notable large language models include OpenAI's GPT series, Google's PaLM and Gemini, Meta's LLaMA family of open models, and Anthropic's Claude models, each a testament to the ever-evolving landscape of language and artificial intelligence.
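To see what prompt engineering means in practice, here is the flavor of a few-shot prompt, echoing the style of examples from the GPT-3 literature. The task and expected continuation are illustrative only:

```python
# A few-shot prompt: the model is steered by in-context examples alone,
# with no gradient updates to its weights (contrast with fine-tuning).
prompt = """Translate English to French.

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""
# Fed to a sufficiently large model, this typically yields "fromage"
# as the continuation, even though the model was never fine-tuned
# for translation.
```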
There is a question of whether large language models possess agency, that is, the capacity to make autonomous decisions and engage in independent thought. These questions are inspired by the dark imagery of Cameron’s Terminator mythology. Generally we believe that large language models do not exhibit agency, in the sense that they cannot act independently of the context and information provided by their users and of their extensive training datasets, which may encompass hundreds of millions of records.
A large language model is a language model, not an agent, as it lacks intrinsic goals; but it can be a component in the construction of an intelligent agent. One method is the ‘Reason + Act’ (ReAct) approach, which imbues a large language model with the capacity to plan: the model is encouraged to ‘think aloud,’ generating a sequence of thoughts that culminates in an actionable outcome. The ‘Describe, Explain, Plan and Select’ (DEPS) method extends this concept by connecting the large language model to the visual domain through descriptive imagery, allowing it to plan complex tasks and behaviors by drawing on its pre-trained knowledge base and the nuances of environmental feedback. Another approach is ‘Reflexion,’ in which an agent learns across multiple episodes in a narrative-like setting. At the conclusion of each episode, the large language model is presented with a summary from which it extracts ‘lessons learned.’ These insights are then bestowed upon the agent, guiding its actions in subsequent episodes and establishing a continuous cycle of reflection and growth. Though large language models lack agency, their progeny may possess it, and we can never truly rule out a Terminator-style rogue AI.
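As a sketch of how such an agent loop might look, here is a schematic ‘Reason + Act’ agent in Python. The `llm` and `tools` interfaces and the transcript format are hypothetical assumptions, not any particular library's API; real implementations differ in their prompting and parsing:

```python
def react_agent(llm, tools, task, max_steps=10):
    """Schematic 'Reason + Act' (ReAct) loop: the model alternates
    free-form 'thoughts' with tool-using 'actions' until it answers.

    `llm` and `tools` are hypothetical stand-ins; `tools` maps an
    action name (e.g. "search") to a callable acting on the world."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        thought = llm.complete(transcript + "Thought:")  # think aloud
        transcript += f"Thought: {thought}\n"
        action, arg = llm.propose_action(transcript)     # e.g. ("search", "...")
        if action == "finish":
            return arg                                   # final answer
        observation = tools[action](arg)                 # act, then observe
        transcript += f"Action: {action}[{arg}]\nObservation: {observation}\n"
    return None  # gave up after max_steps
```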
The mechanism known as attention emulates the nuances of cognitive focus, and has its origins in recurrent neural networks. This mechanism assigns "soft" weights to each word, or rather to its embedding, within a designated context window. These weights can be calculated either in parallel, as in transformers, or sequentially, as in recurrent neural networks. Unlike "hard" weights, which are trained and fine-tuned to a fixed state, "soft" weights possess the fluidity to adapt and evolve at each runtime, mirroring the dynamic nature of human attention. This may be loosely thought of as an analogue of human consciousness. Human consciousness is a high-bandwidth, energy-intensive process in the brain that is attuned to handling large amounts of real-time data, and to learning from new data arriving from the sensory organs, especially the eyes. Consciousness is energy-intensive because it is designed to reprogram memories in the brain's neural network, rather than merely to act instinctively on a predefined set of memories or atavistic impulses.
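To make the "soft" weights concrete, here is a minimal NumPy sketch of scaled dot-product attention. The weights below are recomputed from the inputs on every call, whereas trained ("hard") parameters stay fixed after training; the shapes and random data are illustrative only:

```python
import numpy as np

def soft_attention(query, keys, values):
    """Scaled dot-product attention: the 'soft' weights are computed
    afresh from the inputs at every forward pass, unlike trained
    ('hard') parameters, which are fixed once training ends."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)       # similarity of query to each key
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values                  # weighted mix of the values

# Toy example: 4 tokens in the context window, embedding dimension 8.
rng = np.random.default_rng(0)
keys = values = rng.normal(size=(4, 8))      # one row per token embedding
query = rng.normal(size=8)
print(soft_attention(query, keys, values))
```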
An AI’s attention mechanism can be implemented either in parallel, as in transformers, or sequentially, as in recurrent neural networks. Attention was invented to remedy the shortcomings of recurrent neural networks in leveraging information from hidden outputs: biased towards recent information, these networks tended to neglect the earlier parts of long input sequences. Attention, however, democratizes access to every segment of a sentence, allowing for a more balanced representation of information.
Initially, ‘attention’ was coupled with serial recurrent neural networks in language translation systems, amplifying the key data inputs while diminishing the less important ones. However, the advent of transformers, with their reliance on a swifter parallel attention scheme, marked a departure from the recurrent architecture. The transformer, a brainchild of the seminal 2017 paper “Attention Is All You Need”, though prefigured by earlier attention work on recurrent networks, eschewed recurrent units for a parallel multi-head attention mechanism that contextualizes every token at once, thereby reducing training time. Transformers push and replicate input information further down the hidden layers of the network, accommodating substantially larger training datasets while avoiding the vanishing-gradient problems of earlier networks, and they are now a cornerstone of natural language processing and beyond.
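For illustration, here is a compact NumPy sketch of that parallel multi-head mechanism. The projection matrices Wq, Wk, Wv, Wo are hypothetical learned parameters; real transformers wrap masking, residual connections, and layer normalization around this core:

```python
import numpy as np

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Parallel multi-head attention over a whole sequence X (one row
    per token). Every head attends to every position at once, so there
    is no recurrence and no bias toward recent tokens."""
    n, d = X.shape
    dh = d // n_heads                                # per-head dimension
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # project all tokens
    heads = []
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(dh)   # all token pairs, in parallel
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)           # row-wise softmax
        heads.append(w @ V[:, s])                    # each head's output
    return np.concatenate(heads, axis=-1) @ Wo       # recombine the heads

# Toy usage: sequence of 5 tokens, model dimension 16, 4 heads.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 16))
Wq, Wk, Wv, Wo = (rng.normal(size=(16, 16)) for _ in range(4))
print(multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads=4).shape)  # (5, 16)
```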