How a 100-year-old linguistic school holds the key to understanding ChatGPT

The last two years have seen a meteoric rise in ChatGPT and other LLMs of varying architectures, sizes, and training data, each with its own bells and whistles. Everyone knows OpenAI, the company that brought this whole space to consumers. But not everyone knows that large language models and the 20th-century philosophical school of structuralism have much more in common than meets the eye.

Proposed by Ferdinand de Saussure, structuralism was an avant-garde philosophy that challenged the intense focus on science and empirical observation that developed in the 19th century. Structuralism argued that human culture and language cannot be fully understood through empirical data alone; instead, they require analysis of the underlying structures that shape human experience.

This article explores some of the key tenets of this school of thought and how, a century later, ChatGPT uses those very tenets to make machines understand language and communicate with us.

1. Tokenization of Text

ChatGPT splits text into tokens using Byte Pair Encoding (BPE). BPE works by iteratively merging the most frequent pairs of characters or subwords. Each token is mapped to an index in the vocabulary, allowing the model to process it numerically.
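To make the merge loop concrete, here is a minimal toy sketch of BPE in Python; the corpus, function name, and merge count are invented for illustration and are not OpenAI's production tokenizer.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    # Represent each word as a tuple of symbols (single characters to start).
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs across the corpus.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair wins the merge
        merges.append(best)
        # Replace every occurrence of the winning pair with a merged symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges

print(bpe_merges(["low", "lower", "lowest", "low"], num_merges=3))
# e.g. [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```

Notice how the learned units are purely relational: a subword becomes a token only because of how often it co-occurs with its neighbours in the corpus.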

Signifier-Signified Relationship:
In structuralism, language is a system of signs: each token acts as a signifier pointing to a specific concept (the signified). Tokenization echoes this by breaking language down into fundamental units. At each step, BPE merges the pair of symbols that co-occurs most frequently,

$$(x^{*}, y^{*}) = \arg\max_{(x,\,y)} \mathrm{count}(x, y)$$

reflecting the structuralist idea that the significance (meaning) of a token $t$ is determined by its relations (how it is formed and used) within the language structure (the corpus).
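To see real signifiers, here is a small usage sketch assuming OpenAI's tiktoken library is installed; the encoding name cl100k_base and the sample sentence are illustrative choices.

```python
import tiktoken

# cl100k_base is the BPE vocabulary used by several recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("Language is a system of signs.")
print(ids)                             # token indices: the signifiers
print([enc.decode([i]) for i in ids])  # the subword each index points to
```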


2. Embedding of Tokens

Tokens are converted into dense vectors by an embedding layer, which represents their semantic information in a high-dimensional space. These embeddings are learned as part of the model's training process.

Binary Oppositions and Contextual Meaning:
Embeddings capture contextual meaning, much as structuralism views meaning as dependent on the relationships between elements of a language. The semantic similarity of two tokens can be read off the proximity of their vectors, for example via cosine similarity:

$$\mathrm{sim}(t_i, t_j) = \frac{e_i \cdot e_j}{\lVert e_i \rVert \, \lVert e_j \rVert}$$

This reflects the structuralist principle that the meaning (semantic similarity) of tokens arises from their relationships (vector proximity) within the language structure (the embedding space).
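As a minimal sketch of this relational view, assuming PyTorch; the vocabulary size, dimensions, and token ids are arbitrary, and the weights are untrained:

```python
import torch
import torch.nn.functional as F

vocab_size, d_model = 50_000, 768                    # illustrative sizes
embedding = torch.nn.Embedding(vocab_size, d_model)  # learned during training

# Look up dense vectors for two (hypothetical) token ids.
e_i = embedding(torch.tensor(1042))
e_j = embedding(torch.tensor(2731))

# Relational meaning: cosine similarity measures vector proximity.
sim = F.cosine_similarity(e_i, e_j, dim=0)
print(sim.item())  # random weights here, so this will be near 0 until trained
```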



3. Self-Attention Mechanism

ChatGPT uses self-attention to weigh the importance of different tokens in the input sequence. The attention scores are computed from the dot products of query (Q), key (K), and value (V) matrices:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

Interconnected Structure:
The self-attention mechanism mirrors structuralism's view of language as a network of relationships, in which each word's significance is evaluated in relation to the others. The self-attention score reflects the structuralist idea that the significance (meaning) of a token $t_i$ is determined by its contextual relationships with all other tokens $t_j$: entry $A_{ij}$ of the attention-weight matrix $A$ shows how much attention the model pays to token $t_j$ when processing token $t_i$.
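A minimal single-head sketch of this computation, assuming PyTorch; the sequence length, dimensions, and random weights are illustrative:

```python
import torch
import torch.nn.functional as F

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a sequence X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.size(-1)
    # A[i, j]: how much token i attends to token j (each row sums to 1).
    A = F.softmax(Q @ K.transpose(-2, -1) / d_k ** 0.5, dim=-1)
    return A @ V, A

seq_len, d_model = 5, 16                   # illustrative sizes
X = torch.randn(seq_len, d_model)          # stand-in for token embeddings
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
out, A = self_attention(X, W_q, W_k, W_v)
print(A.sum(dim=-1))                       # tensor of ones: rows are distributions
```

Every output row of `A @ V` is a weighted blend of all the other tokens' values, which is the "network of relationships" made literal.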


4. Layered Transformer Architecture

ChatGPT's transformer architecture consists of multiple layers of self-attention and feed-forward neural networks. These layers enable the model to capture hierarchical patterns in text.


Hierarchical Structure:
The layered architecture reflects the structuralist approach of analyzing language at different levels, from basic elements to complex structures. The output of each layer in the transformer can be seen as a transformation of the input sequence in which the "meaning" of each token is refined through self-attention and feed-forward networks, echoing the structuralist idea of layered, hierarchical relations. Equations (1) and (2) depict how each layer's output is refined by applying self-attention and feed-forward transformations, building up contextual meaning through relational structures:

$$\tilde{h}^{(l)} = \mathrm{LayerNorm}\!\left(h^{(l-1)} + \mathrm{SelfAttention}\!\left(h^{(l-1)}\right)\right) \tag{1}$$

$$h^{(l)} = \mathrm{LayerNorm}\!\left(\tilde{h}^{(l)} + \mathrm{FFN}\!\left(\tilde{h}^{(l)}\right)\right) \tag{2}$$
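A compact sketch of one such layer and a stack of them, assuming PyTorch; the class name, sizes, and layer count are illustrative, and real GPT models add causal masking and other refinements omitted here:

```python
import torch
import torch.nn as nn

class Layer(nn.Module):
    """One post-norm transformer layer, mirroring equations (1) and (2)."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, h):
        h = self.norm1(h + self.attn(h, h, h)[0])  # (1) contextualize via attention
        return self.norm2(h + self.ffn(h))         # (2) refine via feed-forward

d_model = 64
layers = nn.ModuleList(Layer(d_model, n_heads=4) for _ in range(6))
h = torch.randn(1, 10, d_model)   # (batch, tokens, embedding)
for layer in layers:              # token "meaning" is refined layer by layer
    h = layer(h)
print(h.shape)                    # torch.Size([1, 10, 64])
```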


5. Pre-training on Large Datasets

ChatGPT is pre-trained on vast text datasets with a language-modelling objective: the model learns to predict a word from its context. (Strictly, GPT-style models predict the next token from the preceding text; the closely related masked language model (MLM) objective used by encoder models instead predicts a hidden word from the words around it.)


Contextual Understanding:

Structuralism suggests that the meaning of a word is defined by what it is not, i.e. through its differences from other words. Similarly, masked or next-word prediction relies on inferring the missing word from the differences and similarities in the context provided by the surrounding words:

$$P\!\left(x^{(m)} \mid x^{(\setminus m)}\right) = \mathrm{softmax}\!\left(W\,h^{(m)}\right)$$

The probability $P$ represents how well the model recovers the masked word $x^{(m)}$ from its context $x^{(\setminus m)}$, akin to the structuralist idea that the meaning of a word arises from its relations with other words. The representation $h^{(m)}$ captures the dependencies between the masked word and its context, which corresponds to the structuralist view of meaning emerging from a network of signs.
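To see prediction-from-context in action, here is a sketch assuming the Hugging Face transformers library and the public gpt2 checkpoint (a next-token predictor, per the note above); the prompt is arbitrary:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The model scores every candidate next token by how well it fits the context.
inputs = tokenizer("The meaning of a word comes from its", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits           # shape: (1, seq_len, vocab_size)

probs = torch.softmax(logits[0, -1], dim=-1)  # P(next token | preceding context)
top = torch.topk(probs, 5)
print([tokenizer.decode([int(i)]) for i in top.indices])  # five best-fitting words
```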



While LLMs are a product of advanced technology and machine learning, their approach to understanding and generating language reflects many of the core ideas of structuralism. By recognizing the importance of context, relationships between words, and underlying structures, LLMs echo the structuralist belief that meaning is not inherent in individual signs but is constructed through their interactions within a larger system.

LLMs like ChatGPT are reflections of structuralist ideas that have shaped our understanding of language for the past century. By learning from patterns, structures, and relationships within vast amounts of text, LLMs embody the very principle that structuralists championed: meaning emerges not in isolation but through a complex web of connections. As we continue to build more sophisticated LLMs, we should remember that they succeed because they connect the dots within language in ways that structuralism formalized long ago.

