How does ChatGPT manage contextual understanding?
In natural language processing (NLP), moving beyond individual words to grasp broader context is a significant challenge. It shows up in translation, where the grammatical connections between words must be worked out, and in narrative writing, where a character's development has to be maintained across an entire story. The meaning of a word can also shift with context, as with "run" in the phrase "Can I run something past you real quick?"
This interplay between words is exactly what ChatGPT has to model, and the core technology underpinning it is the transformer architecture. The first step is to split text into tokens (byte-pairs, which are fragments of words) and turn them into numbers a computer can process, removing the ambiguity inherent in raw characters. Assigning each token a unique integer ID works like a numerical dictionary for the language; each ID is then mapped to a vector of numbers called an "embedding," which is what the model actually works with.
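A minimal sketch of that two-step process is below. The tiny vocabulary, the random embedding matrix, and the eight-dimensional vectors are illustrative stand-ins, not ChatGPT's actual tokenizer or weights; in a real model the embedding matrix is learned during training.

```python
import numpy as np

# Toy "numerical dictionary": each token gets an integer ID.
vocab = {"can": 0, " i": 1, " run": 2, " something": 3, " past": 4, " you": 5}
embedding_dim = 8  # real models use hundreds or thousands of dimensions

# Stand-in embedding matrix; in practice these weights are learned.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))

tokens = ["can", " i", " run", " something", " past", " you"]
token_ids = [vocab[t] for t in tokens]        # dictionary lookup: token -> ID
token_vectors = embedding_matrix[token_ids]   # embedding: ID -> vector of numbers
print(token_vectors.shape)                    # (6, 8): one vector per token
```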
However, the challenge lies in the sheer number of ways words can relate to one another within a given text. The possible combinations grow exponentially with sequence length: with a vocabulary of 20,000 words, a sentence of just 10 words already allows 20,000 ^ 10 possible combinations, on the order of 10^43.
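A quick sanity check of that figure (the vocabulary size and sentence length are the illustrative numbers from above, not ChatGPT's actual values):

```python
# Number of distinct 10-token sequences over a 20,000-token vocabulary.
vocab_size = 20_000
sequence_length = 10

possible_sequences = vocab_size ** sequence_length
print(f"{possible_sequences:.3e}")  # ~1.024e+43 possible sequences
```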
To tackle this complexity, transformers employ positional encoding: each word is given a distinctive "address" within the sequence. This is achieved by adding a position-dependent value to the embedding of each word, encoding where that word sits in the sequence relative to the others.
Transformers, including ChatGPT, use a blend of sine and cosine functions to compute the positional encoding. The technique is reminiscent of the Fourier Transform, a mathematical operation at work in everyday devices such as smartphones. Just as the Fourier Transform represents a signal as a combination of frequencies, positional encoding represents each position as a combination of sine and cosine waves at different frequencies, giving every position in the sequence its own unique signature.
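Here is a minimal sketch of the sinusoidal scheme from the original transformer paper ("Attention Is All You Need"); the six-token sequence and eight dimensions are illustrative, and the zero-valued token vectors are a placeholder for real embeddings. OpenAI has not disclosed exactly how ChatGPT encodes positions, so this shows the general technique rather than the production implementation.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)   # one frequency per pair of dims

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions use cosine
    return pe

seq_len, d_model = 6, 8
token_vectors = np.zeros((seq_len, d_model))   # stand-in for real token embeddings
# The position signal is simply added to the embeddings,
# giving every token a unique "address" in the sequence.
inputs = token_vectors + sinusoidal_positional_encoding(seq_len, d_model)
print(inputs.shape)  # (6, 8)
```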
So, why does ChatGPT encounter challenges when composing a novel or a screenplay? GPT-2, an earlier model in the GPT family, used a context size of 1,024 tokens (tokens are byte-pairs, so this corresponds to a few hundred words). The specific parameters of ChatGPT remain undisclosed, but we can estimate its workable context length. ChatGPT likely incorporates multiple attention heads (the components responsible for capturing relationships between tokens), although the exact count may differ from GPT-2's. A reasonable approximation for its maximum context is around 10,000 words, notably less than the 50,000 to 100,000 words typical of a novel. While OpenAI initially indicated a maximum input length of about 3,000 words, ChatGPT appears to exceed this limit in practice (source: https://ai.stackexchange.com/questions/38150/how-does-chatgpt-retain-the-context-of-previous-questions), which suggests that a context length of around 10,000 words is within its capabilities.
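The practical consequence of a fixed context window can be illustrated with OpenAI's open-source tiktoken tokenizer. The sketch below uses GPT-2's byte-pair encoding and its published 1,024-token limit; ChatGPT's real window is larger and not public, and the repeated sentence is just a stand-in for a long manuscript.

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")
context_size = 1024  # GPT-2's documented limit; ChatGPT's actual window is larger

manuscript = "Once upon a time, the hero set out again. " * 2000  # a long document
tokens = enc.encode(manuscript)
print(len(tokens))                    # far more than 1,024 tokens

visible = tokens[-context_size:]      # only the most recent tokens fit in the window
print(enc.decode(visible)[:60])       # the model "sees" only this tail of the text
```

Anything that falls outside the window, such as a character introduced 40,000 words earlier, is simply not visible to the model when it generates the next token, which is why very long narratives remain difficult.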