Demystifying AI. GPT4 Leaked. MOE, rMLP
Alicia Colmenero Fernández
Master Artificial Intelligence. EdenAIs.
*Automatic translation by GPT-4
GPT4 Leaked. Demystifying AI
This week, an article on the architecture, infrastructure, and other aspects of GPT4 was published on Semianalysis, a paid-subscription platform. The article is significant given the company's secrecy about its product. It was put together by consulting various sources, and a few days ago a Twitter user leaked it; it stayed online for a few hours until Twitter removed it for copyright infringement. But as we know... what is published on the Internet, the Internet keeps forever.
Summaries have since proliferated, and we too are going to summarize what we consider most important, answering the key questions.
Size
GPT4 would be 10 times bigger than GPT3: if GPT3 has 175B parameters, GPT4 would have around 1.8 trillion, organized in 120 layers. It is the megalodon of LLMs.
Architecture
The most relevant point is that GPT4 is a MOE. What does this mean? That it is not a single model but many, since it uses a technique called "Mixture of Experts" (MOE). Keep this idea in mind.
MOEs are a neural-network design strategy that involves having several models, each specialized in a particular kind of problem. Specifically, GPT4 would be composed of 16 expert models, and the architecture of each would be that of an MLP, i.e., a multilayer perceptron.
We add: a recurrent rMLP, with layers connected in cascade for the feed-forward pass, where the last layer can send information back.
This structure leverages the experts' ability to handle different aspects of a problem efficiently, including complex cases. The feed-forward pass would be done with 2 experts routed on top, which makes it well suited to natural language processing. Each expert is a neural network with around 111 billion parameters, the figure that makes the 1.8-trillion total add up. This makes the whole thing scalable and sustainable; a minimal sketch of this top-2 routing follows below.
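To make the idea concrete, here is a minimal sketch (our own, not OpenAI's code) of a top-2 Mixture-of-Experts layer in PyTorch: a router scores 16 small feed-forward experts and each token is processed by the 2 best-scoring ones. Only the "16 experts, 2 routed per forward pass" figures come from the leak; the sizes and the gating scheme are illustrative assumptions.

```python
# A minimal sketch of top-2 Mixture-of-Experts routing, only to illustrate the idea.
# n_experts=16 and top_k=2 follow the leaked figures; everything else (hidden sizes,
# the linear router) is an illustrative assumption, not GPT-4's actual design.
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is a small feed-forward MLP.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                        # x: (batch, d_model) token representations
        scores = self.router(x)                  # (batch, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)        # mix the 2 selected experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e            # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

x = torch.randn(4, 64)                           # 4 toy token vectors
print(TinyMoELayer()(x).shape)                   # torch.Size([4, 64])
```

The point of the design is that only 2 of the 16 experts run for any given token, so the compute per token is a small fraction of the total parameter count.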
The mystery of the data
One could speak of a training carried out in two phases: one specific to code and another to natural language.
Regarding the dataset, the numbers don't add up for the authors at the time of training (2021). Large available datasets, such as Common Crawl, wouldn't account for even half of the 13 trillion tokens of the total training volume. It is suggested that the instruction data could come from ScaleAI, a company controversial for its handling of sensitive and private information, both in terms of consent and accuracy; it provides labeling services to Uber, Lyft, Toyota, Airbnb, and Pinterest, among others.
Both the missing tokens and the problem of repeated tokens are mentioned. The conclusion is that the model has been trained on textbooks, specialized books, many, many academic papers, and even the whole of GitHub. Pay special attention to the large volume of repeated tokens; this is very important.
We also want to mention an excellent research article from the Washington Post, in which the newspaper, together with researchers from the Allen Institute, analyzes the proprietary, personal, and even offensive websites included in C4, the Colossal Clean Crawled Corpus (content from 15 million websites present in the training of mega-models like Facebook's Llama and, in theory, also ChatGPT). In the C4 search engine you can check whether a website is present in the dataset, what rank it occupies, and how many tokens (and what percentage of the total) it contributes. A token is a unit of meaningful text that we could loosely equate with a word, and that the model uses to process information.
For example: medium.com contributes 33M tokens, 0.02% of the total, while linkedin.com is present with barely 310 tokens, a share of 0.0000002%. However, the small blog of Sam Altman, CEO of OpenAI, even though in 2021 he was still an up-and-coming figure, still largely unknown, accounts for a not insignificant 38K tokens and is ranked 621,107, with 0.00002%. This is the kind of content susceptible to repetition, and it would explain why certain contents are learned.
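To make these token counts more tangible, here is a small sketch using OpenAI's tiktoken library with the cl100k_base encoding (the one used by the GPT-4 family). The sample sentence is ours, purely for illustration.

```python
# A small, self-contained illustration of what a "token" is, using OpenAI's
# tiktoken library and the cl100k_base encoding (used by the GPT-4 family).
# The sample sentence is our own example, not taken from the article.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "A token is a unit of meaningful text, roughly comparable to a word."

tokens = enc.encode(text)
print(len(tokens), "tokens for", len(text.split()), "words")
# Decoding each token individually shows that some words split into several pieces.
print([enc.decode([t]) for t in tokens])
```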
It's worth mentioning other biases of the model, such as its alignment as if it were human, or its resistance to producing literal transcriptions of content critical of the company, its product, and even similar products.
Unconfirmed: speculative decoding
The article also picks up the rumor that GPT4 could be using a method called speculative decoding. In this process, a smaller, faster model generates a series of predicted tokens that are passed to a larger model for verification. This more competent model, which has been called the "oracle", evaluates whether the predicted tokens are correct and, if so, approves the whole batch at once; if it finds them incorrect, they are discarded and the larger model takes over. We share this perception, because we too have occasionally seen a kind of multiple-personality response, where the model even changes the way it expresses its gender and refers to previous responses as those given by "its models". Of course, we don't know whether this is the result of an amusing coincidence or an expression of its architecture, but if confirmed it would make perfect sense, even as an explanation for how sensitive content is evaluated.
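Since the article reports this only as a rumor, the following is a generic toy sketch of the idea of speculative decoding as described in the literature, not OpenAI's implementation: a cheap "draft" model proposes several tokens and an "oracle" model keeps the prefix it agrees with. Both models here are random stand-ins; in practice the oracle verifies the whole draft in a single forward pass rather than token by token.

```python
# Toy sketch of speculative decoding: a fast "draft" model proposes a run of
# tokens and a slower "oracle" model keeps the prefix it agrees with, falling
# back to its own prediction at the first disagreement. Both models are random
# stand-ins; a real oracle would score the whole draft in one forward pass.
import random

VOCAB = list("abcde")

def draft_model(prefix, n):
    """Cheap model: proposes n candidate tokens."""
    return [random.choice(VOCAB) for _ in range(n)]

def oracle_model(prefix):
    """Expensive model: returns the token it would have generated next."""
    return random.choice(VOCAB)

def speculative_step(prefix, n_draft=4):
    draft = draft_model(prefix, n_draft)
    accepted = []
    for tok in draft:
        if oracle_model(prefix + accepted) == tok:   # oracle agrees: keep the cheap token
            accepted.append(tok)
        else:                                        # first disagreement: oracle takes over
            accepted.append(oracle_model(prefix + accepted))
            break
    return accepted

print(speculative_step(list("ab")))
```

The payoff is that when the draft is mostly right, the expensive model effectively emits several tokens for the cost of one verification step.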
If you like this content and want to support us so we can keep bringing you free little things, you can treat us to a symbolic coffee on Ko-fi, a secure platform for payments with PayPal.