ChatGPT is suddenly popular, and these chips will benefit!


The ChatGPT craze has recently swept the world.


ChatGPT (Chat Generative Pre-trained Transformer), a conversational AI model launched by OpenAI in November 2022, surpassed 100 million monthly active users within two months of release, making it one of the fastest-growing consumer applications in history.


Behind ChatGPT is reinforcement learning from human feedback (RLHF). Built on a question-and-answer model, ChatGPT can reason, write code, create text, and more. These distinctive capabilities and the resulting user experience have driven a significant increase in traffic across application scenarios.


As ChatGPT's user base grew rapidly, the surging demand triggered outages: under the influx of users, ChatGPT's servers went down five times in two days. Such popularity is remarkable, and it also raises the bar for computing infrastructure, especially the underlying chips. So, for which chips will ChatGPT drive demand?


Surge in demand for AI servers

Currently, ChatGPT is used for reasoning, code writing, and text creation in a Q&A mode, and both the number of users and the frequency of use keep rising, while new application scenarios such as smart speakers, content production, game NPCs, and companion robots generate even larger traffic. As end-user usage grows, data traffic skyrockets, and the requirements on servers' data-processing capability, reliability, and security rise accordingly.


In terms of technical principles, ChatGPT is based on the Transformer architecture. As the model iterates and the number of layers grows, the demand for computing power keeps increasing. In terms of operating conditions, ChatGPT needs three things to run well: training data, model algorithms, and computing power. It requires large-scale pre-training of the base model, which stores knowledge in 175 billion parameters and consumes an enormous amount of compute.
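To make that parameter scale concrete, here is a rough back-of-the-envelope count for a GPT-3-class decoder-only Transformer. The 12 * L * d^2 approximation and the layer count (96) and hidden size (12288) are published GPT-3 architecture figures, not taken from this article; treat the sketch as illustrative only.

```python
# Rough parameter count for a GPT-3-class decoder-only Transformer.
# Approximation: each layer has ~4*d^2 attention weights plus ~8*d^2
# MLP weights, i.e. ~12*d^2 parameters per layer (embeddings ignored).
def transformer_params(num_layers: int, d_model: int) -> float:
    return 12 * num_layers * d_model ** 2

# Published GPT-3 architecture figures (assumption: not from this article).
n_layers, d_model = 96, 12288
print(f"~{transformer_params(n_layers, d_model) / 1e9:.0f}B parameters")
# -> ~174B, close to the 175 billion parameters cited above
```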


According to public information, ChatGPT is optimized from GPT-3.5, which is itself a fine-tuned version of GPT-3.0. OpenAI's GPT-3.0 model stores knowledge in 175 billion parameters, and a single training run costs about $4.6 million. GPT-3.5 was trained on Microsoft's Azure AI supercomputing infrastructure, with total training compute of about 3,640 PF-days (that is, at one quadrillion floating-point operations per second, the computation would take 3,640 days).
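As a sanity check, that figure can be converted into a raw operation count. The snippet below is a minimal worked example of the unit conversion; the cross-check against the commonly cited 6 x parameters x tokens training-cost rule of thumb (with GPT-3's published ~300 billion training tokens) is an assumption added here, not a claim from the article.

```python
# Convert 3,640 PF-days into a total operation count.
PFLOPS = 1e15             # one petaflop/s = 10^15 operations per second
SECONDS_PER_DAY = 86_400

total_ops = 3640 * PFLOPS * SECONDS_PER_DAY
print(f"total training compute ~ {total_ops:.2e} operations")  # ~3.14e23

# Cross-check (assumption): training cost ~ 6 * parameters * tokens,
# with GPT-3's published 175B parameters and ~300B training tokens.
estimate = 6 * 175e9 * 300e9
print(f"6*N*D estimate        ~ {estimate:.2e} operations")    # ~3.15e23
```

The two estimates agree to within a fraction of a percent, which is why the 3,640 PF-days figure is widely treated as credible.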


It can be said that ChatGPT is driving the chip industry in both volume and price: it not only generates greater demand for the number of underlying AI chips, but also raises the bar on their computing power, pushing demand toward high-end parts. Reportedly, a single top-end NVIDIA GPU costs about 80,000 yuan, and a GPU server typically costs more than 400,000 yuan. The computing infrastructure behind ChatGPT requires at least tens of thousands of NVIDIA A100 GPUs, and the rapid increase in demand for high-end chips will further push up average chip prices.
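Taking the article's own figures at face value, a quick order-of-magnitude estimate of the hardware outlay looks like this; the 10,000-GPU count is a hypothetical stand-in for "tens of thousands", and the 8-GPUs-per-server figure is an assumed typical configuration.

```python
# Order-of-magnitude hardware outlay, using the figures cited above.
GPU_PRICE_YUAN = 80_000          # one top-end NVIDIA GPU (article's figure)
SERVER_PRICE_YUAN = 400_000      # one GPU server (article's lower bound)
NUM_GPUS = 10_000                # hypothetical stand-in for "tens of thousands"
GPUS_PER_SERVER = 8              # assumption: typical 8-GPU server

gpus = GPU_PRICE_YUAN * NUM_GPUS
servers = SERVER_PRICE_YUAN * (NUM_GPUS // GPUS_PER_SERVER)
print(f"GPUs alone:    ~{gpus / 1e6:.0f} million yuan")     # ~800 million
print(f"whole servers: ~{servers / 1e6:.0f} million yuan")  # ~500 million
```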


With the surge in ChatGPT traffic, AI servers, as the carriers of computing power, will see important development opportunities. The global AI server market is expected to grow from $12.2 billion in 2020 to $28.8 billion in 2025, a compound annual growth rate of 18.8%.
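For readers who want to verify that growth rate, the standard CAGR formula reproduces it (to within rounding) from the two endpoint figures; this is just a worked check, not additional data.

```python
# CAGR = (end / start) ** (1 / years) - 1
start, end, years = 12.2, 28.8, 5   # $B in 2020, $B in 2025
cagr = (end / start) ** (1 / years) - 1
print(f"CAGR ~ {cagr:.1%}")  # ~18.7%, matching the cited 18.8% within rounding
```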


These chips will benefit

In terms of chip composition, an AI server is mainly a CPU plus acceleration chips, usually GPUs, FPGAs, ASICs, and the like; combining a CPU with accelerators meets the demand for high-throughput interconnect.
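The division of labor this implies, the CPU for control and orchestration and the accelerator for the heavy tensor math, shows up directly in everyday framework code. Below is a minimal PyTorch sketch of that pattern; the model and input are placeholders invented for illustration.

```python
import torch
import torch.nn as nn

# The CPU handles control flow and data preparation; the accelerator
# (a GPU here), if present, handles the dense tensor math.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(1024, 1024).to(device)   # placeholder model
x = torch.randn(64, 1024).to(device)       # placeholder input batch

with torch.no_grad():
    y = model(x)                            # compute runs on the accelerator
print(y.shape, y.device)
```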


1. CPU

As the computing and control core of a computer system, the CPU is the final execution unit for information processing and program execution. Its strengths are large caches and complex logic-control units, making it good at logic control and serial operations; its weakness is limited parallel throughput, so it is less suited to complex, highly parallel numerical workloads. CPUs can therefore be used for inference/prediction in deep learning.


Currently, server CPUs are moving toward more cores to meet the need for greater processing power and speed; AMD's EPYC 9004 series, for example, offers up to 96 cores. However, system performance cannot be judged by core count alone: the operating system, scheduling algorithms, applications, and drivers all matter.
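One classic way to see why core count alone is not enough is Amdahl's law: if any fraction of the work is serial, adding cores gives diminishing returns. The snippet below is an illustrative calculation with an assumed 95% parallel fraction, not a measurement of any particular CPU.

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n),
# where p is the parallelizable fraction and n the core count.
def speedup(p: float, n: int) -> float:
    return 1 / ((1 - p) + p / n)

p = 0.95  # assumed: 95% of the workload parallelizes
for n in (8, 32, 96):
    print(f"{n:3d} cores -> {speedup(p, n):5.2f}x speedup")
# 8 -> ~5.9x, 32 -> ~12.5x, 96 -> ~16.7x: far below linear scaling
```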


2. GPU

GPUs are highly suited to building AI models and are now widely used as acceleration chips thanks to their parallel computing capability and suitability for both training and inference. Take the NVIDIA A100 as an example. During training, GPUs solve problems at high speed: at scale, 2,048 A100 GPUs can process a training workload such as BERT in about a minute. During inference, Multi-Instance GPU (MIG) technology lets multiple networks run simultaneously on a single A100, improving utilization of compute resources. On top of the A100's other inference performance gains, structured sparsity support alone can deliver up to a 2x performance improvement. On advanced conversational AI models such as BERT, the A100 can boost inference throughput to up to 249 times that of a CPU.
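The "structured sparsity" mentioned here refers to the A100's 2:4 fine-grained pattern, in which at most two of every four consecutive weights are non-zero, so the hardware can skip half the multiplications. The NumPy sketch below prunes a weight matrix to that pattern as an illustration; it demonstrates the data layout only, not the actual Tensor Core execution.

```python
import numpy as np

def prune_2_of_4(w: np.ndarray) -> np.ndarray:
    """Zero out the 2 smallest-magnitude weights in every group of 4
    along the last axis (the A100's 2:4 structured-sparsity pattern)."""
    groups = w.reshape(-1, 4)
    # indices of the 2 smallest |weights| in each group of 4
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(w.shape)

w = np.random.randn(4, 8).astype(np.float32)
sparse_w = prune_2_of_4(w)
print((sparse_w == 0).mean())  # -> 0.5: exactly half the weights removed
```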


Currently, ChatGPT has triggered a boom in GPU applications. Baidu is about to launch Wenxin Yiyan (ERNIE Bot). Apple, meanwhile, has introduced the M2 series chips (M2 Pro and M2 Max) with built-in AI acceleration hardware, which will power its new computers. With the surge in ChatGPT usage, OpenAI needs more computing power to serve millions of users, increasing demand for NVIDIA GPUs.


AMD plans to launch the "Phoenix" series of chips on TSMC's 4nm process, which will compete with Apple's M2 series, as well as the "Alveo V70" AI chip built with chiplet technology. Both are scheduled to reach the market this year, targeting the consumer electronics market and AI inference, respectively.


3. FPGA

FPGAs are characterized by programmable flexibility, short development cycles, in-field reprogrammability, low latency, and convenient parallel computing, and they support large models through deep learning combined with data transfer across distributed clusters.


4. ASIC

Compared with general-purpose integrated circuits, ASICs in mass production offer smaller size, lower power consumption, higher reliability, higher performance, stronger confidentiality, and lower cost, allowing performance and power to be optimized further. With the development of machine learning, edge computing, and autonomous driving, large volumes of data-processing tasks are being generated, and the requirements on chips' computing efficiency, computing power, and performance-per-watt keep rising. ASICs, often combined with CPUs, are attracting wide attention, and leading manufacturers at home and abroad are positioning themselves for the AI era.


Among them, Google's latest TPU v4 cluster, called a Pod, contains 4,096 v4 chips and can deliver more than 1 exaflops of floating-point performance. NVIDIA's GPU + CUDA stack mainly targets large data-intensive HPC and AI applications; Grace-based systems are tightly integrated with NVIDIA GPUs and offer up to 10 times the performance of NVIDIA DGX systems. Baidu's second-generation Kunlun AI chip adopts a leading 7nm process and the self-developed second-generation XPU architecture, improving performance 2-3 times over the first generation; the third-generation Kunlun Core is slated for mass production in early 2024.
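Dividing the Pod figure down to a single chip is a quick way to sanity-check it; the calculation below uses only the two numbers quoted above and is an illustration, not an official per-chip specification.

```python
# Per-chip throughput implied by the quoted Pod-level figure.
pod_flops = 1e18      # "more than 1 exaflops" for the whole Pod
chips = 4096

per_chip = pod_flops / chips
print(f"~{per_chip / 1e12:.0f} TFLOPS per TPU v4 chip")  # ~244 TFLOPS
```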


5. Optical Module

Currently, the demand for model computing power in the AI era has grown far beyond the pace of Moore's Law; since the advent of deep learning and large models, it is estimated to double every 5-6 months. Meanwhile, data transmission rates are becoming an easily overlooked compute bottleneck. As data transmission grows, so does demand for optical modules, the carriers that interconnect devices within data centers.
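To put that doubling rate in perspective against Moore's Law (conventionally a doubling roughly every 18-24 months; the 24-month figure below is an assumption), a one-year growth comparison looks like this:

```python
# Growth factor over one year for a given doubling period (in months).
def yearly_growth(doubling_months: float) -> float:
    return 2 ** (12 / doubling_months)

print(f"compute demand (5.5-month doubling): {yearly_growth(5.5):.1f}x / year")
print(f"Moore's Law (24-month doubling):     {yearly_growth(24):.2f}x / year")
# ~4.5x per year for demand vs ~1.41x for Moore's Law
```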




This article is an original piece from EET Electronic Engineering Album, contributed by Jimmy.zhang.
