"Post-Moore Era: Where to Next for AI Computing Architecture?"
Foreword:
It's been a while since my last update. Amid this explosive wave of AI development, as a technical observer in the primary market I feel both excited and anxious. I have long wanted to write something to sort out my thoughts, but doubts about my own ability and writing skills kept me from putting pen to paper. Recently I was stunned by OpenAI founder Sam Altman's ambition to raise 7 trillion dollars for AI infrastructure, and I felt it was time to talk about computing architecture in the post-Moore era. Once I settled on the topic, I realized it was bigger than I could fully handle, but I will share my thoughts anyway as a starting point for discussion. I welcome friends and entrepreneurs in the primary market to join the conversation; please do not treat this as a rigorous technical treatise. I hope everyone can find their own opportunity in this AI wave.
Introduction:
Over the past few decades of computer and semiconductor development, Moore's Law has acted like a relentless metronome, setting the tempo for humanity's progress from early aerospace to home computers, mobile communications, smart devices, and now the AI revolution. "Computing power" has become a strategic resource contested by nations and companies alike. Sam Altman's 7-trillion-dollar AI infrastructure pitch alone makes humanity's insatiable appetite for computing power plain.
Main Text:
At the recent World Government Summit, the UAE Minister of State for AI asked NVIDIA's founder, Jensen Huang, "How many GPUs can 7 trillion dollars buy?" Huang joked, "Apparently, all of them," and added that NVIDIA has driven a roughly million-fold advance in AI computing over the past decade. Looking ahead, Huang believes two unprecedented shifts are under way: 1) the era of "general computing" (centered on CPUs) is transitioning at scale to the era of "accelerated computing" (centered on GPUs); 2) accelerated computing, being more energy-efficient and environmentally friendly, will more effectively bring the "generative AI" revolution to humanity. Beneath the restraint of these remarks lies a plain fact: NVIDIA will stand at the very center of this AI revolution.
As Moore's Law slows and the demand for computing power surges, the direction of "post-Moore" computing architecture is crucial to finding our ecological niche in this AI revolution. No country or company dares to fall behind in the race for AI and computing power; even a giant like Apple has abandoned car manufacturing to focus on AI. China, along with its vast community of domestic AI entrepreneurs, must and inevitably will build its own AI infrastructure and ecosystem, and ultimately carve out its own path.
Explosive Growth in AI Computing Power Demand
The last time the public paid special attention to "AI" and "computing power" was around 2016. That year, Google's AlphaGo defeated world Go champion Lee Sedol; Tesla shipped a major Autopilot update; and OpenAI, co-founded by Musk and Altman, was established, with NVIDIA's founder, Jensen Huang, personally delivering the "world's first DGX-1," a powerful AI computing platform. At the time, NVIDIA's market value had "only" just passed 20 billion US dollars (it is now around 2 trillion), and most people still knew the company as a "gaming graphics card" supplier. The following year, NVIDIA released the first GPU designed to accelerate machine learning, equipped with dedicated Tensor Cores, throwing open the door to surging demand for computing power.
When GPUs met large AI models, it was a match made in heaven, bound to ignite. Unlike the largely serial execution of a CPU, a GPU can run massive numbers of computations in parallel, which is precisely what neural-network-based machine learning needs. The arrival of large AI models has made the data centers we have already built look insignificant. Global data-center construction has been growing at a scale of hundreds of billions of dollars per year, but that figure will climb from hundreds of billions to trillions, and demand will shift from mere data storage and processing to the deployment, computation, and inference of AI models.
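To make the contrast concrete, here is a minimal sketch (in NumPy, with toy sizes of my choosing) of the same dense-layer computation written two ways: as a serial element-by-element loop, the CPU-style formulation, and as a single batched matrix multiply, the parallel formulation that GPUs are built to accelerate.

```python
# Illustrative sketch: one dense neural-network layer, expressed serially
# and in the parallel form GPUs accelerate. Sizes are arbitrary toy values.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((512, 1024))   # batch of 512 input vectors
W = rng.standard_normal((1024, 4096))  # weight matrix of one dense layer

# Serial formulation: each output element computed one at a time,
# the way a single CPU thread would walk through the work.
def dense_serial(x, W):
    out = np.zeros((x.shape[0], W.shape[1]))
    for i in range(x.shape[0]):
        for j in range(W.shape[1]):
            out[i, j] = x[i, :] @ W[:, j]
    return out

# Parallel formulation: one matmul. All 512 * 4096 outputs are
# independent, so the hardware can compute them simultaneously.
out = x @ W
```

The two produce identical results; the difference is that the second form exposes millions of independent operations at once, which is exactly the shape of work a GPU consumes.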
At the same time, the new generation of computing architectures centered on GPUs and DSAs (Domain Specific Accelerators) will define the trillions of dollars' worth of new data centers to be built over the next few years. As Mr. Huang put it, this is the biggest such shift in the roughly sixty years since IBM's System/360 defined general-purpose computing. And in today's post-Moore era, humanity is also striving to break through the even older computing architecture that von Neumann left us.
Breaking Through von Neumann Architecture and Moore's Law
Since the invention of the modern computer in the 1940s, computing has been built around the von Neumann architecture, and it remains so to this day. Its core idea is to store program instructions and data in the same memory and execute them sequentially on a central processing unit (CPU). The CPU ushered humanity into the era of general-purpose computing; without it, we would not have today's rich operating systems and software ecosystems. Moore's Law worked its magic in this context, making our software ever more abundant, faster, and more capable. In less than a hundred years since the computer's birth, humanity has increased computing power by a factor of roughly a hundred trillion.
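For readers who want the stored-program idea in runnable form, here is a minimal toy machine. The four-instruction set is invented purely for illustration, but it shows the essence of the design: instructions and data share one memory, and a single processor steps through them sequentially.

```python
# A minimal stored-program (von Neumann-style) machine. The instruction
# set is a made-up teaching example, not any real ISA.
def run(memory):
    pc, acc = 0, 0                       # program counter, accumulator
    while True:
        op, arg = memory[pc]             # fetch the next instruction
        pc += 1
        if op == "LOAD":                 # decode and execute, one at a time
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "HALT":
            return memory

# Program (cells 0-3) and data (cells 4-6) live in the SAME address space.
memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
print(run(memory)[6])  # -> 5
```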
However, as transistor sizes approach physical limits, Moore's Law has slowed over the last decade, and we have gradually entered the "post-Moore era." General-purpose computing on the traditional von Neumann architecture is increasingly unable to meet the growing demand, and humanity must make new, disruptive breakthroughs to satisfy its unbounded appetite for AI and computing power. In the post-Moore exploration, the global semiconductor roadmap can be summarized in three directions: More Moore (extending Moore's Law vertically by raising the integration density of semiconductor components, e.g. SoC chips); More than Moore (extending Moore's Law horizontally by combining new technologies or materials with integrated circuits, e.g. advanced SiP packaging); and Beyond Moore/Beyond CMOS (going past Moore's Law entirely by exploring non-silicon technology paths, e.g. quantum computing).
Although all three directions are developing and iterating in parallel, global semiconductor iteration is concentrated on "More Moore" and "More than Moore," continuing to probe what silicon-based technology can still deliver. "Beyond Moore" technologies such as quantum computing, photonic computing, and neuromorphic computing depart entirely from the silicon path; they remain in early-stage exploration and have not been commercialized at scale. For the next five years or longer, therefore, the global semiconductor industry will keep cultivating the post-Moore era both horizontally and vertically, and the next generation of computing architecture will be built on that foundation.
Computing Is Heading Toward a Super Heterogeneous, Integrated, and Parallel Era
Under the von Neumann architecture, computing platforms have typically been general-purpose (as opposed to accelerated) and homogeneous (as opposed to heterogeneous), relying mainly on a single type of processor, the CPU, to handle computing tasks. Simple heterogeneous platforms of the CPU+xPU form (where x denotes other chip types such as GPUs, DPUs, or FPGAs) appeared as early as the early 2000s, but such a platform's performance is still capped by the CPU. Under this architecture, the pace of CPU evolution, the slowing of Moore's Law, and other constraints (such as chip cooling) are increasingly unable to keep up with the growing demand for computing power.
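A short sketch of the classic CPU+xPU offload pattern makes the bottleneck visible. This assumes PyTorch is installed (it falls back to CPU if no CUDA device is present): the CPU prepares the data and drives every transfer, so the accelerator repeatedly waits on the host.

```python
# CPU+xPU offload sketch: the CPU orchestrates, the accelerator computes.
# The host-driven copies on every iteration are where such platforms stall.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
W = torch.randn(4096, 4096, device=device)

for step in range(100):
    x = torch.randn(64, 4096)   # data produced on the CPU, in host memory
    x = x.to(device)            # host-to-device copy, driven by the CPU
    y = x @ W                   # the accelerator does the parallel math
    result = y.cpu()            # device-to-host copy back through the CPU
```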
With the rise and spread of machine learning and simulation workloads across the scientific community and high-end industry, parallel computing has advanced enormously, and HPC systems (high-performance computers) built around GPUs have revealed how much headroom super heterogeneous computing has as the next-generation architecture. Super heterogeneous computing is not a simple CPU+xPU pairing; it involves more types of accelerators and far more complex hardware integration. By Intel's definition, super heterogeneous computing is an ecosystem combining computing system architecture, chip process and packaging, and a consistent, unified software stack for heterogeneous computing.
Super heterogeneous computing is a super-integrated architecture spanning system, hardware, and software. Depending on the application field and scenario, a super heterogeneous system may combine multiple types of accelerators to handle complex computing tasks. For example, NVIDIA's Thor for autonomous driving is a system-level SoC integrating CPU+GPU+DPU; Dell's Precision line of high-performance workstations integrates CPU+GPU+FPGA (aside: Dell's stock price has more than doubled in the wave of large AI models). With more and more such high-powered HPC systems coming online, our computing power has entered the petascale phase.
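As a toy illustration of the idea (the engine names and task types below are my own assumptions, not any vendor's API), a super heterogeneous node can be pictured as a router that sends each kind of work to the engine suited for it:

```python
# Toy scheduler for a "super heterogeneous" node: control flow to the CPU,
# dense math to the GPU, packet/storage work to the DPU. Purely illustrative.
from dataclasses import dataclass

@dataclass
class Task:
    kind: str        # "control" | "tensor" | "network" (assumed categories)
    payload: object

ROUTING = {"control": "CPU", "tensor": "GPU", "network": "DPU"}

def dispatch(tasks):
    plan = {}
    for t in tasks:
        plan.setdefault(ROUTING[t.kind], []).append(t)
    return plan      # each engine would then drain its queue in parallel

plan = dispatch([Task("tensor", "matmul"),
                 Task("network", "rx_queue"),
                 Task("control", "scheduler_tick")])
print(plan)
```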
On the hardware side of super heterogeneous computing, the vertical "More Moore" and horizontal "More than Moore" routes described above play parallel, central roles. On the vertical "More Moore" route, system-on-chip (SoC) designs are a key upgrade: they integrate heterogeneous components (CPU, accelerators, memory, and so on) on the same chip, achieving smaller size, lower power consumption, and shorter communication latency. On the horizontal "More than Moore" route, chipmakers are pushing breakthroughs in materials, processes, and packaging; MEMS (micro-electromechanical systems) chips, for instance, are not bound by Moore's Law and typically integrate micro-mechanical structures, sensors, actuators, and electronic circuits. For performance scaling, SiP (system-in-package) technology is a flagship "More than Moore" direction, built around the chiplet as its key design concept: using 2.5D and 3D methods, dies implementing different functions are placed or stacked horizontally and vertically so that different IP cores can be integrated and can interact more efficiently and economically.
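A back-of-the-envelope sketch of why packaging matters follows. The latency and energy figures below are hypothetical placeholders I chose for illustration, not vendor data, but they capture the qualitative ordering: the shorter the link between dies, the cheaper every bit moved.

```python
# Illustrative link-cost model for chiplet integration. ALL numbers are
# hypothetical placeholders, chosen only to show the qualitative trend.
LINKS = {
    "3D stacked (TSV)":        {"latency_ns": 1,  "energy_pj_per_bit": 0.1},
    "2.5D interposer":         {"latency_ns": 4,  "energy_pj_per_bit": 0.5},
    "off-package (PCB trace)": {"latency_ns": 20, "energy_pj_per_bit": 5.0},
}

def transfer_cost(link, gigabits):
    p = LINKS[link]
    # pJ/bit * bits -> joules, reported in millijoules
    energy_mj = gigabits * 1e9 * p["energy_pj_per_bit"] * 1e-12 * 1e3
    return p["latency_ns"], energy_mj

for name in LINKS:
    latency, energy = transfer_cost(name, gigabits=8)
    print(f"{name}: {latency} ns/hop, {energy:.1f} mJ per 8 Gb moved")
```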
Beyond the heterogeneous integration of systems and hardware, the software ecosystem that supports heterogeneous computing is the other key pillar of the new generation of computing architecture. In the end, every hardware optimization must surface as a software-level improvement. NVIDIA not only designed the world's most powerful GPU chips but also built a vast software ecosystem on three core technologies: GPU hardware, the CUDA software foundation, and the NVLink interconnect. The CUDA platform provides a large library of functions and APIs for NVIDIA's GPUs, letting programmers develop GPU-based software efficiently; the NVLink protocol supports bidirectional, high-bandwidth, low-latency memory access and data caching between CPUs and GPUs. These core technologies delivered true computational acceleration for AI, and this vast ecosystem is why NVIDIA still holds roughly 80% of the global discrete-GPU market even with chip giants like Intel and AMD in pursuit.
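A minimal sketch of what this stack buys a programmer, assuming a PyTorch build with CUDA support and at least one NVIDIA GPU: a few lines of Python are enough to dispatch work to GPU kernels through CUDA streams, with all the driver-level machinery hidden underneath.

```python
# Sketch of the CUDA stack from a programmer's seat: requires a PyTorch
# build with CUDA and an NVIDIA GPU to actually run.
import torch

assert torch.cuda.is_available()
a = torch.randn(8192, 8192, device="cuda")
b = torch.randn(8192, 8192, device="cuda")

stream = torch.cuda.Stream()       # CUDA streams: asynchronous kernel queues
with torch.cuda.stream(stream):
    c = a @ b                      # dispatched to a GPU matmul kernel
torch.cuda.synchronize()           # block until the device finishes
```

Nothing in the snippet mentions thread blocks, memory hierarchies, or interconnect topology; that abstraction, accumulated over nearly two decades, is the moat competitors must cross.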
Challenges and Opportunities for China's AI Ecosystem Players
Facing this new generation of computing architecture, China's system, hardware, and software links are all relatively weak: compared with developed regions such as Europe, America, Japan, and South Korea, China started later across the board and has faced ever more technical blockades and restrictions in recent years. Yet the data, resources, and technology needed to develop large AI models bear on national security and must be held in our own hands. Exploring an independent Chinese ecosystem for the new generation of computing architecture, and finding one's own position within it, is therefore both the challenge and the opportunity for every domestic AI player.
In this AI revolution, measured in trillions of dollars a year, Chinese entrepreneurs have never slackened their pace, with teams digging into every segment and pushing for breakthroughs. Among the teams I have spoken with, some are dedicated to developing domestically controlled GPU chips and the underlying software stack, hoping to carve out a path of independent control; some are stepping outside the territory dominated by Intel and AMD to build chips on the open-source RISC-V instruction set; some are developing cutting-edge compound-semiconductor materials to surpass the limits of silicon-based processes; some focus on advanced packaging, enabling performance-optimized, cost-effective accelerators; and some are targeting the new generation of HPC-based data centers, providing efficient integration and utilization of heterogeneous computing resources ...
But the biggest challenge facing Chinese companies is not just breaking through these technical barriers one by one; it is making the pieces compatible with one another so that they assemble into a healthy, complete computing ecosystem, like puzzle pieces finally forming a whole map. That demands not only a long and arduous exploration but also a long-term, open vision from entrepreneurial teams. The next NVIDIA is still NVIDIA; its story and its closed-source ecosystem are hard to replicate in China. Yet many precedents show that when an industry is already monopolized by a closed-ecosystem giant, betting on an open-source ecosystem is often the best competitive strategy. So will China's next generation of computing architecture be, to some extent, an open-source ecosystem? We shall wait and see.
Conclusion:
That the new generation of computing architecture centers on super heterogeneous computing may already be an industry consensus needing no debate. Against this backdrop, high-performance computers (HPC) built around GPUs and other accelerators have moved into the spotlight as the core of each year's trillion-dollar AI infrastructure. For NVIDIA, the remaining uncertainty may simply be supply bottlenecks (including HBM capacity). For other AI players, and especially China's, the uncertainty is what ecological niche they can manage to seize. The rolling wheels of the times will not let anyone slow down.
Having written this far, I must admit this article has thrown out plenty of confusing terms (post-Moore era, super heterogeneous computing, non-von Neumann architecture, and so on), as if the sheer complexity of the technological change made clear thinking difficult. Setting aside the concepts and jargon, what we should all soberly recognize is that the AI technology revolution is happening right before our eyes, and its direction will be closely bound to every one of us. Paying attention to it, learning about it, even embracing it may be the most consequential decision of our lives.
Finally, to all the entrepreneurs and builders on the front lines, I offer a classic line from Buzz Lightyear in one of my favorite movies, Toy Story: "To Infinity and Beyond!"
End of the article