Evolution of Data Center Networking Designs and Systems for AI Infrastructure – Part 4 (Final)
In parts 1, 2, and 3 of this series, I covered changes in data center network designs driven by modern AI training and inference applications. We focused on the scale-out and scale-up portions of the backend network that connects off-the-shelf GPUs from vendors like NVIDIA and AMD. In part 3, I shared my observations on standardization efforts related to the AI backend network and the requirements for their success, highlighting the need to solve ecosystem challenges. In this final part, which I hope you have found useful, I present my concluding remarks, building on the food-for-thought questions raised in the previous articles.
A Paradigm Shift
I read an old article about the role of PCIe within the server and Ethernet as a fabric outside the server. The PCIe lanes within the server were compared to aisles within a large store, where consumers with shopping carts move goods with ease. The aisles are pathways dedicated to shoppers and shopping carts only, enabling movement of goods with no congestion. Moving the same goods to the consumers' homes requires cars or trucks on shared roads and highways, akin to Ethernet. These roads and highways are used by everyone, not just the shoppers, and are therefore prone to congestion. In the world of networking for AI, it should be clear that the GPUs residing within the AI server have their own dedicated paths for data movement within the server (the scale-up network comprising PCIe or the GPU fabric) and outside it (the scale-out network comprising Ethernet or InfiniBand). This is a significant paradigm shift that mandates that we think differently.
Recap and Concluding Remarks
In the first three parts of this series, I have tried to highlight how data center networking for AI differs from what we have experienced in the past. In the process, we gathered observations and food for thought on aspects that require us to solve technology and business challenges differently. To conclude the series, let me take the opportunity to recap those nine intriguing observations below:
Observation #1:
The NIC used in the backend network is dedicated to GPU data movement. Lately, terms like SuperNIC or AI NIC have been used to define this category.
- Food for thought: Will the SuperNIC or AI NIC as a new server networking product category become a high growth segment in the coming years?
- My take: A resounding yes. Volume/revenue growth related to backend scale-out networks is being driven by real and present challenges in AI networks.
Observation #2:
Servers used for AI training contain multiple NIC and PCIe switch systems and silicon that move GPU traffic destined for the network fabric. Often, there are also rail switches with a one-to-one correspondence to GPUs, in addition to the NICs and PCIe switches.
- Food for thought: There are up to 3 networking silicon components serving each of the 8 GPUs in a typical AI server – NIC, PCIe switch, and rail switch. Do we expect such functions to get subsumed into the same silicon?
- My take: From performance and operational efficiency standpoints, absolutely yes. Many NIC silicon solutions already include PCIe switches. Server designs are evolving rapidly to address changes in GPU configurations. There is an opportunity for the networking portion of those server designs to evolve to keep up with and complement the GPU compute part. A simple model of the rail-aligned layout described above is sketched below.
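To make the one-to-one correspondence concrete, here is a minimal, illustrative Python sketch of an assumed 8-GPU server with rail-aligned networking. The component names, the two-GPUs-per-PCIe-switch grouping, and the counts are assumptions for illustration, not a description of any specific vendor's design.

```python
from dataclasses import dataclass


@dataclass
class GpuRail:
    """One 'rail' in an assumed 8-GPU AI server: GPU -> PCIe switch -> NIC -> rail switch."""
    gpu_id: int
    pcie_switch: str   # PCIe switch the GPU and its NIC hang off (assumed shared by GPU pairs)
    nic: str           # backend (scale-out) NIC dedicated to this GPU
    rail_switch: str   # rail/leaf switch this NIC uplinks to


def build_rail_aligned_server(num_gpus: int = 8) -> list[GpuRail]:
    """Model the 1:1 GPU-to-NIC-to-rail-switch correspondence described above."""
    return [
        GpuRail(
            gpu_id=i,
            pcie_switch=f"pcie-sw{i // 2}",  # assumption: two GPUs per PCIe switch
            nic=f"nic{i}",
            rail_switch=f"rail-sw{i}",
        )
        for i in range(num_gpus)
    ]


if __name__ == "__main__":
    for rail in build_rail_aligned_server():
        print(rail)
```

The sketch simply makes the three per-GPU networking components countable; whether they remain separate devices or get subsumed into one piece of silicon is exactly the question raised above.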
Observation #3:
Collective communication libraries (xCCL) pre-plan and orchestrate GPU-to-GPU data movement across the scale-up and scale-out networks to avoid congestion.
- Food for thought: Given the tight dependency between xCCL running in servers and the traffic patterns that flow through switches, should we expect tighter coupling between backend NICs and switches for improved congestion management?
- My take: Trends like the use of packet spraying and rail switches for congestion avoidance are relevant. I expect tighter coupling to help reduce xCCL churn and complexity challenges. A minimal sketch of a collective operation, as seen from the application, appears below.
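As a hedged illustration of what an xCCL collective looks like from the application's point of view, the following minimal PyTorch sketch runs an all-reduce through NCCL; the library then plans the GPU-to-GPU data movement over whichever scale-up and scale-out paths it discovers. The launch method, rank count, and tensor size are illustrative assumptions.

```python
import torch
import torch.distributed as dist


def main() -> None:
    # Assumes a launch such as `torchrun --nproc_per_node=8 allreduce_demo.py`,
    # which sets RANK, WORLD_SIZE, and MASTER_ADDR/MASTER_PORT for us.
    dist.init_process_group(backend="nccl")  # NCCL maps traffic onto NVLink/PCIe and RDMA paths underneath
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Each rank contributes a tensor; all-reduce sums the tensors across all GPUs.
    x = torch.ones(1024, device="cuda") * (rank + 1)
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    # Every rank now holds the same reduced result.
    print(f"rank {rank}: first element = {x[0].item()}")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The application never names a link or a switch; the CCL layer decides how the collective is decomposed across the scale-up and scale-out networks, which is why coupling between that layer and the fabric matters.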
Observation #4:
PCIe and CXL are ideal standards-based technologies that could replace proprietary GPU fabrics in scale-up networks.
- Food for thought: Will the industry help push these technologies to faster rates of innovation to close the significant performance and capability gaps that exist today versus proprietary GPU fabrics?
- My take: Likely not in the foreseeable future. Similar to how dedicated transports like RoCE and UET (Ultra Ethernet Transport) are applied (versus TCP), a legacy-free and dedicated version of PCIe will be needed. CXL builds on PCIe, so the same logic applies.
Observation #5:
GPU fabrics are evolving into high-scale switching solutions supporting larger clusters of GPUs. For example, NVIDIA NVL72 can connect up to 72 GPUs. The scale has grown from 8 to 72 within a short span.
- Food for thought: Could these GPU fabrics encroach significantly into the turf that currently belongs to scale-out networks where UEC enhancements are being targeted?
- My take: No. They each have their respective roles. With increasing GPU cluster sizes and denser racks, the boundaries will likely become fuzzier. Better scale-out network implementations can reduce the pressure on GPU fabrics to span large clusters.
Observation #6:
Proprietary GPU fabric technologies are being applied for chip-to-chip (C2C) connectivity between the GPU, CPU, and NIC silicon used for backend networking. This trend is being observed at GPU vendors like AMD and NVIDIA, which have their own GPUs, CPUs, and NICs.
- Food for thought: Applying their proprietary and proven C2C methods to build chiplets seems like a natural path for the GPU vendors. Could this hamper adoption of standards like Universal Chiplet Interconnect Express (UCIe) for connecting multi-vendor chiplets?
- My take: The answer depends on whether the hyperscalers building their own AI accelerators choose a different path than off-the-shelf GPU vendors like AMD and NVIDIA. They may, but only if integrating a choice of NICs and CPUs as chiplets makes sense. Currently, they seem to be moving toward building their own NICs and CPUs to complement their AI accelerators. That makes the case for UCIe adoption harder.
Observation #7:
Because both networks (scale-up and scale-out) are responsible for moving data between GPUs, the AI software stack needs to be aware of the capabilities of both and make optimum use of each.
- Food for thought: With more performance optimizations implemented across the stack spanning both networks, could procurement choice for the two networks (such as buying equipment from different vendors) become increasingly challenging?
- My take: For example, CCL accelerations such as those for All-Reduce, available today only on NVLink and InfiniBand (both available only from NVIDIA), can make procurement choices difficult. CCL distributions are also performance-optimized for the network design recipes offered by GPU vendors. When a broader set of solution suppliers can innovate at the CCL layer and support such performance enhancements, procurement choice should improve. An illustrative sketch of how fabric choices surface at the CCL layer appears below.
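To illustrate why the software stack's awareness of both networks affects procurement, here is a hedged sketch of how operators commonly steer NCCL's use of the scale-up and scale-out transports via its documented environment variables. The interface name and values below are placeholders; actual tuning is vendor- and fabric-specific.

```python
import os

# Illustrative, operator-chosen settings (values are placeholders). The point is
# that the CCL layer, not the application, decides how traffic is mapped onto the
# scale-up (NVLink/PCIe) and scale-out (RDMA/Ethernet) networks.
os.environ.setdefault("NCCL_DEBUG", "INFO")          # log which transports and algorithms NCCL picks
os.environ.setdefault("NCCL_P2P_DISABLE", "0")       # "1" would disable NVLink/PCIe peer-to-peer (scale-up)
os.environ.setdefault("NCCL_IB_DISABLE", "0")        # "1" would disable InfiniBand/RoCE (scale-out RDMA)
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")  # placeholder interface for the socket fallback path

# These must be set before the process group is initialized, e.g.:
#   import torch.distributed as dist
#   dist.init_process_group(backend="nccl")
```

If the two networks come from different vendors, this is the layer where their differences become visible, which is why a mixed procurement choice can be harder than it first appears.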
Observation #8:
Historically, innovations in how packets are routed and switched in the networking infrastructure have helped Ethernet remain dominant versus competing technologies.
- Food for thought: Will the UEC’s efforts to advance Ethernet technology for AI scale-out networking need a more ecosystem-focused approach?
- My take: Yes. Powerful incumbents will continue to extend the ecosystem capabilities of current technologies and solutions. UEC’s Ethernet and transport-related specifications, and the related vendor products, will have to be supplemented with significant ecosystem-related enhancements for a compelling whole-product experience. The whole-product barrier set by NVIDIA, for example, is getting higher with each new generation of its AI-related hardware and software products.
Observation #9:
Successful incumbents typically continue to extend the ecosystem capabilities of the current technologies and solutions from which they enjoy a large market share. This can create deployment hurdles for challengers that utilize new technologies and standards.
- Food for thought: NVIDIA, as the gorilla incumbent in this space, has been innovating at breakneck speed, leading the industry with new features and products and driving the ecosystem to move rapidly with it. Will the company take a more prominent role in improving Ethernet for AI?
- My take: Currently, for AI networking, NVIDIA promotes its InfiniBand-based products as the best solution, followed by Spectrum-X for Ethernet. The latter encompasses the company’s initiative to improve Ethernet for AI. As we have seen with other large incumbents in the industry, NVIDIA will likely continue to stretch the limits of both technologies and products until its market share begins to be adversely impacted in a significant way. Irrespective of its business-driven imperatives, I hope the company will soon participate in the UEC and make valuable contributions.
My Next Step and Journey
As a next step, I am embarking on an exciting journey as VP of Product with my esteemed colleagues at Enfabrica [1]. The intent is to get into the trenches with a very talented team, collaborate with customers with deep know-how in this space, and solve the above challenges and more for the good of the industry and the world. The use of the word “world” is a tall order; it is, in fact, a goal for all of us working in the field of AI, because AI, by boosting the productivity of enterprises, promises to significantly uplift the GDP of developing countries at a pace never seen before, improving the quality of life of millions of people [2].
References:
[1] Enfabrica