OCP Global Summit 2024 Series
For the final piece of the Global Summit wrap-up, I focus on networking, both inside the server and between racks, and take a crystal-ball look at where AI is heading.
Networking
In the area of networking, the conference shed light on some fantastic initiatives that are set to redefine how we connect and scale AI systems. These are set to give users viable alternatives and real choice, providing some much-needed competition in the space and reducing single-vendor reliance and supply-chain risk.
SCALE UP with UALink (Ultra Accelerator Link)
As an alternative to NVLink, UALink focuses on scaling up by connecting GPUs together to form a more powerful, unified GPU. The UALink Consortium has the who's who of hyperscalers (Meta, Amazon Web Services (AWS), Microsoft), silicon providers (Intel, AMD) and networking vendors (Cisco, Nokia), plus many others, collaborating to provide a solution for interconnecting non-NVIDIA chips (like AMD's MI300X or Intel's Gaudi 3) in a single node, at high speed and low latency, to create a large logical processor that shares resources (critically, memory) to host GenAI workloads like LLMs.
Essentially, make the biggest GPU possible.
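To put some rough numbers on what scaling up buys you, here is a quick back-of-envelope sketch in Python. The per-device memory figures are approximate public specs, and the pod sizes are placeholders I picked for illustration, not UALink-defined limits:

```python
# Illustrative only: what "one big logical GPU" buys you in pooled memory.
# Per-device HBM capacities are approximate public figures; the pod sizes
# are placeholders, not anything UALink specifies.

ACCELERATORS = {
    "AMD MI300X": 192,    # GB of HBM3 per device (approx.)
    "Intel Gaudi 3": 128, # GB of HBM2e per device (approx.)
}

def pooled_memory_gb(device: str, count: int) -> int:
    """Aggregate memory visible to a workload when `count` devices
    are stitched into a single scale-up domain."""
    return ACCELERATORS[device] * count

for device in ACCELERATORS:
    for count in (8, 64):
        print(f"{count:>3} x {device}: ~{pooled_memory_gb(device, count):,} GB pooled")
```

The point of the shared-memory domain is that an LLM too big for any one device's HBM can still be served as if it lived on a single, very large accelerator.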
SCALE OUT with UEC (Ultra Ethernet Consortium)
One of the key highlights was the progress made on products from the Ultra Ethernet Consortium (UEC). The UEC is spearheading efforts to develop next-generation Ethernet technologies tailored for AI workloads. Unsurprisingly, many of the same folks interested in connecting accelerators (GPUs) together inside a node are the ones who also want to connect as many of those large GPUs together as possible across nodes.
Products based on UEC are promising InfiniBand-like performance and features, using traditional Ethernet tooling.
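To give a feel for the scale-out math, here is a minimal sketch. The GPU count, NICs-per-GPU and link speed are assumptions I chose for illustration, not UEC specifications:

```python
# Illustrative scale-out sizing for a UEC-style Ethernet fabric.
# GPU count, NICs-per-GPU and link speed are assumptions for this
# sketch, not UEC requirements.

gpus = 4096        # target accelerator count (assumption)
nics_per_gpu = 1   # one NIC rail per GPU (a common rail-optimized design)
link_gbps = 400    # matches the 400GbE class of NIC discussed below

endpoint_ports = gpus * nics_per_gpu

# In a non-blocking fabric, half the endpoints can talk to the other
# half at full line rate, so bisection bandwidth is half the aggregate.
bisection_tbps = endpoint_ports * link_gbps / 2 / 1000

print(f"Endpoint ports to cable and operate: {endpoint_ports:,}")
print(f"Non-blocking bisection bandwidth: ~{bisection_tbps:.0f} Tb/s")
```

The draw of UEC is that a fabric like this can be built and run with familiar Ethernet tooling rather than a separate InfiniBand estate.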
For a little more detail on Ultra Ethernet, be sure to watch this:
What was also great was seeing AMD's UEC NICs. AMD presented what appears to be their first UEC-compatible NIC, the AMD Pensando Pollara 400GbE card. While the name might be a mouthful, it's exciting to see AMD sampling this card in Q4, with general availability expected around Q2 of 2025. This aligns with AMD's acquisition of Pensando and their push into advanced networking solutions.
Patrick Kennedy and the ServeTheHome team have fantastic coverage on their website: https://www.servethehome.com/amd-pensando-pollara-400-ultraethernet-rdma-nic-launched/
To explore both Ultras (UALink and UEC) at a high level, check this helpful presentation out:
Other Tidbits
- Intel’s New IPU (Infrastructure Processing Unit): Intel introduced a new NIC, which they refer to as an IPU, essentially their take on a DPU (Data Processing Unit). This move signifies Intel’s commitment to offloading and accelerating network functions, improving data center efficiency.
- Dell’s ORv3 Rack with Liquid-Cooled NVLink: Dell showcased their XE9712, an ORv3 rack equipped with liquid-cooled NVLink (specifically the NVIDIA NVL72). This setup supports up to a staggering 180 kW per rack, highlighting the intense power and cooling requirements of modern AI infrastructure; some quick math after this list puts that figure in context.
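To put that 180 kW figure in context, here is the quick arithmetic. The 180 kW and 72-GPU numbers come from the rack above; the per-slot split and the 15 kW comparison rack are my own illustrative assumptions, not Dell or NVIDIA figures:

```python
# Quick power math for a dense AI rack. rack_kw and gpus_per_rack come
# from the Dell XE9712 / NVIDIA NVL72 figures above; the per-slot split
# and the air-cooled comparison budget are illustrative assumptions.

rack_kw = 180
gpus_per_rack = 72

kw_per_gpu_slot = rack_kw / gpus_per_rack  # includes CPUs, NICs, fans, losses
print(f"~{kw_per_gpu_slot:.1f} kW per GPU slot, all-in")

typical_air_cooled_kw = 15  # assumed budget for a traditional enterprise rack
print(f"That is ~{rack_kw / typical_air_cooled_kw:.0f}x a typical "
      f"{typical_air_cooled_kw} kW air-cooled rack")
```

At roughly 2.5 kW per GPU slot, it is easy to see why liquid cooling stops being optional at this density.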
I talk about this briefly in my OCP podcast recording with Rob Coyle, so stay tuned for that one!
AI's Shift Towards Inference
A significant theme that caught my attention was the industry’s pivot towards AI inference. For years, I’ve been emphasizing the impending dominance of inference workloads over training. Training massive AI models has been the focus, with organizations scrambling to build solutions and supply chains to meet these demands. However, it’s now widely accepted that inference, the deployment and utilization of these trained models, is the long tail of AI, and enterprise AI utilization really has not started (yet).
This shift is prompting organizations to prepare for the unique challenges and opportunities that inference presents. From optimizing hardware for lower latency and higher throughput to rethinking data center designs for efficiency, the focus is broadening. It’s an exciting time as the industry adjusts to balance both training and inference workloads effectively.
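To see why inference changes the optimization target, consider a rough decode-throughput ceiling. Autoregressive decoding is typically memory-bandwidth bound, so streaming the model weights once per generated token sets an upper bound. All numbers below (model size, precision, bandwidth) are illustrative assumptions, not measurements:

```python
# Why inference shifts the tuning problem: decode is usually
# memory-bandwidth bound. All figures are illustrative assumptions.

model_params_b = 70       # 70B-parameter model (assumption)
bytes_per_param = 2       # FP16/BF16 weights
hbm_bandwidth_tbs = 3.35  # roughly H100 SXM HBM3 bandwidth (approx.)

weights_gb = model_params_b * bytes_per_param            # GB streamed per token
tokens_per_sec = hbm_bandwidth_tbs * 1000 / weights_gb   # ceiling at batch size 1

print(f"Weights streamed per token: ~{weights_gb} GB")
print(f"Decode ceiling (batch 1): ~{tokens_per_sec:.0f} tokens/s")
```

Bigger batches amortize those weight reads, trading per-request latency for throughput, which is exactly the balancing act inference-focused deployments have to get right.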
Thanks again to the Open Compute Project Foundation team and all the contributing members for moving the needle, sharing their work and partnering in a collaborative way. It's a great and thriving community, and a credit to everyone involved who dedicates so much time, blood, sweat and tears.
Looking forward to 2025!