SuperNIC Explained? Part 2
Scott Schweitzer (c) 2024


Earlier this summer, in Part 1, I speculated on NVIDIA's definition of a SuperNIC. On Friday, I received an email newsletter from NVIDIA Networking pointing me to a November blog post by Itay Ozery titled "What is a SuperNIC?" I'm not sure how I missed it while researching that piece; my apologies.

"Get to Know the SuperNIC" email by NVIDIA on July 19, 2024

NVIDIA defines a SuperNIC as different from a DPU in the following six ways:

1. 400 Gbps RDMA over Converged Ethernet (RoCE)

AI is simply another High-Performance Computing (HPC) workload; the only differences are the typical packet payload sizes and how latency-sensitive the applications are. For the past three decades, we've been tuning HPC networking fabrics to squeeze out every bit of latency while improving overall throughput and performance. The design of the car you drove to work this morning was likely crashed hundreds of times in HPC simulations to improve your safety and comfort. Two decades ago, Ford and GM each ran several HPC clusters performing finite element analysis for exactly this purpose, while Boeing and Caterpillar ran Computational Fluid Dynamics codes on their clusters for wing and engine design. RDMA is well over two decades old, and RoCE, while newer, was around before Taylor Swift first released her Red album (2012). The only thing even remotely new is the speed, and NVIDIA demonstrated these capabilities at 400 Gbps with the ConnectX-7 at ISC 2022, two years ago.
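To put the one genuinely new element, the speed, in perspective, here is a quick back-of-the-envelope sketch of what 400 Gbps implies in packet rates. The payload sizes and the roughly 66-byte per-packet overhead are illustrative assumptions, not NVIDIA figures.

```python
# Back-of-the-envelope sketch: what a 400 Gbps link implies in packet rates.
# Payload sizes and the ~66-byte RoCEv2 framing overhead are assumptions.

LINK_GBPS = 400                          # line rate claimed for ConnectX-7 / BlueField-3
LINK_BITS_PER_SEC = LINK_GBPS * 1e9

for payload_bytes in (256, 1024, 4096, 8192):   # assumed typical payload sizes
    wire_bytes = payload_bytes + 66              # rough Ethernet + IP + UDP + BTH overhead
    packets_per_sec = LINK_BITS_PER_SEC / (wire_bytes * 8)
    goodput_gbps = LINK_GBPS * payload_bytes / wire_bytes
    print(f"{payload_bytes:>5} B payload -> ~{packets_per_sec/1e6:6.1f} Mpps, "
          f"{goodput_gbps:5.1f} Gbps goodput")
```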

2. High-speed packet reordering

High-speed packet reordering has been an HPC requirement since the beginning. Myrinet-2000, the leading HPC networking fabric in 2003, was reordering packets long before InfiniBand came onto the scene a few years later. Casting this as something "new" requires pointing to what is actually new, and that isn't obvious at this point.
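For readers unfamiliar with the mechanism, here is a minimal sketch of what receive-side packet reordering amounts to: packets sprayed across multiple paths arrive out of order and are released to the upper layer strictly in sequence. This is illustrative Python, not anyone's actual NIC implementation.

```python
import heapq

class ReorderBuffer:
    """Toy receive-side reorder buffer: releases packets strictly in
    sequence-number order, holding any that arrive early."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq     # next sequence number we may deliver
        self.pending = []             # min-heap of (seq, payload) held out of order

    def receive(self, seq, payload):
        """Accept one packet; return the payloads now deliverable in order."""
        heapq.heappush(self.pending, (seq, payload))
        delivered = []
        while self.pending and self.pending[0][0] == self.next_seq:
            _, data = heapq.heappop(self.pending)
            delivered.append(data)
            self.next_seq += 1
        return delivered

# Packets 0..4 sprayed over multiple paths arrive out of order:
rb = ReorderBuffer()
for seq in (2, 0, 1, 4, 3):
    print(seq, "->", rb.receive(seq, f"pkt{seq}"))
```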

3. Advanced congestion control using real-time telemetry data

Again, this has existed in one form or another in HPC networking since at least Myrinet-2000. Back then, every NIC in the HPC fabric knew about every other NIC and all the possible paths between them. One NIC was dynamically elected master of the fabric and kept the node tables in all the other NICs up to date. Each NIC then used this real-time data to route packets between compute nodes over multiple paths simultaneously, even for a single message.
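A toy model of that Myrinet-style behavior, where per-path telemetry steers where the next chunk of a message goes, might look like the sketch below. The path names and congestion scores are invented for illustration.

```python
import random

# Toy adaptive routing: per-path congestion telemetry (e.g., measured queue
# depth or RTT) drives where the next chunk of a message is sent.
# Path names and telemetry values are made up for illustration.

telemetry = {           # lower score = less congested
    "spine-1": 0.2,
    "spine-2": 0.7,
    "spine-3": 0.1,
    "spine-4": 0.4,
}

def pick_path(telemetry):
    """Weight each path inversely to its reported congestion and pick one,
    so a single message can be spread across several paths over time."""
    weights = {path: 1.0 / (score + 1e-3) for path, score in telemetry.items()}
    total = sum(weights.values())
    r = random.uniform(0, total)
    for path, w in weights.items():
        r -= w
        if r <= 0:
            return path
    return path  # fallback for floating-point edge cases

chunks_per_path = {p: 0 for p in telemetry}
for _ in range(10_000):                 # send 10,000 chunks of one large message
    chunks_per_path[pick_path(telemetry)] += 1
print(chunks_per_path)                  # least-congested paths carry the most chunks
```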

4. Programmable computing on the I/O path

This could be the single best "new" feature, but the description NVIDIA gives is nebulous: "to enable customization and extensibility of network infrastructure in AI cloud data centers." If the SuperNIC is doing some programmable transformation on the data to improve overall performance and throughput, that would be worth the SuperNIC moniker.
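If "programmable computing on the I/O path" means user-supplied transforms applied to buffers as they move through the NIC, a toy model might look like the sketch below. The stage names and pipeline class are my assumptions for illustration, not NVIDIA's API.

```python
import zlib

# Toy model of a programmable I/O path: user-supplied stages applied to each
# buffer as it flows through the NIC. Stage names are illustrative assumptions.

def compress(buf):            # shrink the payload before it hits the wire
    return zlib.compress(buf)

def checksum_tag(buf):        # append an integrity tag computed on the NIC
    return buf + zlib.crc32(buf).to_bytes(4, "big")

class IOPipeline:
    def __init__(self, *stages):
        self.stages = stages

    def transmit(self, buf):
        for stage in self.stages:      # each stage runs "on the I/O path"
            buf = stage(buf)
        return buf

pipeline = IOPipeline(compress, checksum_tag)
payload = b"gradient shard " * 1000
wire_bytes = pipeline.transmit(payload)
print(len(payload), "->", len(wire_bytes), "bytes on the wire")
```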

5. Power-efficient design to meet AI power budgets

All semiconductor companies strive to limit the power their chips consume, but the fact remains that more transistors typically translate to greater power consumption. Back in the day, standard NICs drew 15W, and SmartNICs drew anywhere from 35 to 75W, the latter being the power limit of the gold fingers of a PCIe x16 connector. More recently, DPUs have required a supplemental 6-pin PCIe auxiliary power connector providing an additional 75W. The "SuperNIC" version of the BlueField-3 has an optional 8-pin auxiliary power connector capable of supplying up to 150W of additional power, although, as stated in the "NVIDIA BlueField-3 DPU Controller User Manual," the SuperNIC version's "maximum power consumption does not exceed 150W."
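Laying the power figures quoted above side by side shows why this matters at scale. A quick sketch; the 32-NICs-per-rack count is an assumed example, not a vendor number.

```python
# Per-NIC power budgets implied by the figures quoted above: ~15W for a
# standard NIC, up to 75W from the PCIe x16 slot for a SmartNIC, slot plus a
# 6-pin auxiliary connector for a DPU, and a stated 150W maximum for the
# BlueField-3 SuperNIC configuration. Rack density below is an assumption.

nic_power_budget_w = {
    "standard NIC": 15,
    "SmartNIC (slot-powered)": 75,
    "DPU (slot + 6-pin aux)": 75 + 75,
    "BlueField-3 SuperNIC (stated max)": 150,
}

NICS_PER_RACK = 32   # assumed example for illustration

for name, watts in nic_power_budget_w.items():
    print(f"{name:34s} {watts:4d} W/NIC -> {watts * NICS_PER_RACK / 1000:5.2f} kW/rack")
```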

6. Full-stack AI optimization

This defining feature is outlined only as "Full-stack AI optimization, including compute, networking, storage, system software, communication libraries, and application frameworks," which lacks detail. The rest of the blog post doesn't provide specific optimizations or examples.

Given what has been published thus far, it appears that a SuperNIC may just be a DPU leveraging RoCE.

Functionally, all of these things (SuperNICs, SmartNICs, DPUs) are the same thing. It's all about the actual solutions that are being delivered with this architecture. So if Nvidia delivers a DPU solution that cuts CPU/GPU utilization by 50%, everyone will be calling their solution a DPU. If AMD/Pensando squashes IO calls by 50%, people will ask to buy SmartNICs. Etc.

Aldrin Isaac

Director, Site Network Engineering at eBay

4 months ago

SmartNICs don't always make sense at scale. They add cost, including recurring power costs. Features like high speed packet reordering should not require a SmartNIC. I don't see any reason why high speed packet reordering for RDMA can't be implemented in a standard NIC, other than as an excuse to force customers to pay more for a fancy SmartNIC. All that is needed is standard NICs with high speed packet reordering support for RDMA over a simple packet spray fat-tree fabric that can fast-fail on link issues. In such a case, advanced real-time telemetry is only needed for diagnostics, not for influencing the data path characteristics.

Aidan Herbert

Decentralized transactional ecosystem enabler

4 months ago

It's odd that more organizations do not use FPGA-enabled SmartNICs for line-speed packet filtering. So much more efficient than shipping all packets through Cloudflare for DDoS protection!

Great content. Excellent analysis!
