Power, Heat, Space, and the Move to Double-Wide SmartNICs
Scott Schweitzer, CISSP
Every watt that flows through the ASIC at the heart of a SmartNIC ultimately comes back out as heat.
PCIe Power
In 2005, as new servers began to appear with first-generation PCI Express, we learned that this new I/O slot standard offered 10W of power at 3.3V and up to 25W at 12V (35W total) for an eight-lane connection to a single device. Later, sixteen-lane slots became popular, offering up to 75W before requiring an external connector. Through the fourth generation there were two optional power connectors, a 6-pin rated at 75W and an 8-pin rated at 150W. Both supplied additional power at 12V, bringing total card power to 150W and 225W, respectively.
As GPUs began utilizing these optional power connectors, they were forced to move from a single-wide to a double-wide PCIe slot configuration to accommodate the required heatsink, often with fans to actively dissipate the heat. The new fifth-generation PCIe standard adds a 12+4-pin power plug carrying a whopping 600W (12 pins for power plus 4 pins for a tiny side-band connector).
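To make the budget arithmetic explicit, here is a minimal sketch (my own illustration, not an excerpt from the PCIe specification) that simply adds the slot power to the optional connector power quoted above; the table values and function name are just for illustration.

```python
# Illustrative sketch: total PCIe card power budget = slot power + optional
# auxiliary connector, using the figures quoted in the text above.

SLOT_POWER_W = {"x8": 35, "x16": 75}                              # slot-only limits
AUX_POWER_W = {None: 0, "6-pin": 75, "8-pin": 150, "12VHPWR": 600}  # optional connector

def card_power_budget(slot: str, aux: str | None = None) -> int:
    """Maximum power (watts) a card may draw from the slot plus its aux connector."""
    return SLOT_POWER_W[slot] + AUX_POWER_W[aux]

print(card_power_budget("x16"))             # 75 W  (slot only)
print(card_power_budget("x16", "6-pin"))    # 150 W
print(card_power_budget("x16", "8-pin"))    # 225 W
print(card_power_budget("x16", "12VHPWR"))  # 675 W (slot + Gen5 12+4-pin connector)
```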
Heat Sinks
Heat sinks fall into two main categories: active heatsinks have a built-in fan, while passive heatsinks rely on the system's natural airflow to draw heat away. In non-server environments like workstations or gaming systems, active heat sinks, or cold plates for custom liquid cooling, are very common on high-end GPUs. Active heat sinks are frowned upon in servers because the airflow through these systems, especially 1U and 2U servers, is very tightly managed. Airflow through servers is measured in linear feet per minute (LFM), ranging from roughly 300 to 750 LFM on popular models.
Heat sinks are typically made of aluminum or copper, with copper conducting heat roughly 70% better but often costing about 1.8X more. A heat sink can be a simple plate, or it can incorporate fins, heat pipes, or even vapor chambers.
Plate-based heat sinks dissipate heat through direct contact, rarely with the system chassis, and more often with an active liquid cooling loop, which is typically about 20X more effective than air. Fin-based heat sinks, the most common approach, increase the surface area available to radiate heat into the surrounding air. Fins are often combined with heat pipes, sealed tubes containing a small amount of liquid, typically water, that move heat quickly from one place to another via evaporation and condensation; these are common on tower-style CPU coolers. Finally, the newest design is a variation on the heat pipe called the vapor chamber: essentially a copper sponge sealed inside a flat chamber with a droplet or two of water, it works like a heat pipe but in a much more compact form factor, using evaporation to spread heat quickly and evenly across the device.
Space
To determine how much heat can be dissipated, we need to look at several variables: the temperature of the incoming air, the maximum junction temperature we will tolerate for the ASIC, the thermal resistance of the heatsink, and the airflow over the heatsink. The heatsink's thermal resistance captures its complete architecture: material, surface area over the board, volume, fin dimensions and count, and whether it incorporates heat pipes or vapor chambers.
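These variables roll up into a simple first-order budget, sketched below; the function and the thermal-resistance value are illustrative assumptions of mine, not figures from any datasheet.

```python
# Minimal sketch of the steady-state thermal budget described above, assuming
# the first-order model T_junction = T_inlet + P * theta_sa, where theta_sa is
# the effective heatsink-to-ambient thermal resistance (degrees C per watt) at
# a given airflow. theta_sa is a hypothetical input here; in practice it comes
# from the heatsink vendor's resistance-vs-LFM curves.

def max_power_w(t_inlet_c: float, t_junction_max_c: float, theta_sa_c_per_w: float) -> float:
    """Maximum sustained power that keeps the ASIC at or below its junction limit."""
    return (t_junction_max_c - t_inlet_c) / theta_sa_c_per_w

# Example with a hypothetical 0.45 C/W heatsink, 50C inlet air, and a 95C limit:
print(max_power_w(50, 95, 0.45))   # 100.0 W
```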
PCIe cards come in half, three-quarter, and full lengths, low-profile and full heights, and single- or double-wide widths. Server vendors, for the most part, have so far constrained SmartNICs to slots that accept at most a three-quarter-length, full-height, single-wide card. So suppose we were to build a SmartNIC with a single QSFP-DD cage (typically used for 400GbE) and constrain the board to a full-height, three-quarter-length, single-wide card with cutouts for the QSFP-DD and a power connector. Then we pull out all the stops, use as much board real estate as possible, throw cost to the wind, make the heatsink from copper, and utilize the latest architecture. What are the limits?
Most server GPU slots receive data-center-temperature air, but SmartNICs are often relegated to preheated air that has already flowed over memory or CPUs, so we must assume this ambient air could, at worst, be 50C. If we constrain the junction temperature of our ASIC to a reasonable 95C, we would need 600 LFM of airflow to cool a 135W board. The downside is that a SmartNIC slot often doesn't see that much air: a Dell R750 GPU slot can expect around 700 LFM, while the SmartNIC slots might only see half that. So pushing actual SmartNIC power consumption much beyond 75W in these traditional slots could limit the lifecycle of the card.
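As a quick sanity check on those numbers (my own back-of-the-envelope arithmetic, not a vendor figure), the same first-order model implies:

```python
# Back-calculating from the figures above: 50C inlet air, a 95C junction limit,
# and 135W dissipated at 600 LFM imply an effective thermal resistance of
# roughly (95 - 50) / 135 = ~0.33 C/W for that copper, full-height,
# three-quarter-length design.
theta_at_600_lfm = (95 - 50) / 135
print(round(theta_at_600_lfm, 2))   # 0.33

# At roughly half the airflow the effective thermal resistance rises (by how
# much depends on the heatsink's curve), which is why ~75W is treated as the
# practical ceiling for a traditional single-wide SmartNIC slot.
```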
This is confirmed by looking at NVIDIA's BlueField-2 DPU at 2x100GbE (BF2M516A). This full-height, half-length, single-wide card was laid out to potentially include a 6-pin power connector, but it gets by with just the power from the PCIe x16 slot, which is limited to 75W. As seen in the attached picture, the heat sink uses a pair of heat pipes to carry heat from the bottom to the top of the card. NVIDIA also left an open air channel so the QSFP cages receive some "fresh" air; these connectors can house active optical cables that draw up to 10W each, hence the heatsinks on top of each cage.
Future
When ServeTheHome published pictures of NVIDIA's latest BlueField-3 2x200GbE board (model D3B6) in December 2022, we saw how power is starting to reshape the space requirements for DPUs and SmartNICs. As seen below, this card is full-height and half-length, but most surprisingly, it's double-wide. The two QSFPs receive uninterrupted airflow, while the BlueField-3 chip is cooled entirely by the double-high heat sink fins on the top of the board; a plate, likely with heat pipes, moves heat from the bottom half of the BF-3 ASIC up to those fins. The end view of the board shows an 8-pin power connector, so this board could consume up to 225W. Here is the rub, though: because the board is double-wide, it must move into a GPU slot, where it will likely receive data-center-temperature air and peak airflow. The BlueField-3 will hopefully reshape where SmartNICs and DPUs are placed in future servers. Unfortunately, that ship may have sailed for the Sapphire Rapids server designs that have just begun shipping.