NVIDIA GTC Impressions - A Data Center Perspective: "It was my Understanding that there Would be no Math"
There are plenty of thought pieces being written about last week's iconic NVIDIA GTC conference, but few, if any, come from the perspective of the data centers.
To start with, a quick recap of the new GPU specifications - aka "The Math".
NVIDIA is continuing its evolution from the A100 (last generation) to the H100 (current generation) to the next-gen GB200, a GPU system that combines two B200 Blackwell GPUs with one Grace CPU, delivering roughly four times the performance of the previous generation at an approximately 30% increase in power density and in the same footprint. On a performance-per-watt basis, that works out to roughly a 3.1x improvement.
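For readers who want to retrace that ratio, here is a minimal back-of-the-envelope sketch using the 4x performance and ~30% power figures quoted above (the normalization of the prior generation to 1.0 is my own, purely for illustration):

```python
# Sanity check on the generation-over-generation claim above.
# Prior generation normalized to 1.0 performance and 1.0 power (illustrative).
prev_perf, prev_power = 1.0, 1.0
next_perf, next_power = 4.0, 1.3   # ~4x performance at ~1.3x power

perf_per_watt_gain = (next_perf / next_power) / (prev_perf / prev_power)
print(f"performance-per-watt ratio: {perf_per_watt_gain:.2f}x")  # ~3.1x the old ratio
```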
A single rack would have 36 Grace CPUs and 72 Blackwell GPUs in the standard configuration and would draw 120kW/cabinet, although whispers indicate that in operation it may be a bit less - call it 110kW/cabinet. Within the rack, copper is used as the networking medium - a sensible approach, considering that optics are unnecessary at that distance and would add roughly 20kW of power.
Eight racks form a row - 288 Grace CPUs and 576 Blackwell GPUs, connected by 576 NVLink Smart NICs in a fully meshed InfiniBand configuration. That sounds good, except that lead times on Mellanox NICs are increasing and currently stand at 10 to 20 weeks - the critical path in any NVIDIA GPU deployment. Each eight-rack row would be supported by one Cooling Distribution Unit (CDU), connected to building chilled water.
Thirty-two racks make up a standard deployment pod of roughly 4MW. That being said, I'm skeptical that 4MW is a true unit of deployment - I suspect four of these pods will be deployed together, making the minimum deployment size 16MW. In comparison, the last-generation H100 was deployed in an 8,000-GPU, 15MW pod. Any way you cut it, this is 4x as good per GPU and 3.1x as good per unit of power.
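The rack-to-pod roll-up is easy to reproduce. This is a sketch using the per-rack counts and ~120kW/cabinet figure quoted above; the 32-rack pod grouping follows the text, and the GPUs-per-MW comparison at the end is my own framing:

```python
# Rack -> row -> pod roll-up using the figures quoted in this piece.
CPUS_PER_RACK, GPUS_PER_RACK = 36, 72
KW_PER_RACK = 120
RACKS_PER_ROW, ROWS_PER_POD = 8, 4        # eight-rack rows, 32-rack pod

racks_per_pod = RACKS_PER_ROW * ROWS_PER_POD
pod_gpus = racks_per_pod * GPUS_PER_RACK   # 2,304 Blackwell GPUs
pod_kw = racks_per_pod * KW_PER_RACK       # ~3,840 kW, i.e. the ~4 MW pod
print(f"{racks_per_pod} racks -> {pod_gpus} GPUs at ~{pod_kw / 1000:.1f} MW")

# Comparison point from the article: an H100-generation pod of 8,000 GPUs at 15 MW.
h100_gpus_per_mw = 8000 / 15                      # ~533 GPUs per MW
gb200_gpus_per_mw = pod_gpus / (pod_kw / 1000)    # ~600 GPUs per MW
print(f"GPUs per MW: H100 ~{h100_gpus_per_mw:.0f}, GB200 ~{gb200_gpus_per_mw:.0f}")
```

The GPU count per megawatt barely moves between generations; the gains come from each Blackwell GPU doing roughly four times the work of its predecessor.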
So, what does this do to power projections for GPU-hosting data centers? Honestly, not much. The increase in both power and performance was anticipated in my analysis, and we are holding steady on 31,000MW of global GPU data center demand in 2028. That is a highly conservative number - our friends at SemiAnalysis have published a much higher projection of 80,000MW, a figure we feel cannot be met with either likely chip shipments or available utility power.
NVIDIA is also shipping a slightly "toned down" B100-powered drop-in replacement for H100 racks, with each GPU drawing 700W instead of 1,000W. NVIDIA also pushed its Spectrum-X Ethernet network as a possible alternative to Mellanox InfiniBand. Plenty of GPU hosters, however, are wondering why not just deploy 400G Ethernet with less expensive Smart NICs. It's a good question to ask, in my opinion.
So, what’s the data center impact of these titanic announcements?
What does all of this mean for the depreciation life of GPUs? A100s had (and still have) a clear six-year depreciation life, which has been pretty great for the GPU hosters and others. Will H100s also have that long a life? That really depends on how many of the more modern Blackwell systems NVIDIA can ship, and how quickly - but six years seems unlikely. Many GPU hosters are planning for a two-year lifetime. The economics get a lot better at four years, but that would require the ability to shift older GPUs from remote training locations to closer-in inference locations. This is not easy, and it's far outside the wheelhouse of the vast majority of operators in this sector; no one has seriously tried lift-and-shift for a decade or more, because it's horribly difficult and expensive. Perhaps it's time to redevelop that capability.
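To make the two-year versus four-year point concrete, here is a minimal sketch of how depreciation life flows through to cost per GPU-hour. The capex and utilization figures are hypothetical placeholders of my choosing, not numbers from this article or from NVIDIA - the point is the shape of the curve, not the absolute values:

```python
# Illustrative only: straight-line depreciation spread over utilized hours.
CAPEX_PER_GPU = 35_000   # hypothetical all-in cost per GPU slot, USD
UTILIZATION = 0.70       # hypothetical average utilization
HOURS_PER_YEAR = 8_760

def capex_per_utilized_hour(life_years: int) -> float:
    """Capex burden per hour the GPU is actually rented/used."""
    return CAPEX_PER_GPU / (life_years * HOURS_PER_YEAR * UTILIZATION)

for years in (2, 4, 6):
    print(f"{years}-year life: ${capex_per_utilized_hour(years):.2f} per utilized GPU-hour")
# Doubling the life from two to four years halves the capex per rented hour,
# which is the whole argument for finding a second, inference-oriented life
# for older GPUs instead of writing them off early.
```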
Finally, if you are in the data center business, you should be at GTC. Some very influential data center executives were there, but I was surprised by how many engineers were not. This isn't optional anymore - ML is no longer just an interesting application; it is the future of the data center business. Next year's GTC will reportedly be in Las Vegas due to the size of the crowds. I hope to see you there - it's vital for your business.