And now the GB200!

Yesterday, I discussed the new B100 and B200 Blackwell chips. Today, I want to delve into the GB200 Superchip.

At the heart of the NVIDIA GB200 NVL72 is the NVIDIA GB200 Grace Blackwell Superchip. This design links two NVIDIA Blackwell Tensor Core GPUs and an NVIDIA Grace CPU through the NVLink Chip-to-Chip (C2C) interface, which provides 900 GB/s of bidirectional bandwidth. With NVLink-C2C, applications can access a single shared memory space, streamlining programming and accommodating the large memory requirements of advanced AI models and simulations.
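To get a feel for what a shared CPU-GPU memory space means for programmers, here is a minimal CUDA sketch using managed memory. This is an illustrative analogy rather than GB200-specific code: managed memory runs on any modern NVIDIA GPU, while on Grace Blackwell the NVLink-C2C link makes this style of shared access hardware-coherent and much faster.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simple kernel: the GPU increments every element of the shared array.
__global__ void increment(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *data = nullptr;

    // One allocation visible to both CPU and GPU -- no explicit copies.
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;   // CPU writes
    increment<<<(n + 255) / 256, 256>>>(data, n); // GPU reads and writes
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);            // CPU reads the GPU's result
    cudaFree(data);
    return 0;
}
```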

The GB200's computing power is housed in a tray designed around NVIDIA's latest MGX blueprint. This setup includes two Grace CPUs and four Blackwell GPUs, equipped with features like cold plates for liquid cooling, PCIe Gen 6 for fast networking, and NVLink connectors. Each GB200 compute tray delivers 80 petaflops of AI performance and 1.7 TB of fast memory.

To make the magic happen, many Blackwell GPUs must operate efficiently in tandem, which demands high bandwidth and low latency to keep them constantly busy. The GB200 NVL72 system boosts parallel model performance across 18 compute nodes through the NVIDIA NVLink Switch System, which connects GPUs and switches via nine NVLink switch trays and cable cartridges.
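In software terms, this kind of direct GPU-to-GPU communication surfaces as peer-to-peer access. Below is a minimal CUDA sketch of a direct device-to-device copy; it assumes a machine with at least two GPUs (device IDs 0 and 1 are illustrative, and error handling is omitted for brevity). Over NVLink, the copy bypasses host memory entirely.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Check whether GPU 0 can directly address GPU 1's memory.
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    printf("GPU 0 -> GPU 1 peer access: %s\n", canAccess ? "yes" : "no");

    const size_t bytes = 256 << 20; // 256 MiB test buffer
    float *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    if (canAccess) cudaDeviceEnablePeerAccess(1, 0); // enable the direct path
    cudaMalloc(&src, bytes);

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // Device-to-device copy; over NVLink this avoids bouncing through host RAM.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```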

The GB200 supports configurations with 36 or 72 GPUs in a single NVLink domain. Depending on the setup, a rack can host up to 18 compute nodes following the MGX design and incorporate the NVLink Switch System. These configurations range from the GB200 NVL36, with 36 GPUs, to the GB200 NVL72, housing up to 72 GPUs for unparalleled computational density and efficiency.
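As a quick sanity check, the rack-level numbers follow directly from the per-tray figures above (assuming the full NVL72 build-out of 18 trays):

```latex
\begin{aligned}
18 \text{ trays} \times 80 \text{ PFLOPS/tray} &= 1{,}440 \text{ PFLOPS} \approx 1.4\ \text{exaflops of AI compute} \\
18 \text{ trays} \times 1.7 \text{ TB/tray} &\approx 30\ \text{TB of fast memory}
\end{aligned}
```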

The introduction of fifth-generation NVLink in the GB200 NVL72 is a significant leap in GPU-to-GPU communication. This technology can connect up to 576 GPUs within a single domain, offering over 1 PB/s of total bandwidth and 240 TB of fast memory. Each NVLink switch tray in this setup provides 144 NVLink ports at 100 GB/s each, fully integrating all 72 Blackwell GPUs for seamless, high-speed data transfer.
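Those headline figures are consistent with the per-port numbers (a back-of-the-envelope check, assuming each Blackwell GPU exposes 18 NVLink ports at 100 GB/s, which is NVIDIA's published fifth-generation NVLink configuration):

```latex
\begin{aligned}
18 \text{ ports/GPU} \times 100 \text{ GB/s} &= 1.8\ \text{TB/s per GPU} \\
72 \text{ GPUs} \times 1.8 \text{ TB/s} &\approx 130\ \text{TB/s within one NVL72 rack} \\
576 \text{ GPUs} \times 1.8 \text{ TB/s} &\approx 1.04\ \text{PB/s across the maximum domain}
\end{aligned}
```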

Some Stats:

AI training:

The GB200 includes a faster, second-generation transformer engine featuring FP8 precision. It delivers 4X faster training performance for large language models like GPT-MoE-1.8T when running on 32k GPUs in GB200 NVL72 systems, compared with the same number of NVIDIA H100 GPUs.
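FP8 squeezes each value into 8 bits (the E4M3 variant: 1 sign, 4 exponent, and 3 mantissa bits, with a maximum finite value around 448), trading precision for throughput and memory. Here is a minimal host-side sketch of the quantize/dequantize round trip using the CUDA FP8 intrinsics from cuda_fp8.h. It is illustrative only; in real training the transformer engine manages scaling factors and format selection automatically.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_fp8.h> // FP8 conversion intrinsics, available since CUDA 11.8

int main() {
    const float inputs[] = {1.0f, 3.14159f, 0.001f, 448.0f, 1000.0f};

    for (float x : inputs) {
        // Quantize float -> FP8 E4M3, saturating values beyond the finite range.
        __nv_fp8_storage_t q = __nv_cvt_float_to_fp8(x, __NV_SATFINITE, __NV_E4M3);

        // Dequantize FP8 -> half -> float to see the rounding/saturation error.
        float back = __half2float(__half(__nv_cvt_fp8_to_halfraw(q, __NV_E4M3)));

        printf("%10.5f -> fp8(E4M3) -> %10.5f\n", x, back);
    }
    return 0;
}
```

Compiled with nvcc, this shows the trade-off concretely: 3.14159 comes back as 3.25, and 1000 saturates to E4M3's maximum of 448.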

AI inference:

It achieves a 30x speedup for demanding workloads such as the 1.8T GPT-MoE compared to the H100 generation. The basis for this 30x figure is a comparison between two setups running the GPT-MoE-1.8T model: one using 64 NVIDIA Hopper GPUs interconnected through 8-way NVLink and InfiniBand, and the other using 32 Blackwell GPUs integrated into the GB200 NVL72 framework.

Looks like fun times ahead!

Garrett Johnson

Co-Founder/COO at Hydra Host & Co-Founder/Chairman at Foundation for American Innovation

7 months ago

Hydra Host just secured B100s for preorder. We were so impressed at GTC that we had to have them.

Rick P.

Information Technology

8 months ago

Thanks for the new info!
