The OCP Rack Architecture of the GH200 is Pretty Neat (at least to a HW nerd like myself)

The NVIDIA GH200 NVL32 rack-scale reference architecture is built around 16 dual-GH200 server nodes based on the NVIDIA MGX chassis.

Central to the GH200 Grace Hopper Superchip is the NVLink-C2C interface, which establishes a single NVLink-addressable memory space and significantly streamlines model programming. Integrating high-bandwidth, low-power LPDDR5X and HBM3e memory with NVIDIA's GPU acceleration and high-performance Arm cores creates a powerful and balanced system.
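
To make that concrete, here is a minimal CUDA sketch of what the unified address space buys you in practice. It assumes a Grace Hopper node with a recent CUDA toolkit, where hardware-coherent NVLink-C2C lets a kernel dereference plain malloc'd host memory directly; treat it as an illustrative sketch, not a reference implementation.

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // GPU kernel that updates CPU-allocated memory in place.
    __global__ void scale(double *data, double factor, size_t n) {
        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const size_t n = 1 << 20;

        // Plain malloc: with a coherent NVLink-C2C address space there is no
        // separate cudaMalloc + cudaMemcpy staging step (assumption: the node
        // exposes system-allocated memory to the GPU, as Grace Hopper does).
        double *data = (double *)malloc(n * sizeof(double));
        for (size_t i = 0; i < n; ++i) data[i] = 1.0;

        scale<<<(unsigned)((n + 255) / 256), 256>>>(data, 2.0, n);
        cudaDeviceSynchronize();

        printf("data[0] = %f\n", data[0]);  // expect 2.000000
        free(data);
        return 0;
    }

The point is that the CPU's LPDDR5X and the GPU's HBM3e sit in one address space, so data structures too large for HBM alone can still be touched directly from device code.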

The GH200 server nodes are connected through an NVLink passive copper cable cartridge, giving the rack access to a remarkable 19.5 TB of NVLink-addressable memory. Each Hopper GPU can address that entire pool: 32 superchips at roughly 624 GB apiece. The NVLink Switch System has been upgraded to use NVLink copper interconnects, linking the 32 GH200 GPUs through nine NVLink switches built on third-generation NVSwitch chips in a fully connected fat-tree network architecture. For larger deployments, the system scales out over 400 Gb/s InfiniBand or Ethernet, combining exceptional performance with energy efficiency.
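
Roughly where that 19.5 TB figure comes from, assuming the commonly quoted per-superchip capacities of about 480 GB LPDDR5X plus 144 GB HBM3e:

    16 MGX nodes x 2 GH200 superchips            = 32 superchips per rack
    per superchip: ~480 GB LPDDR5X + ~144 GB HBM3e = ~624 GB
    32 x 624 GB = 19,968 GB                      = ~19.5 TB NVLink-addressable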

NVIDIA is developing its own DGX GH200-based AI supercomputer, named NVIDIA Helios, to power its research and development efforts. Helios will consist of four DGX GH200 systems, interconnected with NVIDIA Quantum-2 InfiniBand networking, to supercharge data throughput for training large AI models. This setup includes 1,024 Grace Hopper Superchips.

Very neat!

Javier Martin

Strategic Accounts - Acquisition Team

1y

Tony, without a doubt, the next generation of data centers will require much deeper thought and engineering to absorb the massive increase in power density and rack liquid cooling requirements. Data Center of the Future - who has the power?

Mike Mann

Liquid Cooling Consulting Services

1y

The NVIDIA GPUs represent the industry's first at-scale commitment to bringing liquid-cooled servers to market. Liquid cooling is inherently more efficient than air, so it's a win-win for the power grid.

Rex Stock

Seeking Planet Friendly Solutions

1y

#OpenCompute! Yes Tony Grayson

John Wallerich

Hyperscale Data Center Infrastructure Specialist, Strategist, Energy Efficiency & Sustainability Leader, with 40+ years in tech | Researcher/Inventor/Fellow/Advisor

1y

Thanks for this, Tony. It's amazing to see how far Nvidia has come. I worked with them 20 years ago deploying HPC racks that used InfiniBand and MPI to create huge virtual blocks of memory required to optimize EDA app run times. It was a brilliant move, one of many, to acquire Mellanox. But to tie GPUs together into dynamically reconfigurable GPU clusters is a game changer. Well done, Nvidia!

Greg Crumpton

HVAC for Life | Writer | Mentor | Skilled Trades Zealot | Dot Connector

1y

Tony, thanks for the #GraceHopper info yesterday!
