The OCP Rack Architecture of the GH200 is Pretty Neat (at least to a HW nerd like myself)
Tony Grayson
VADM Stockdale Leadership Award Recipient | Tech and Business CxO | Ex-Submarine Captain | Top 10 Datacenter Influencer | Veteran Advocate
The NVIDIA GH200 NVL32 rack-scale reference architecture houses 16 dual-GH200 server nodes built on the NVIDIA MGX chassis, for 32 Grace Hopper Superchips per rack.
Central to the GH200 Grace Hopper Superchip's innovation is the NVLink-C2C interface, a coherent chip-to-chip link between the Grace CPU and Hopper GPU that establishes a single NVLink-addressable memory space and significantly streamlines model programming. Integrating high-bandwidth, low-power LPDDR5X and HBM3e memory with NVIDIA's GPU acceleration and high-performance Arm cores creates a powerful and balanced system.
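To make that unified memory space concrete, here's a minimal, hypothetical CUDA C++ sketch (the kernel and buffer names are mine, not NVIDIA's). On Grace Hopper, the coherent NVLink-C2C link, together with recent CUDA drivers, is documented to let a GPU kernel dereference an ordinary CPU heap allocation directly, with no cudaMalloc or cudaMemcpy staging:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative kernel: doubles each element in place.
__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    // Plain malloc: on Grace Hopper, the coherent NVLink-C2C
    // address space lets the GPU access this CPU allocation
    // directly (assumes a driver/CUDA stack with this support).
    float* data = static_cast<float*>(malloc(n * sizeof(float)));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);  // expect 2.0
    free(data);
    return 0;
}
```

On a conventional PCIe-attached GPU, the same malloc pointer would not be usable this way without managed memory, which is exactly the programming simplification described above.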
The connectivity framework of the GH200 server nodes employs an NVLink passive copper cable cartridge, giving every node seamless access to a remarkable 19.5 TB of NVLink-addressable memory across the rack. Each superchip contributes 624 GB (480 GB of LPDDR5X plus 144 GB of HBM3e), so each Hopper GPU can address the full 32 x 624 GB pool. The NVLink Switch System has been upgraded to NVLink copper interconnects, linking the 32 GH200 GPUs through nine NVLink switches built on third-generation NVSwitch chips to form a fully connected fat-tree network. For expanding computational needs, the system scales out over 400 Gb/s InfiniBand or Ethernet connections, merging exceptional performance with energy efficiency.
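As a quick back-of-the-envelope check of that 19.5 TB figure (a sketch; the 480 GB + 144 GB per-superchip split is NVIDIA's published GH200 configuration):

```cpp
#include <cstdio>

int main() {
    // Per GH200 superchip: 480 GB LPDDR5X (CPU) + 144 GB HBM3e (GPU).
    const int lpddr5x_gb = 480;
    const int hbm3e_gb   = 144;
    const int superchips = 32;  // GH200 NVL32: 16 dual-superchip nodes

    int per_chip_gb = lpddr5x_gb + hbm3e_gb;     // 624 GB
    int total_gb    = per_chip_gb * superchips;  // 19,968 GB
    printf("%d GB per superchip, %d GB (~19.5 TB) per rack\n",
           per_chip_gb, total_gb);
    return 0;
}
```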
NVIDIA is developing its own DGX GH200-based AI supercomputer, named NVIDIA Helios, to power its research and development efforts. Helios will consist of four DGX GH200 systems, interconnected with NVIDIA Quantum-2 InfiniBand networking to supercharge data throughput for training large AI models. With 256 Grace Hopper Superchips per DGX GH200 system, the setup totals 1,024 superchips.
Very neat!
Strategic Accounts - Acquisition Team (1y)
Tony, without a doubt, the next generation of data centers will require much deeper thought and engineering to absorb the massive increase in power density and rack liquid-cooling requirements. Data Center of the Future - who has the power?
Liquid Cooling Consulting Services (1y)
The NVIDIA GPUs represent the industry's first at-scale commitment to bringing liquid-cooled servers to market. Liquid cooling is inherently more efficient than air, so it's a win-win for the power grid.
Seeking Planet Friendly Solutions (1y)
#OpenCompute! Yes, Tony Grayson
Hyperscale Data Center Infrastructure Specialist, Strategist, Energy Efficiency & Sustainability Leader, with 40+ years in tech Researcher/Inventor/Fellow/Advisor (1y)
Thanks for this, Tony. It's amazing to see how far Nvidia has come. I worked with them 20 years ago deploying HPC racks that used InfiniBand and MPI to create the huge virtual blocks of memory required to optimize EDA app run times. Acquiring Mellanox was a brilliant move, one of many. But tying GPUs together into dynamically reconfigurable GPU clusters is a game changer. Well done, Nvidia!
HVAC for Life | Writer | Mentor | Skilled Trades Zealot | Dot Connector (1y)
Tony, thanks for the #GraceHopper info yesterday!