Grace Under Pressure: Nvidia's Debut CPU Elevates HPC to New Heights
Tony Grayson
Defense, Business, and Technology Executive | VADM Stockdale Leadership Award Recipient | Ex-Submarine Captain | LinkedIn Top Voice | Author | Top 10 Datacenter Influencer | Veteran Advocate
Nvidia's launch of the "Grace" CG100 processor marks a significant milestone in the company's expansion into dedicated server CPUs, and the chip is well suited to HPC simulation and modeling. Grace extends the memory available to Nvidia's "Hopper" GH100 GPU accelerators, and it also performs strongly as a standalone high-performance computing (HPC) processor. With a high core count, modest power draw, and low-power DDR5 (LPDDR5) memory adapted for servers with error correction, the Grace CPU is an attractive option for HPC systems, which typically require 256 GB or 512 GB of memory per node.
Nvidia's pairing of two Grace CPUs into a single Grace-Grace Superchip has intrigued the HPC community. The two CPUs are linked by NVLink, which keeps memory coherent across both banks of LPDDR5, and the whole package draws only 500 watts. The Superchip offers 144 Arm Neoverse "Demeter" V2 cores and up to 1 TB of physical memory, with a peak theoretical bandwidth of 1.1 TB/sec. Because of LPDDR5 manufacturing yields, however, only 960 GB of memory and 1 TB/sec of bandwidth are actually usable. In theory, Nvidia could extend the design to a four-way Grace module with 288 cores and roughly 1.9 TB of memory.
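The module figures above follow from simple per-CPU arithmetic. The sketch below assumes each Grace CPU contributes exactly half of the Superchip's usable resources (72 cores, 480 GB, and 500 GB/s) and that an n-way module scales linearly. That linear scaling, and especially the four-way bandwidth it implies, is an assumption for illustration rather than a published specification; the helper name `module_specs` is hypothetical.

```python
# Illustrative arithmetic for n-way Grace modules.
# ASSUMPTION: each CPU contributes half of the Grace-Grace Superchip's
# usable figures (144 cores, 960 GB, 1 TB/s), and resources scale linearly.
CORES_PER_CPU = 72
USABLE_GB_PER_CPU = 480
USABLE_GBPS_PER_CPU = 500


def module_specs(num_cpus: int) -> dict:
    """Hypothetical helper: aggregate cores, usable memory, and bandwidth."""
    cores = CORES_PER_CPU * num_cpus
    memory_gb = USABLE_GB_PER_CPU * num_cpus
    bandwidth_gbps = USABLE_GBPS_PER_CPU * num_cpus
    return {
        "cores": cores,
        "memory_gb": memory_gb,
        "bandwidth_gbps": bandwidth_gbps,
        # Share of memory capacity and bandwidth per core when all cores are busy.
        "gb_per_core": memory_gb / cores,
        "gbps_per_core": bandwidth_gbps / cores,
    }


for n in (2, 4):
    s = module_specs(n)
    print(
        f"{n}-way: {s['cores']} cores, {s['memory_gb']} GB, "
        f"{s['bandwidth_gbps'] / 1000:.1f} TB/s, "
        f"{s['gb_per_core']:.2f} GB/core, {s['gbps_per_core']:.2f} GB/s per core"
    )
```

Under these assumptions the four-way case reproduces the 288 cores and roughly 1.9 TB (1,920 GB) cited above, and either configuration comfortably covers the 256 GB or 512 GB per node that typical HPC systems need.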
Benchmarks run by the Barcelona Supercomputing Center and by the State University of New York campuses at Stony Brook and Buffalo have shown what the Grace CPU can do in HPC and AI workloads. Both the Grace-Grace and Grace-Hopper Superchips outperformed conventional x86 CPU nodes, and Grace also showed an edge in power efficiency and, potentially, in cost.
Against older x86 CPUs, the Grace-Grace configuration delivered notable speedups in applications ranging from computational mechanics and fluid dynamics to climate modeling, molecular dynamics, and multicellular simulation. For instance, it ran OpenFOAM up to 4.49 times faster and PhysiCell simulations up to 3.24 times faster than the x86 systems it was compared against.
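A speedup factor such as the 4.49x quoted for OpenFOAM is the ratio of the baseline wall-clock time to the Grace wall-clock time. The minimal sketch below makes that arithmetic explicit and converts a speedup into time saved: a 4.49x speedup means the job takes about 22 percent of its original time, roughly 78 percent less. The run times in the example are hypothetical values chosen only to reproduce the quoted factor, not measurements from the benchmarks.

```python
def speedup(baseline_seconds: float, grace_seconds: float) -> float:
    """Speedup of a Grace run over a baseline run: ratio of wall-clock times."""
    if baseline_seconds <= 0 or grace_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_seconds / grace_seconds


def percent_time_saved(speedup_factor: float) -> float:
    """Share of the baseline run time eliminated by a given speedup, in percent."""
    return (1.0 - 1.0 / speedup_factor) * 100.0


# Hypothetical run times (seconds), chosen only to reproduce the quoted factor.
print(f"{speedup(449.0, 100.0):.2f}x")  # 4.49x
for factor in (4.49, 3.24):
    print(f"{factor}x speedup -> {percent_time_saved(factor):.1f}% less time")
```

Note that "up to" marks the best case across the tested inputs, so typical speedups on the same applications will be lower.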
Results from standard suites such as the HPC Challenge (HPCC) and High-Performance Conjugate Gradients (HPCG) benchmarks further support the Grace Superchip's standing. Grace often matched or exceeded the performance of leading Intel and AMD CPUs, and it did especially well when paired with Hopper GPUs, as the Gromacs molecular dynamics benchmark shows.
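HPCG is built around a conjugate gradient (CG) solver. Its sparse matrix-vector products, dot products, and vector updates perform few arithmetic operations per byte loaded, so HPCG scores tend to track memory bandwidth rather than peak FLOPS, which is one plausible reason a bandwidth-rich chip like Grace fares well on it. The toy sketch below shows plain, unpreconditioned CG on a small 1D Poisson system in pure Python. It is an illustrative analogue only, not the HPCG reference code, which uses a multigrid-preconditioned CG on a 3D 27-point stencil; all function names here are hypothetical.

```python
# Toy analogue of the HPCG kernel: unpreconditioned conjugate gradient on a
# 1D Poisson (tridiagonal) system. Not the HPCG reference implementation.


def matvec(x):
    """y = A x for A = tridiag(-1, 2, -1), applied without storing A."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y[i] = 2.0 * x[i] - left - right
    return y


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(b, tol=1e-10, max_iter=1000):
    """Solve A x = b for the symmetric positive-definite A defined above."""
    x = [0.0] * len(b)
    r = list(b)  # residual r = b - A x, with initial guess x = 0
    p = list(r)  # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        ap = matvec(p)  # sparse matrix-vector product (memory-bound)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]  # vector update (AXPY)
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:
            break
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


b = [1.0] * 50
x = conjugate_gradient(b)
residual = max(abs(bi - ai) for bi, ai in zip(b, matvec(x)))
print(f"max residual: {residual:.2e}")
```

Every iteration streams the full vectors through memory several times while doing only a handful of floating-point operations per element, which is the access pattern that rewards high memory bandwidth.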
These results highlight the Grace CPU's adaptability across a wide range of HPC workloads. Backed by comparative data that show its advantages over current options, they also suggest how Grace could change computational practice in scientific research.