Beyond NVIDIA: Is AMD the only GPU alternative for HPC/AI workloads?
In my last article, I discussed what a GPU is and primarily covered NVIDIA's history and product suite. While NVIDIA GPUs have long been the go-to choice for AI workloads, a host of alternative options, both GPU and otherwise, have been simmering away, mostly behind the scenes.
I wanted to shine a light on these, the current state of the market, and where the future might be headed.
So let's get stuck in…
GPU-based: AMD and Intel
AMD GPUs
AMD, already a formidable competitor in the CPU market (arguably holding the lion's share of HPC CPU sales thanks to its core-count density and power efficiency), has also made significant strides in the GPU space with its "Radeon Instinct" series. These GPUs offer an appealing alternative to NVIDIA's dominance, particularly in certain niches. Key points to consider include:
Launching initially in 2017 with a 150W card, the "MI6", AMD recently announced its latest 750W, 192GB beast, the "Instinct MI300X", entering production later this year.
Disclaimer: comparing apples to apples is very difficult, workload dependent, and folks should always be skeptical of vendor-provided performance metrics.
For comparison, NVIDIA's H100 has 80GB of memory, requiring NVLink to pool GPUs in order to address larger amounts of memory. It will be very interesting to watch developers adopt this new platform, as AMD has a significant memory bandwidth and capacity advantage, with TFLOPs/TOPS yet to be announced. As the software stack rapidly improves, you will likely see more AMD GPUs in the wild later this year.
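To put that capacity gap in perspective, here is a back-of-envelope sketch (plain Python, no vendor libraries) of how many GPUs are needed just to hold a model's weights. The 70B-parameter model is an illustrative assumption; real deployments also need memory for activations, optimizer state or KV caches, and framework overhead.

```python
import math

def min_gpus(model_params: float, bytes_per_param: int, gpu_mem_gb: float) -> int:
    """Minimum GPU count to hold the weights alone (ignores all other memory use)."""
    weights_gb = model_params * bytes_per_param / 1e9
    return math.ceil(weights_gb / gpu_mem_gb)

# A 70B-parameter model in fp16 (2 bytes/param) needs ~140 GB for weights alone.
PARAMS = 70e9
print(min_gpus(PARAMS, 2, 80))   # H100 at 80 GB  -> 2 (hence NVLink pooling)
print(min_gpus(PARAMS, 2, 192))  # MI300X at 192 GB -> 1
```

This is only a capacity argument; actual performance depends on bandwidth, interconnect, and the software stack.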
Intel GPUs
Intel, renowned for its CPUs, has also entered the GPU arena with its Intel Xe architecture. These GPUs bring a new dynamic to the market, offering several unique aspects:
It's important to note that Intel is a relative newcomer to the high-end GPU market, having almost exclusively offered 'good-enough' integrated graphics in laptop and desktop parts for decades. Their first-gen architecture, codenamed "Alchemist", has a SKU referred to as "Xe-HPC" that was only seen in public two months ago (Intel's Ponte Vecchio is Finally in The Wild | Tom's Hardware (tomshardware.com)) due to years of delays.
Intel's second-gen architecture, codenamed "Battlemage", has a SKU referred to as "Xe2-HPC" that may be released as "Rialto Bridge", though according to recent reports it is likely based on enhanced Xe-HPC cores, not Xe2-HPC cores. In parallel, expectations for their consumer GPUs based on Battlemage have been significantly tempered (Intel rumoured to be scaling back its next-gen Battlemage GPU | PC Gamer), reportedly now targeting roughly the performance of NVIDIA's last-gen mid-range GPUs (released early 2022), for a product not due until mid-2024.
Intel has a lot of work in front of them, and it appears it will be many years before we see whether they can bridge the gap at the high end of the market, or whether they remain in the low-to-mid performance (and cost) range.
Fringe Alternatives: ASICs (TPUs) and FPGAs
Application-Specific Integrated Circuits (ASICs)
ASICs are custom-designed chips tailored to perform a specific task exceptionally efficiently. In the context of HPC and AI, ASICs can be optimized for specific workloads, yielding substantial performance benefits:
Amazon's AI platforms, Trainium and Inferentia, are custom ASICs born of its Annapurna Labs acquisition: Trainium targets training workloads and Inferentia targets inference. Annapurna also designed Nitro, Amazon's offload ASIC for virtualization, networking, and storage tasks; it's reported that every AWS server that ships comes with at least one Nitro chip.
Another alternative ASIC is what Google calls its TPU (Tensor Processing Unit). Designed to address the unique demands of machine learning tasks, TPUs offer a specialized solution that is differentiated from traditional GPUs and other alternatives. Google's TPUs are trailblazers in AI acceleration, emphasizing performance, energy efficiency, and cloud-based accessibility.
Field-Programmable Gate Arrays (FPGAs)
FPGAs are reconfigurable hardware components that can be programmed to perform various tasks, offering a balance between flexibility and performance:
FPGAs tend to be used in local, embedded solutions utilizing OpenCL, and don't tend to appear in HPC-centric workloads. Think self-driving cars, medical imaging, and machine vision. Intel actually makes an FPGA, the Stratix 10 GX (released 2018), which achieves 143 INT8 TOPS at up to 225W, or around half of a last-gen AMD Instinct MI250X.
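The efficiency side of that comparison reduces to simple arithmetic. The sketch below computes peak TOPS per watt from the Stratix 10 GX figures quoted above; these are vendor peak numbers for a single precision mode, so treat the result as a rough upper bound rather than delivered performance.

```python
def tops_per_watt(tops: float, watts: float) -> float:
    """Peak throughput per watt of board power (vendor peak numbers)."""
    return tops / watts

# Intel Stratix 10 GX: 143 INT8 TOPS at up to 225 W (figures from the article).
stratix_efficiency = tops_per_watt(143, 225)
print(round(stratix_efficiency, 2))  # ~0.64 TOPS/W
```

The same one-liner applied to any GPU's INT8 peak and TDP gives a like-for-like (if crude) efficiency comparison.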
In summary…
The HPC and AI landscape is evolving, and whilst the obvious choice for hardware accelerators has overwhelmingly been NVIDIA GPUs, AMD in particular is gaining traction with its GPUs, offering a competitive alternative. Intel is very early in its entry, and more fringe options like ASICs and FPGAs bring unique advantages but also challenges related to customization, programming complexity, and development time.
As the demand for computational power continues to grow, understanding and exploring these alternatives will be crucial for making informed decisions in optimizing HPC and AI workloads.