Beyond NVIDIA: Is AMD the only GPU alternative for HPC/AI Workloads

In my last article, I discussed what a GPU was and primarily covered NVIDIA's history and product suite. While NVIDIA GPUs have long been the go-to choice for AI workloads, a host of alternative options, both GPU-based and otherwise, have been simmering away, mostly behind the scenes.

I wanted to shine a light on these, the current state of the market, and where the future might be headed.


So let's get stuck in…


GPU-based: AMD and Intel

AMD GPUs

AMD, a formidable competitor in the CPU market that arguably holds the lion's share of CPU sales in HPC thanks to its core-count density and power efficiency, has also made significant strides in the GPU space with its "Radeon Instinct" series. These GPUs offer an appealing alternative to NVIDIA's dominance, particularly in certain niches. Key points to consider include:

  • Architecture: AMD's RDNA architecture focuses on energy efficiency and scalability, making it suitable for both gaming and professional applications.
  • Heterogeneous Compute: AMD GPUs support a variety of programming models, including OpenCL and ROCm, enabling developers to harness their power for parallel processing tasks. AMD has also joined the PyTorch Foundation to further the development of PyTorch, a Python-based machine learning framework (a minimal ROCm check is sketched just after this list).
  • Datacenter Integration: With initiatives like the AMD CDNA architecture, AMD aims to carve out a space for itself in data centers, providing competition to NVIDIA in HPC and AI workloads.
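
For those curious what this looks like on the software side, here is a minimal sketch, assuming a ROCm build of PyTorch and a supported AMD Instinct card. On ROCm builds, PyTorch exposes the AMD GPU through the familiar torch.cuda namespace (HIP sits underneath), so most code written for NVIDIA GPUs runs unchanged.

```python
# Minimal sketch: confirm a ROCm build of PyTorch can see an AMD GPU.
# ROCm builds reuse the torch.cuda namespace (HIP underneath), so code
# written against NVIDIA GPUs generally runs unchanged.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")                 # maps to the AMD GPU under ROCm
    print("Device:", torch.cuda.get_device_name(0))
    print("HIP runtime:", torch.version.hip)      # a version string on ROCm, None on CUDA builds
    x = torch.randn(4096, 4096, device=device)
    y = x @ x                                     # the matmul executes on the GPU
    print("Result shape:", y.shape)
else:
    print("No ROCm/CUDA-capable device visible to PyTorch.")
```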

The Instinct line launched in 2017 with the 150W "MI6" card; AMD recently announced its latest 750W, 192GB beast, the "Instinct MI300X", due in production later this year.


Disclaimer: comparing apples to apples is very difficult and workload-dependent, and folks should always be skeptical of vendor-provided performance metrics.

Nvidia accused of cheating in big-data performance test by benchmark's umpires: Workloads 'tweaked' to beat rivals in TPCx-BB | The Register


For comparison, NVIDIA's H100 has 80GB of memory, requiring NVLink to pool GPUs in order to address larger amounts of memory. It will be very interesting to see developers adopt this new platform, as AMD has a significant memory bandwidth and capacity advantage, though TFLOPS/TOPS figures are yet to be announced. As the software stack rapidly improves, you will likely see more AMD GPUs in the wild later this year.
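
To make the pooling point concrete, here is a quick, illustrative sketch (PyTorch again) that enumerates the GPUs a node exposes and their memory. A hypothetical 160GB model would need two 80GB H100s linked together, but would fit on a single 192GB MI300X.

```python
# Illustrative sketch: list the visible GPUs and their memory, to reason about
# how many devices must be pooled (e.g. over NVLink) to hold a given model.
import torch

total_bytes = 0
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    total_bytes += props.total_memory
    print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.0f} GiB")

print(f"Aggregate GPU memory: {total_bytes / 2**30:.0f} GiB")
```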


Intel GPUs

Intel, renowned for its CPUs, has also entered the GPU arena with its Intel Xe architecture. These GPUs bring a new dynamic to the market, offering several unique aspects:

  • Integration: Intel GPUs are designed to work in synergy with Intel CPUs, potentially optimizing system-level performance in HPC and AI setups.
  • OneAPI: Intel's oneAPI initiative strives to provide a unified programming model across its various hardware components, including GPUs, CPUs, and FPGAs, simplifying the development process (a small device-check sketch follows this list).
  • Xe HPC: Intel's Xe HPC GPUs target the high-performance computing market, competing directly with NVIDIA's Tesla series.
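
For a flavour of what targeting Intel GPUs from Python looks like today, here is a sketch. It assumes the Intel Extension for PyTorch (intel_extension_for_pytorch) and Intel GPU drivers are installed; under that stack the GPU shows up as an "xpu" device. Treat the exact API surface as an assumption, as it is moving quickly.

```python
# Sketch, assuming the Intel Extension for PyTorch and Intel GPU drivers are
# installed: Intel discrete GPUs appear to PyTorch as the "xpu" device.
import torch
import intel_extension_for_pytorch as ipex  # registers the xpu backend  # noqa: F401

if torch.xpu.is_available():
    device = torch.device("xpu")
    print("Device:", torch.xpu.get_device_name(0))
    x = torch.randn(4096, 4096, device=device)
    y = x @ x                                   # matmul executes on the Intel GPU
    print("Result shape:", y.shape)
else:
    print("No Intel XPU device visible to PyTorch.")
```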

It's important to note that Intel is a relative newcomer to the high-end GPU market, having almost exclusively offered 'good-enough' integrated graphics in laptop and desktop parts for decades. Their first-generation architecture, codenamed "Alchemist", has a SKU referred to as "Xe-HPC" that was only seen in public two months back (Intel's Ponte Vecchio is Finally in The Wild | Tom's Hardware (tomshardware.com)) after years of delays.



Intel's second-gen architecture, codenamed "Battlemage", has a SKU referred to as "Xe2-HPC" and may be released as "Rialto Bridge", though according to recent reports it is likely based on enhanced Xe-HPC cores rather than Xe2-HPC cores. In parallel, expectations for their consumer Battlemage GPUs have been significantly tempered (Intel rumoured to be scaling back its next-gen Battlemage GPU | PC Gamer), with performance now rumoured to land around NVIDIA's last-gen mid-range GPUs (released early 2022), for a product not due until mid-2024.

Intel has a lot of work in front of them, and it will likely be many years before we see whether they can bridge the gap at the high end of the market, or whether they remain in the low-to-mid performance (and cost) range.


Fringe Alternatives: ASICs (TPUs) and FPGAs

Application-Specific Integrated Circuits (ASICs)

ASICs are custom-designed chips tailored to perform a specific task exceptionally efficiently. In the context of HPC and AI, ASICs can be optimized for specific workloads, yielding substantial performance benefits:

  • Efficiency: ASICs excel in power efficiency and performance for their designated tasks, making them suitable for data-centric applications.
  • Challenges: Developing ASICs requires significant time, effort, and resources. They are not easily reprogrammable, limiting their flexibility for rapidly evolving workloads.


Amazon's AI platforms (Trainium for training and Inferentia for inference) are custom ASICs designed by the Annapurna Labs team Amazon acquired, the same team behind the Nitro chip that offloads all kinds of host tasks. It's reported that every AWS server that ships comes with at least one Nitro chip.

Amazon EC2 server with an Annapurna ASIC (just above the purple handle)
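
For context on what programming these parts looks like, here is a sketch, assuming a Trn1/Inf2 instance with the AWS Neuron SDK's torch-neuronx package installed. Rather than running eagerly, a PyTorch model is traced and ahead-of-time compiled for the Neuron cores.

```python
# Sketch, assuming an AWS Trn1/Inf2 instance with torch-neuronx (Neuron SDK):
# the model is traced and ahead-of-time compiled for the Neuron cores.
import torch
import torch_neuronx

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

example = torch.randn(1, 128)
neuron_model = torch_neuronx.trace(model, example)  # compiles for the accelerator
print(neuron_model(example).shape)                  # runs on the Neuron ASIC
```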


Another alternative ASIC is what Google calls the TPU (Tensor Processing Unit). Designed to address the unique demands of machine learning tasks, TPUs offer a specialized solution that is differentiated from traditional GPUs and other alternatives. Google's TPUs are trailblazers in AI acceleration, emphasizing performance, energy efficiency, and cloud-based accessibility.

A Google TPU on a PCIe card
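
On the software side, TPUs are reached through XLA-based frameworks. A minimal sketch, assuming a Cloud TPU VM with JAX installed: JAX enumerates the TPU cores and XLA compiles ordinary array code onto them.

```python
# Sketch, assuming a Cloud TPU VM with JAX installed: jax.devices() lists the
# TPU cores, and jax.jit has XLA compile ordinary array code onto them.
import jax
import jax.numpy as jnp

print(jax.devices())       # e.g. a list of TpuDevice objects on a TPU VM

@jax.jit                   # XLA-compiled for the TPU's matrix units
def matmul(a, b):
    return a @ b

a = jnp.ones((2048, 2048))
b = jnp.ones((2048, 2048))
print(matmul(a, b).shape)
```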


Field-Programmable Gate Arrays (FPGAs)

FPGAs are reconfigurable hardware components that can be programmed to perform various tasks, offering a balance between flexibility and performance:

  • Customizability: FPGAs can be reprogrammed for different workloads, making them adaptable to changing requirements.
  • Parallelism: FPGAs excel at parallel processing, which is highly beneficial for certain AI and HPC tasks.
  • Learning Curve: Working with FPGAs often requires specialized expertise in hardware design and programming, potentially lengthening the development cycle.

FPGAs tend to be used in local, embedded solutions utilizing OpenCL, and don't tend to appear in HPC-centric workloads. Think self-driving cars, medical imaging, and machine vision. Intel actually makes an FPGA, the Stratix 10 GX (released 2018), which achieves 143 INT8 TOPS at up to 225W, or around half of a last-gen AMD Instinct MI250X.

An Intel Stratix 10 GX FPGA Development Kit
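
To show how an FPGA accelerator typically surfaces to software, here is a short sketch using pyopencl, assuming a board whose vendor ships an OpenCL runtime (as Intel's FPGA SDK does): it simply enumerates the OpenCL platforms and devices.

```python
# Sketch using pyopencl: enumerate OpenCL platforms and devices. An FPGA card
# with a vendor OpenCL runtime typically reports as an ACCELERATOR device.
import pyopencl as cl

for platform in cl.get_platforms():
    print("Platform:", platform.name)
    for device in platform.get_devices():
        kind = cl.device_type.to_string(device.type)
        print(f"  {device.name} ({kind}), {device.global_mem_size / 2**30:.1f} GiB")
```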


In summary…

The HPC and AI landscape is evolving, and whilst the obvious choice for hardware accelerators has overwhelmingly been NVIDIA GPUs, AMD in particular is gaining traction with their GPUs, offering a competitive alternative. Intel is very early in its entry, and more fringe options like ASICs and FPGAs bring unique advantages but also challenges related to customization, programming complexity, and development time.

As the demand for computational power continues to grow, understanding and exploring these alternatives will be crucial for making informed decisions in optimizing HPC and AI workloads.

