The Great XPU Debate: Are GPUs In Trouble?

The great XPU-versus-GPU debate rages on. Last week, after Marvell and Broadcom both posted triple-beat earnings (despite very different market reactions), the debate flared up again, and frankly I saw quite a few bad takes about the future of the AI accelerator and the expected impact of XPUs on Nvidia and its data center GPU business.

First of all, I think it's really important to understand that this is not a zero-sum game. The future will not be 100% GPU, and it certainly won't be 100% XPU.

Second, Jensen's claim that XPUs, or custom AI accelerators, are multiple years behind is only partially true, and of course it is the perception that Nvidia would want to create for its market.

In actuality, the overall AI chip market will very likely see significant growth, reaching the trillions by the end of the decade. Based on our research, we see a CAGR for AI chips in the 30 to 40% range, with XPU use growing slightly faster than GPUs; however, the cost per unit and the market penetration of GPUs are considerably higher and will remain so for the foreseeable future.

The way we see it, there are internal use cases within the hyperscalers and external use cases, and it is along this split that potential XPU market penetration could happen.

External use cases will likely remain nearly entirely Nvidia, due to its software moat and early market penetration. Developers doing AI training, software, and workloads in the cloud are deeply knowledgeable about the Nvidia platform, and its software is highly flexible and dynamic, enabling rapid development for companies building in the cloud.

Internal use cases, as in the ones that large hyperscalers, including Meta, ByteDance, Google, Amazon, Microsoft, and others, are building to scale for their own production, are likely where XPUs will see significant penetration, for the following reasons:

1. Cost: the expected cost of an XPU will be at least 50%, and likely closer to 70%, less than a comparable GPU. Given the volume and scale of production workloads, this will provide meaningful margin for companies deploying high volumes of repetitive workloads. These will be predominantly inference workloads, but will include training workloads too, as we've seen with Gemini training on TPUs.

2. Software: the average enterprise or AI software developer isn't going to be able to customize software for custom silicon. While we have seen higher-level abstractions enable developers to build outside of CUDA, this is still pretty nascent in the AI software development space. Large clouds and hyperscalers have the resources to customize software for their use cases, and this ties into the economics above.

3. Destiny: probably the most significant driver of XPU growth will be these large companies wanting to control more of their outcomes and long-term destiny. While I expect all of these companies to continue consuming huge volumes of Nvidia, I also expect them to be cautious about access, capacity, differentiation, and the other rate limiters that come with depending on merchant silicon.

4. Advancement: Jensen was largely correct that past generations of custom accelerators have been meaningfully behind Nvidia's GPUs, but based on what we've seen, we think that gap will close, and we think the considerations above will factor into the biggest buyers of GPUs diversifying, even as the overall market grows. Custom HBM and co-packaged optics will play a role in making these XPUs more powerful, and as use cases evolve for these large companies with high production volumes, we see some of the spend pivoting, probably landing around 70/30 GPU to XPU over the next two to three years.
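The cost argument in point 1 can be made concrete with a bit of arithmetic. This is a minimal sketch; the 50-70% discount band comes from the article, but the unit price and fleet size are hypothetical placeholders, not market data:

```python
# Illustrative sketch of the per-unit cost argument (point 1 above).
# GPU_UNIT_COST and the 100k fleet size are assumed, not sourced figures.

GPU_UNIT_COST = 30_000.0      # hypothetical GPU price per accelerator, USD
XPU_DISCOUNTS = (0.50, 0.70)  # XPU costing 50% to 70% less, per the article

def fleet_savings(units: int, gpu_cost: float, discount: float) -> float:
    """Dollars saved by deploying `units` XPUs instead of GPUs,
    assuming each XPU costs `discount` less than a GPU."""
    return units * gpu_cost * discount

for d in XPU_DISCOUNTS:
    saved = fleet_savings(100_000, GPU_UNIT_COST, d)
    print(f"{d:.0%} cheaper XPU on a 100k-unit fleet: ~${saved / 1e9:.1f}B saved")
```

At high-volume, repetitive production workloads, even the conservative end of that band compounds into billions of dollars of margin, which is why the economics matter most to the hyperscalers.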

In the end, we are currently looking at a run rate of nearly $100 billion for the data center GPU business and somewhere around $10 to $15 billion for XPUs.

The market for these chips is expected to reach as high as $1 trillion over the next 3 to 5 years. So even if the distribution ended up 50-50, that is still a potential $500 billion a year GPU business for Nvidia's data center revenues. I continue to find many of the takes that XPUs will never make it, or that Nvidia will be taken out by the custom accelerator business, to be silly, baseless, oversimplified viewpoints. Irresponsible, really.
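As a sanity check on the run-rate math above, here is a minimal sketch. The ~$115B combined base and the 30-40% CAGR band are the article's estimates; everything else is illustrative:

```python
# Rough projection of the AI chip market using the article's figures:
# ~$100B GPU + ~$15B XPU run rate today, compounding at a 30-40% CAGR.

def project(base_billions: float, cagr: float, years: int) -> float:
    """Compound a starting annual run rate forward at a fixed CAGR."""
    return base_billions * (1 + cagr) ** years

BASE = 115.0  # combined GPU + XPU run rate today, in $B (article estimate)

for cagr in (0.30, 0.35, 0.40):
    for years in (3, 5):
        total = project(BASE, cagr, years)
        print(f"CAGR {cagr:.0%}, {years}y: ~${total:,.0f}B total market")

# Hypothetical splits of a $1,000B/yr market, for comparison with the
# article's 50-50 floor case and its 70/30 GPU-to-XPU view:
for gpu_share in (0.5, 0.7):
    print(f"GPU at {gpu_share:.0%} of $1T: ~${1000 * gpu_share:,.0f}B/yr")
```

Note that a straight 30-40% CAGR from today's base lands in the $250B-$620B range over 3 to 5 years, so the $1 trillion figure implies growth at or above the top of that band.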


Stevie Ray Allen

President Americas

1 week

It's not really a debate. AI is evolving and is highly unpredictable. It's like debating about surprises. If you can easily surmise, it wasn't a surprise. (sorry, sounded better in my head). Get over it.

Michael Nauen

Senior Infrastructure Engineer at Commerzbank

1 week

Maybe a big comeback?

Michael Nauen

Senior Infrastructure Engineer at Commerzbank

1 week

From my perspective, we will see a major change once 2 nanometers is reached in silicon. You can produce optical AI CPUs on 90 nm semiconductor machines with a small upgrade costing 15 million, versus 400 million for an ASML machine. And optical is 30 to 100 times better in energy efficiency. https://www.tagesschau.de/wissen/technologie/photonische-chips-100.html

Joe Dickson

SVP Chip to Chip Reliability and Innovation at WUS PCB Intl

1 week

Hello Daniel, the price point of XPUs being 1/2 seems out of alignment with the applications of GPUs in the network ecosystem. Since the highest power and signal-integrity requirements are currently in the GPU, if you have both, the GPU will drive all the routing and power (cost) technology for the packaging and PCB below the die. This cost impact needs to be understood so the whole system cost is known. I expect the savings will be lower than 1/2 unless the XPU is the only chip used.

Rakesh Cheerla

Technologist & Product Manager

1 week

Daniel Newman - what happens if Nvidia launches GPUs at much lower prices, which it will do. How do we see the GPU/XPU market evolve if we have roughly price parity across GPU/XPU?
