NVIDIA
NVIDIA’s growth is an index of the growth of AI. “Compute revenue grew more than 5x and networking revenue more than 3x from last year.”
Data center revenue totaled $26b, with about 45% from the major clouds (roughly $12b). These clouds announced they were spending $40b in capex to build out data centers, implying NVIDIA is capturing very roughly 30% of its cloud customers’ total capex budgets.
“Large cloud providers continue to drive strong growth as they deploy and ramp NVIDIA AI infrastructure at scale and represented the mid-40s as a percentage of our Data Center revenue.”
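As a sanity check, here’s the arithmetic as a rough sketch - the round figures come from this post’s estimates, not NVIDIA’s disclosures:

```python
# Back-of-envelope check of NVIDIA's share of cloud capex.
data_center_revenue = 26e9   # quarterly data center revenue
cloud_share = 0.45           # "mid-40s" of data center revenue, per NVIDIA
cloud_capex = 40e9           # announced cloud build-out budgets

cloud_revenue = data_center_revenue * cloud_share   # ~$11.7b
capex_capture = cloud_revenue / cloud_capex         # ~29%

print(f"Revenue from major clouds: ${cloud_revenue / 1e9:.1f}b")
print(f"Share of announced capex:  {capex_capture:.0%}")
```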
NVIDIA has started to highlight the return-on-investment (ROI) for cloud providers. As the prices of GPUs increase, so do NVIDIA’s profits, to a staggering degree - nearly 10x in dollar terms in 2 years. Is this a problem for the clouds?
That may not matter to GPU buyers - at least not yet - because of the unit economics. Today, $1 spent on GPUs produces $5 of GPU hosting revenue over 4 years.
“For every $1 spent on NVIDIA AI infrastructure, cloud providers have an opportunity to earn $5 in GPU instant hosting revenue over 4 years.”
But soon, it will generate $7 of revenue. For comparison, Amazon Web Services operates at a 38% operating margin. If these numbers hold, newer chips should improve cloud GPU profits - assuming the efficiency gains are not competed away.
“H200 nearly doubles the inference performance of H100, delivering significant value for production deployments. For example, using Llama 3 with 70 billion parameters, a single NVIDIA HGX H200 server can deliver 24,000 tokens per second, supporting more than 2,400 users at the same time. That means for every $1 spent on NVIDIA HGX H200 servers at current prices per token, an API provider serving Llama 3 tokens can generate $7 in revenue over 4 years.”
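A rough sketch of what that claim implies for token pricing. The server price below (~$400k for an HGX H200) is my assumption, not a figure from NVIDIA, and the sketch assumes round-the-clock utilization:

```python
# What token price does "$7 per $1 over 4 years" imply?
server_cost = 400_000             # ASSUMED HGX H200 server price, USD
tokens_per_sec = 24_000           # from the quote above
seconds_4y = 4 * 365 * 24 * 3600  # ~126M seconds

total_tokens = tokens_per_sec * seconds_4y   # ~3.0 trillion tokens
revenue = 7 * server_cost                    # $7 earned per $1 of hardware
price_per_m_tokens = revenue / (total_tokens / 1e6)

print(f"Tokens over 4 years: {total_tokens / 1e12:.1f} trillion")
print(f"Implied price: ${price_per_m_tokens:.2f} per million tokens")
# ~$0.92 per million tokens - in the same ballpark as hosted
# Llama 3 70B API pricing at the time of writing.
```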
And this trend should continue with the next generation architecture, Blackwell.
“The Blackwell GPU architecture delivers up to 4x faster training and 30x faster inference than the H100”
We can also guesstimate the value of some of these customers. DGX H100s cost about $400-450k as of this writing. With 8 GPUs in each DGX, Tesla’s 35,000 H100s imply roughly 4,375 systems - about $1.75b worth of NVIDIA hardware, assuming they bought, not rented, the machines.
“We supported Tesla’s expansion of their training AI cluster to 35,000 H100 GPUs”
In a parallel hypothetical, Meta would have spent about $1.2b to train Llama 3. But the company plans to buy 350,000 H100s by the end of 2024, implying about $20b of hardware purchases.
“Meta’s announcement of Llama 3, their latest large language model, which was trained on a cluster of 24,000 H100 GPUs.”
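Putting those figures together - a sketch using the low end of the DGX price range above, and assuming every GPU is bought as part of an 8-GPU DGX system:

```python
# Rough hardware-cost estimates for Tesla and Meta's clusters.
dgx_price = 400_000   # low end of the quoted $400-450k range, USD
gpus_per_dgx = 8

def hardware_cost(gpu_count: int) -> float:
    """Estimated spend if all GPUs are bought as DGX systems."""
    return gpu_count / gpus_per_dgx * dgx_price

print(f"Tesla, 35,000 H100s:        ${hardware_cost(35_000) / 1e9:.2f}b")
print(f"Llama 3 run, 24,000 H100s:  ${hardware_cost(24_000) / 1e9:.2f}b")
print(f"Meta target, 350,000 H100s: ${hardware_cost(350_000) / 1e9:.1f}b")
# At the $450k high end, Meta's 350,000 GPUs come to ~$19.7b,
# close to the ~$20b figure above.
```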
As these costs skyrocket, it wouldn’t be surprising for governments to subsidize these systems just as they have subsidized other kinds of advanced technology, like fusion or quantum computing - or to fund them as part of national defense.
“Nations are building up domestic computing capacity through various models.”
There are two workloads in AI: training the models & running queries against them (inference). Today training is about 60% of NVIDIA’s data center revenue and inference about 40%. One intuition is that inference should become the vast majority of the market over time as model performance asymptotes.
However, it’s unclear if that will happen, primarily because of the massive increase in training costs. Anthropic has said models could cost $100b to train within 2 years.
“In our trailing 4 quarters, we estimate that inference drove about 40% of our Data Center revenue.”
The trend shows no sign of abating. Neither do the profits!
“Demand for H200 and Blackwell is well ahead of supply, and we expect demand may exceed supply well into next year.”