Deepseek: Is the Data Center Industry Deep Sunk?

The last several days have seen financial markets and pundits losing their collective minds over several recent AI developments, especially the Deepseek training announcements and, to a lesser degree, Alibaba’s Qwen AI model. TL;DR: people need to calm down - this is a long haul, not a get-rich-quick scheme - at least for most of us. If, however, you are engaged in a short-term liquidity hunt and this situation has terminally shaken your confidence due to doom scrolling on CNN, perhaps it's time to find your next grift…uh, I mean career. The rest of us will somehow recover from your departure, I assure you. Now, to the week’s developments…

NVIDIA was riding a bubble and everyone knew it. NVIDIA’s valuation was buoyed by largely unsustainable proclamations about its ability to ship new generations of chips and rack-scale machines. Jensen Huang is pushing too hard, and the tech press has been too soft in its analysis. This was an inevitable adjustment, and one that should have occurred long before now. That being said…

Are the Deepseek developments fake or real? Likely a little of both. The training gains are, in reality, likely to produce about 30% increased efficiency. That doesn’t mean we need 30% fewer chips or data centers; it means we get roughly 30% more output from the same infrastructure. Goldman Sachs has lamented that AI isn’t delivering enough ROI - well, this increases the gain significantly. There is a logical fallacy in IT that a 10% efficiency gain means 10% less data center capacity and 10% fewer servers. That assumption has been repeatedly disproven.

In economics, the Jevons paradox occurs when technological advancement makes a resource more efficient to use (thereby reducing the amount needed for a single application); however, as the cost of using the resource drops, overall demand increases, causing total resource consumption to rise. Governments, financial institutions, and journalists have typically expected efficiency gains to lower resource consumption, rather than anticipating the increases the Jevons paradox predicts - and they have been proven wrong, repeatedly. Dario Amodei and Satya Nadella have both noted the applicability of Jevons to AI.
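The Jevons dynamic can be made concrete with a toy model. This sketch uses entirely hypothetical numbers (a 30% efficiency gain and assumed demand elasticities) purely to show when total resource consumption rises rather than falls - it is an illustration, not a forecast:

```python
# Toy illustration of the Jevons paradox (hypothetical numbers, not a forecast).
# A 30% efficiency gain lowers the cost per unit of AI work; if demand is
# sufficiently price-elastic, total resource (compute/power) consumption rises.

def total_consumption(efficiency_gain: float, demand_elasticity: float,
                      baseline_units: float = 100.0) -> float:
    """Resource consumed after an efficiency gain, under constant-elasticity demand.

    Cost per unit of work falls by `efficiency_gain`; demand responds with
    the given (positive) price elasticity.
    """
    cost_factor = 1.0 - efficiency_gain              # e.g. 0.70 after a 30% gain
    demand_factor = cost_factor ** (-demand_elasticity)
    # Resource used = (units demanded) * (resource per unit of demand)
    return baseline_units * demand_factor * cost_factor

inelastic = total_consumption(0.30, demand_elasticity=0.5)  # demand barely moves
elastic = total_consumption(0.30, demand_elasticity=1.5)    # demand surges

print(f"baseline: 100.0")
print(f"30% gain, elasticity 0.5: {inelastic:.1f}")  # below 100: consumption falls
print(f"30% gain, elasticity 1.5: {elastic:.1f}")    # above 100: Jevons kicks in
```

The crossover happens at an elasticity of exactly 1; the argument in this piece is that AI demand sits well above that line.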

On the “fake” side, there is strong emerging evidence that Deepseek used OpenAI’s output, either intermediate or final, to train its model. I don’t see anything significantly unethical about that - standing on the shoulders of giants is how science works, and it’s nothing new - but it calls into question the fundamental basis of Deepseek’s claims of massive training cost savings. There are also indications that Deepseek’s models may be less efficient for inference, which will matter much more in the long run as data center and chip demand swings from training to inference applications over the next 48 months.

Deepseek’s models, like Meta’s ML models, are open source. We should assume that OpenAI and other model builders have already begun incorporating their advancements. To that point, we should expect a continuous series of step-function efficiency improvements, initially in training, then eventually in inference. Expect annual efficiency gains of approximately 30%. This is important because we don’t have enough power, capital, data centers, or chips to meet demand as currently forecasted in the most optimistic cases (80 GW and up). In addition, we cannot support 300 kW racks with today’s technology - or tomorrow’s. The Laws of Thermodynamics are rather stubborn. If these efficiency improvements slow the curve of increasing power densities, we’ll be very lucky indeed - the alternative is a lot of rapidly obsolete data center capacity and rapidly increasing per-MW build costs.
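A quick back-of-envelope makes the compounding visible. The workload growth rate below is an assumed placeholder (the article gives the ~30%/year efficiency figure and the 80 GW demand case, not a workload growth rate); the point is only that if demand grows faster than efficiency improves, total power still climbs:

```python
# Back-of-envelope compounding (assumed numbers, not a forecast).
# ~30%/year efficiency gains shrink the compute needed per unit of work as
# 0.7**n; total power depends on how fast workloads grow on top of that.

def compute_per_unit(years: int, annual_gain: float = 0.30) -> float:
    """Fraction of today's compute needed per unit of work after `years`."""
    return (1.0 - annual_gain) ** years

def total_power_gw(years: int, base_gw: float = 80.0,
                   workload_growth: float = 0.60,
                   annual_gain: float = 0.30) -> float:
    """Hypothetical total power: workloads grow 60%/yr (assumed) while
    efficiency improves 30%/yr, starting from an 80 GW base case."""
    return base_gw * ((1.0 + workload_growth) * (1.0 - annual_gain)) ** years

for n in (1, 3, 5):
    print(f"year {n}: compute/unit = {compute_per_unit(n):.3f}, "
          f"total power ~ {total_power_gw(n):.1f} GW")
```

With these assumed numbers the net growth factor is 1.6 × 0.7 = 1.12 per year - efficiency slows the curve but does not bend it downward, which is exactly the "very lucky indeed" scenario above.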

Do not confuse hysteria and groupthink with any actual change in the reality of AI. Technologies tend to innovate in a series of step functions - there are dozens of examples. Video distribution codecs are a great one: a decade of roughly 25% annual efficiency gains has helped drive profitability in platforms like YouTube and Instagram.
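The codec comparison is worth running the numbers on. Assuming the ~25% annual gain compounds as stated, the bitrate needed for the same perceived quality shrinks to a small fraction of the original within a decade:

```python
# Compounding the ~25%/year codec efficiency gain cited above: the bitrate
# needed for the same quality shrinks by a factor of 0.75 each year.

def bitrate_fraction(years: int, annual_gain: float = 0.25) -> float:
    """Fraction of the original bitrate needed after `years` of improvement."""
    return (1.0 - annual_gain) ** years

print(f"after 5 years:  {bitrate_fraction(5):.1%} of original bitrate")
print(f"after 10 years: {bitrate_fraction(10):.1%} of original bitrate")
```

After ten years of such gains, the same video costs under 6% of its original bandwidth to deliver - the kind of step-function improvement the AI efficiency argument anticipates.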

No one has replicated Deepseek’s full results. While they look impressive and certainly represent real advances, it is unlikely that the results will be replicated in full - and they have been known to AI researchers and software engineers for 30 days, not 3 days, as the financial media would have you believe. This is science, not belief: if you can’t replicate it, it’s not real - or not entirely so. While many of my financial customers would love to weave house-of-cards disaster scenarios without data, that is not an approach I will support.

Also, on the “did they fake it” front, Deepseek appears to have 50,000 H100s, smuggled through Malaysia. It is difficult to believe they were sitting on modern GPU boxes while aging A100s were used to train Deepseek’s model. This suggests that Deepseek’s representations are exaggerated, possibly in an effort to destabilize public markets by a private company beyond the reach of Western regulators.

Mainstream media outlets like CNN are engaging in a “how the mighty have fallen” narrative - ironically, spearheaded by completely non-technical writers. Needless to say, there has historically been significant antipathy between legacy journalists and the technical community, driven by a perception that techies enjoy an unfair profitability gradient. Some skepticism has been appropriate all along, but irrational exuberance, fast-followed by “the emperor has no clothes,” is simply a sign of poor analysis. “Oh how the mighty have fallen” is a popular writing prompt when the subjects are unpopular, difficult-to-identify-with tech bros like Zuckerberg or Altman. And the media is leaning into it hard, for ad impressions.

On the data center side, the move from primarily building training capacity to constructing inference sites is something we’ve talked about for some time. The efficiency gains seen here do not significantly accelerate this trend. We believe the industry will move from 80/20 training:inference new construction in 2025 to 20/80 in 2029. The hyperfocus on training today is natural, but short-sighted. The future is inference everywhere, which means larger data center construction within 150 miles of Internet Exchange Points (IXPs) with an extremely high degree of fiber interconnectivity.
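For readers modeling capacity plans, the forecast mix shift can be sketched as a simple interpolation. The linear path between the two endpoints is my assumption - the article gives only the 2025 and 2029 mixes:

```python
# Illustrative-only interpolation of the forecast construction mix:
# 80/20 training:inference in 2025 moving to 20/80 by 2029.
# The linear path between the endpoints is an assumption for illustration.

def mix(year: int, start: int = 2025, end: int = 2029,
        train_start: float = 0.80, train_end: float = 0.20) -> tuple[float, float]:
    """(training, inference) share of new construction for a given year."""
    t = (year - start) / (end - start)
    train = train_start + t * (train_end - train_start)
    return train, 1.0 - train

for y in range(2025, 2030):
    train, infer = mix(y)
    print(f"{y}: training {train:.0%} / inference {infer:.0%}")
```

Under this assumed path, the crossover year (50/50 new construction) lands in 2027 - the midpoint of the forecast window.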

The other big advance that no one is noticing is that distributed training is starting to take off. The first deployments of training workloads distributed across a 30 km radius, with sufficient bandwidth, are now live - and “sufficient” is very large indeed, on the order of 100 Tb/s, driving massive fiber investments at 288-fiber counts and larger.

One particularly sensitive point that people are afraid to state publicly is that many Chinese technology companies regularly overstate or exaggerate their technical accomplishments. The reticence is largely for fear of accusations of racism, which is ironic considering the sheer number of East and South Asian engineers toiling away at competing American AI companies like Meta, OpenAI, Anthropic, Microsoft, and Google. This is purely an issue of regulation, not culture: American companies and their engineers - of any ethnicity and origin - would be punished in various ways, under securities regulations and otherwise, for gross exaggerations, whereas Chinese companies are not well regulated in this regard. In areas where American companies are poorly regulated - real estate development is a great example - we see the same sort of wild claims. 10 GW data center campuses, anyone? Prior behavior is always the best indicator of future performance, even collectively.

In conclusion: somewhat, but not entirely, exaggerated claims; normal step functions of efficiency; a countdown to the shift to inference; and loosely regulated companies playing fast and loose with their results. This is the new normal - same as the old normal. Real estate is real estate and technology is technology.

Rex Stock

Seeking Planet Friendly Solutions

1mo

Some blame scurvy... Love the warships! Daniel Golding. Another fun/succinct series of actionable data for all to learn from... Thank you!

KC Mares

Data center energy and Onsite Power Solutions Leader

1mo

Thanks Daniel Golding for pointing out what we in the data center industry all know and understand - that efficiency is a continuous improvement, and it increases utilization and output, which then increases adoption and market growth. These are the same fears I have dealt with since the dawn of the internet and data centers, and yet each continues to expand in all ways. Efficiency will continue to improve, and so will the products, market adoption, and overall growth of what we support.

Andrew S. Albrecht

Co-Founder at AUBix, LLC

1mo

Insightful as always Dan!!

Erik Stockglausner

Global Strategic Consultant & Inspirational Leader | Data Centers & Critical Infrastructure | Veteran Advocate

1mo

Good read Daniel Golding! Thanks for sharing. Question: How do you see the need for greater context memory for inference influencing future data center designs (power density) and GPU demand?
