It’s Not a Cooling Problem, It’s a Power Problem

Introduction:

In the fast-evolving landscape of High-Performance Computing (HPC), the symbiotic relationship between AI and computational power is pushing the boundaries of innovation. Traditionally, the spotlight has been on cooling solutions to ensure optimal performance. However, with the rise of AI-driven HPC and its escalating power demands, the narrative is shifting. The real challenge is no longer cooling the hardware, since solutions like ColdLogik RDHx already support high-kW cooling per rack; it is meeting the voracious appetite for power. In this paradigm shift, it's not a cooling problem, it's a power problem.

The Current Scenario:

Let's take NVIDIA, a key player in the HPC hardware arena, which continues to produce air-cooled systems like the DGX H100 and A100 offerings. While these systems have proven efficient, the relentless pursuit of more powerful AI models demands a shift in focus. The challenge is not merely cooling these systems but supplying the colossal power they require for optimal performance. Data center kW capacity has grown exponentially. Don't get me wrong, the advance towards DCLC is still happening, but power is the play that will now become the limiting factor.

The Power Predicament:

AI-driven HPC deployments now demand 10-15 times more power per rack than traditional data center facilities were designed for. Facilities built around 5-8kW per rack can be rendered insufficient in the face of this power surge. The power problem is multifaceted, encompassing both the incoming power to the facility and the rack footprints themselves: from 5-8kW (8kW being the global average) to a need for 50-80kW per rack footprint.

The challenge here is the design of facilities. What was once considered a high-capacity data center in the USA at 150MW is now deemed too small, and 5-8kW per rack footprint just won't cut it. Facilities now need to be designed for roughly ten times that power.

As an example, what was originally a 9MW hall with 1,500 rack footprints (6kW per rack) now needs to be in excess of a 75MW hall for the same number of racks at a standard 50kW per rack. I'm sure you see the issue here.
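To make the arithmetic behind that comparison explicit, here is a minimal sketch using the figures quoted above (the rack count and densities are the article's illustrative numbers, not a design rule):

```python
# Illustrative only: hall IT load scales linearly with per-rack density.
RACKS = 1500  # rack footprints in the hall, per the example above

def hall_power_mw(racks: int, kw_per_rack: float) -> float:
    """Total IT load of a hall in MW for a given rack count and density."""
    return racks * kw_per_rack / 1000.0

legacy = hall_power_mw(RACKS, 6)   # 9.0 MW  - the original design point
ai_hpc = hall_power_mw(RACKS, 50)  # 75.0 MW - same floor space, 50kW racks
print(f"Legacy hall: {legacy:.0f} MW, AI/HPC hall: {ai_hpc:.0f} MW "
      f"({ai_hpc / legacy:.1f}x the incoming power for the same racks)")
```

Same floor, same racks, roughly eight times the incoming power: that is the gap the utility feed, switchgear and distribution all have to close.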

Power Supply Mechanisms:

One of the critical issues in HPC for AI is the capability of the power supply mechanisms. Traditional in-rack Power Distribution Units (PDUs) need a significant upgrade to match the power requirements. This transition necessitates a move towards advanced PDUs that can efficiently manage the increased power loads and contribute to the overall efficiency of the system. What we currently see is a requirement to install 6x 63A three-phase units to achieve such capacities, which has a significant knock-on effect on the rack footprint.
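As a rough sketch of why six units of that size are needed, the calculation below assumes a 400V line-to-line supply and an A/B (2N) feed arrangement; both are my assumptions for illustration, not figures from the article:

```python
import math

# Rough sketch of in-rack PDU capacity. Assumes a 400V line-to-line supply
# and an A/B (2N) feed arrangement - both illustrative assumptions, not
# figures stated in the article.
def three_phase_kva(amps: float, volts_ll: float = 400.0) -> float:
    """Apparent power of one three-phase feed in kVA."""
    return math.sqrt(3) * volts_ll * amps / 1000.0

per_pdu = three_phase_kva(63)   # ~43.6 kVA per 63A three-phase unit
total = 6 * per_pdu             # ~262 kVA with all six feeds live
usable_2n = total / 2           # ~131 kVA if half the feeds are redundant

print(f"Per PDU: {per_pdu:.1f} kVA, six units: {total:.0f} kVA, "
      f"usable under 2N: {usable_2n:.0f} kVA")
```

Even under these assumptions, six 63A feeds plus their cabling and breakers take up meaningful space in and behind the rack, which is the knock-on effect on the footprint mentioned above.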

Rack Footprint Evolution:

The traditional 1200mm-deep racks are becoming obsolete in the face of the power challenge. To accommodate the increased power demands and the supporting infrastructure we have today, a shift to 1400mm-deep racks is likely to become the norm. Rack widths are also evolving, with 800mm and 1000mm options gaining prominence, all standardised at 52 Rack Units (RU). This maximises the capability and flexibility to scale and to support this new wave of needs within the DC.

Stranded Space:

Another issue in HPC for AI is the data center space, or 'stranded' space, that HPC for AI creates. When traditional facilities try to shoehorn in this new level of HPC infrastructure, it leaves a huge void in a DC that was designed to support much lower density requirements. Even facilities using somewhat more advanced infrastructure, such as indirect cooling that allows for 30kW per rack, are in my opinion still not enough when we already see 60kW+ as a standard request. We have to think differently about our approach to cooling but, importantly, maximise the space that is available while having the ability to deliver whatever power is requested.
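A small sketch of how stranding happens, reusing the 9MW / 1,500-position hall from the earlier example (the densities are illustrative; 60kW is the standard request mentioned above):

```python
# Illustrative only: a fixed hall power budget strands floor space once
# rack density jumps. The 9MW / 1,500-position hall reuses the earlier
# example; the densities tested are illustrative.
HALL_POWER_KW = 9_000
RACK_POSITIONS = 1_500

def stranded_positions(kw_per_rack: float) -> tuple[int, int]:
    """Return (powered racks, stranded rack positions) for a given density."""
    powered = min(RACK_POSITIONS, int(HALL_POWER_KW // kw_per_rack))
    return powered, RACK_POSITIONS - powered

for density in (6, 30, 60):
    powered, stranded = stranded_positions(density)
    print(f"{density:>2} kW/rack: {powered:>4} racks powered, "
          f"{stranded:>4} positions stranded")
```

At 60kW per rack, a hall sized for 9MW can only power a tenth of its positions; the rest of the white space sits dark unless the incoming power is upgraded.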

Facility Challenges:

Traditional data center facilities were never designed to support HPC systems; they were built on the assumption of power requirements of 5-8kW per rack. Even that was not always enough, which is why aisle containment emerged in deployments. In the current AI-driven era, it is no longer sufficient. HPC deployments for AI demand facilities that can deliver up to 80kW per rack or even higher. We are now starting to see a new wave of advanced data centers capable of supporting any capacity or type of deployment HPC requires.

Conclusion:

As AI continues to redefine the possibilities of HPC, the industry must confront the power problem at the core of these advancements. The traditional cooling-centric mindset is no longer sufficient. With technologies like RDHx leading the charge, the focus shifts to providing the necessary power infrastructure to support the growing demands of AI-driven HPC deployments. Facilities, rack footprints, and power supply mechanisms must evolve to meet this new reality, ensuring that the power problem is not a barrier but rather a gateway to unlocking the full potential of AI in HPC.

Chad Cape

Conserving Energy, Water, Land/Space & Rare Earths. Your Capital Too. Business Development @ LiquidCool Solutions

9 months

Great insights and interesting perspective. The Power Problem can be an even bigger Water Problem, as Charles Fishman (author of The Big Thirst) and his friends at The Water Council know well.

William Ringer

Regional Practice Leader @ HKS, Inc. | Green Data Centers (centres), Mission Critical

9 months

The world has changed, and most don't understand how much the percentage of power used to support IT has changed since the 80s. A mega campus 10 years ago was 96MW; now that is a single building. Clearly the share of energy data centres use will increase as a national/global percentage. Unless we don't want AI, mobile phones, or to WFH…

Poornachandra BE, MCIPS, MIET

EMEA Procurement Lead for sourcing Construction partners across GC, Power (HV SS), Automation, Security and Bespoke OFCI. Responsible for market test till delivery, managing a team of 20+ direct reports, reporting to VP.

9 months

Well explained, real issue

Jim Paterson

Senior Executive | 25 Years of IT Services Leadership | Expertise in Cloud, SaaS, and Managed Services

9 months

Gary Tinkler, yet another in a series of great articles and posts describing the power challenge. Interesting question: what is the future demand for the 5-8 kW/rack data center, given the lack of utility into the site along with its legacy power and cooling infrastructure?


Very instructive, Gary Tinkler! Let's turn the power problem into an opportunity, and support our customers in designing sustainable and scalable data centers.
