Considerations for Direct Liquid Cooling: AI and MR loads in Data Centers

Background

The significance of cooling systems within data centers remains a pivotal focus for both the industry and its broader stakeholders. Nearly two decades ago, a critical observation emerged: for every kilowatt of power utilized by IT hardware (comprising servers, storage, and network switches), an additional kilowatt was utilized by cooling systems to maintain optimal operating conditions within data centers. This revelation spurred data center engineers to seek more efficient methods for cooling IT hardware, aiming to reduce power consumption in cooling systems. This pursuit led to the inception of the widely adopted data center efficiency metric known as Power Usage Effectiveness (PUE), introduced by The Green Grid in 2007.

PUE is calculated as the ratio of Total Facility Energy to IT Equipment Energy within a given data center, so a value closer to 1.0 means less energy is spent on anything other than the IT hardware itself. While Total Facility Energy encompasses various components such as "house power" and "electrical distribution losses," a significant portion is attributed to the power consumed by cooling systems, in addition to IT systems power. For that reason, PUE is often read as an indicator of cooling effectiveness.
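As a small illustration of the ratio described above, the following Python sketch computes PUE from two energy figures. The function name and the sample numbers are illustrative assumptions, not measurements from any facility; the first example simply restates the "one kilowatt of cooling per kilowatt of IT" observation while ignoring all other overheads.

```python
def pue(total_facility_energy_kwh: float, it_equipment_energy_kwh: float) -> float:
    """Return Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_energy_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_energy_kwh < it_equipment_energy_kwh:
        raise ValueError("total facility energy cannot be less than IT energy")
    return total_facility_energy_kwh / it_equipment_energy_kwh


# Hypothetical case: 1 kWh of cooling for every 1 kWh of IT load,
# with house power and distribution losses ignored for simplicity.
legacy_pue = pue(total_facility_energy_kwh=2.0, it_equipment_energy_kwh=1.0)
print(legacy_pue)

# Hypothetical case: 1,000 kWh of IT load with 200 kWh of total overhead.
improved_pue = pue(total_facility_energy_kwh=1200.0, it_equipment_energy_kwh=1000.0)
print(round(improved_pue, 2))
```

Under these assumptions the first case gives a PUE of 2.0 and the second a PUE of 1.2; real facilities would also need to account for the other overheads mentioned above.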

Cooling Technology Evolution

In the ever-evolving landscape of data center technology, the development of cooling systems has been driven by a confluence of factors, with two prominent forces at the forefront: Technical Limitations and Cooling Efficiency.

Technical limitations have emerged as a result of the gradual increase in average power densities of IT racks. Beginning modestly at 2-3 kW per rack, these densities have risen to as much as 20-25 kW per rack in contemporary data center environments. Traditional room-based cooling systems, which rely heavily on raised floor plenums to distribute cold air, become ineffective beyond roughly 12 kW per rack, and flooded cold aisle designs with hot aisle containment returns reach their limit at around 20 kW per rack. In response, data center engineers have been compelled to explore alternative cooling methodologies. These include close-coupled cooling systems such as in-row cooling and rear-door heat exchangers, suited to power densities of 20-40 kW per rack. For even higher densities, exceeding 40 kW per rack, direct liquid cooling (DLC) systems have emerged as a viable solution. The rack power densities given here are general guidelines rather than rigid boundaries.
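The density ranges above can be summarized as a simple lookup. The Python sketch below is only one reading of those guideline figures: the function name is hypothetical, and the cut-offs at 12, 20, and 40 kW per rack are taken from the text as approximate thresholds, not as design rules.

```python
def suggest_cooling_approach(rack_kw: float) -> str:
    """Map a rack power density (kW per rack) to the cooling approach
    that the guideline ranges in this article associate with it."""
    if rack_kw < 0:
        raise ValueError("rack power density cannot be negative")
    if rack_kw <= 12:
        return "room-based cooling with raised floor plenum"
    if rack_kw <= 20:
        return "flooded cold aisle with hot aisle containment"
    if rack_kw <= 40:
        return "close-coupled cooling (in-row or rear-door heat exchanger)"
    return "direct liquid cooling (DLC)"


# Illustrative densities only; actual selection depends on site conditions.
for density in (3, 15, 30, 60):
    print(f"{density} kW/rack -> {suggest_cooling_approach(density)}")
```

Because the boundaries are approximate, a rack near a threshold could reasonably fall into either neighboring category.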

DLC Opportunities

Cooling efficiency is a critical factor in reducing energy costs and operational expenses (OPEX) within data centers, while also playing a pivotal role in advancing sustainability objectives and mitigating carbon emissions for organizations. Direct liquid cooling presents a compelling solution, offering superior efficiency and reduced power consumption, thereby lowering both OPEX and Power Usage Effectiveness (PUE).

The high temperature differential inherent in direct liquid cooling systems renders them particularly advantageous compared to air-cooled systems, especially in applications involving heat recovery and heat reuse. This capability not only enhances operational efficiency but also helps organizations advance their Environmental, Social, and Governance (ESG) goals, aligning with broader sustainability initiatives.

DLC Types

The market offers a variety of direct liquid cooling (DLC) technologies from various companies, which can broadly be categorized into two main types: Immersion Cooling and Direct-to-Chip Cooling.

Immersion Cooling involves fully submerging IT hardware into a dielectric liquid bath, as the name suggests. Heat generated by the components is dissipated through convection as the dielectric liquid circulates over them. Dielectric liquids are typically categorized into two main groups: single-phase and two-phase. Single-phase liquids remain in a liquid state when in contact with heated components and often necessitate the use of pumps to facilitate liquid flow over the components. On the other hand, two-phase liquids undergo partial phase change, converting to vapors upon contact with heated components. This unique characteristic enables two-phase liquids to often flow over heated components without the need for pumps.

Direct-to-Chip cooling systems employ cold plates positioned directly on the CPU and GPU to dissipate heat into liquids circulating through them. These liquids, either dielectric fluids or a mixture of water and glycol, absorb the heat from the cold plates. A typical direct-to-chip cooling setup consists of several components: cold plates serving as heat sinks for the CPU or GPU, a pair of manifolds connecting to the cold plates, a cooling distribution unit (CDU), a heat exchanger facilitating heat transfer between primary and secondary circuits, and heat rejection devices such as dry coolers, cooling towers, or chillers. Direct-to-Chip cooling generally removes 80-85% of a server's heat; the remaining 15-20% still requires some form of air cooling.
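To make the 80-85% figure concrete, the Python sketch below splits a rack's heat load between the liquid loop and the residual air cooling. The function name and the 50 kW rack are hypothetical values chosen for illustration; only the capture fractions come from the text.

```python
def split_heat_load(rack_kw: float, liquid_fraction: float) -> tuple[float, float]:
    """Split a rack heat load into the part captured by cold plates
    and the part left for air cooling. Returns (liquid_kw, air_kw)."""
    if rack_kw < 0:
        raise ValueError("rack heat load cannot be negative")
    if not 0.0 <= liquid_fraction <= 1.0:
        raise ValueError("liquid_fraction must be between 0 and 1")
    liquid_kw = rack_kw * liquid_fraction
    return liquid_kw, rack_kw - liquid_kw


# Hypothetical 50 kW rack evaluated at both ends of the 80-85% range.
for fraction in (0.80, 0.85):
    liquid_kw, air_kw = split_heat_load(50.0, fraction)
    print(f"{fraction:.0%} capture: {liquid_kw:.1f} kW to liquid, {air_kw:.1f} kW to air")
```

For this assumed rack, between 7.5 and 10 kW would still have to be handled by air, which is why direct-to-chip deployments typically retain some air-side cooling capacity.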

DLC Challenges

Immersion cooling necessitates thorough material compatibility assessments between the components of IT hardware and the chemicals employed in producing dielectric liquids. Verifying compatibility across thousands of components used in IT hardware manufacturing with the various chemicals used in dielectric fluid production presents a formidable challenge. To address this challenge, the industry is actively developing partnerships, primarily between server manufacturers, immersion cooling system providers, and dielectric liquid suppliers. However, the constraints imposed by these partnerships leave relatively limited hardware options compared to air-cooled systems, where a single IT rack can accommodate hardware from dozens of different manufacturers.

One limitation of Direct-to-Chip cooling systems is their exclusive focus on removing heat from the CPU and GPU, leaving the cooling of other components of IT hardware to air-cooled systems.

In two-phase liquids, the phase change process can lead to micro-cavitation as the liquid evaporates. This micro-cavitation has the potential to erode the metal in IT hardware and cooling system components, which can result in malfunctions or failures of the IT hardware and deterioration of dielectric characteristics.

Environmental Challenge

Some two-phase liquids use PFAS (per- and polyfluoroalkyl substances), often called "forever chemicals," which can take over 1,000 years to degrade. The US Environmental Protection Agency (EPA) proposed designating some PFAS chemicals as hazardous substances a few years ago. This creates additional challenges for data center owners and operators adopting two-phase direct liquid cooling systems while keeping up their commitment to ESG goals and ensuring the health and safety of their employees.

Operational Challenges

Regardless of the specific type of Direct Liquid Cooling (DLC) systems implemented, the widespread deployment of DLC systems in data centers necessitates the upskilling of operations teams. Typically, operations teams are proficient in managing power and network infrastructure within the white space of data centers. However, the deployment of DLC systems requires operations teams to acquire skills in safely handling liquids to ensure the well-being of personnel and equipment alike.

Conclusion

In conclusion, direct liquid cooling (DLC) presents a promising solution for surmounting technical cooling barriers associated with high-density applications while simultaneously reducing Power Usage Effectiveness (PUE), carbon emissions, and operational expenses (OPEX), thereby advancing energy efficiency and Environmental, Social, and Governance (ESG) goals. However, the challenges outlined above must be addressed before DLC can be widely adopted in data centers.

The mass-scale adoption of DLC in data centers necessitates collaborative efforts among various stakeholders, including regulators, owners and operators, IT vendors, cooling vendors, and dielectric liquid manufacturers. Such collaboration is essential for sharing performance data, facilitating continuous development, and driving improvements in DLC technology. Only through concerted action and information sharing can the full potential of DLC be realized in the data center industry.

Vijay Sampathkumar

Country Manager, India & South Asia | Shaping AI Factories with Sustainable, Waterless Cooling Solutions at ZutaCore

8 months ago

Dear Muhammad Naveed Saeed, Thank you for sharing your insightful article on direct liquid cooling in data centers, focusing on AI and MR workloads. Your analysis provides valuable insights into the evolving landscape of cooling technology. At ZutaCore, we commend your thorough exploration of the opportunities and challenges associated with direct liquid cooling systems. Your emphasis on energy efficiency, sustainability, and operational considerations aligns with our commitment to revolutionizing data center cooling. We appreciate your efforts in fostering collaboration and knowledge sharing within the industry. Your work inspires us to continue advancing cooling technology to drive sustainability and efficiency in data centers. Thank you for your valuable contribution, and we look forward to further engagement. Warm regards, Vijay Sampathkumar ZutaCore.

Hariharan Venkatakrishnan

HVAC Consultant, Accredited TIER Designer, Data Center Cooling Consultant.

8 months ago

A comprehensive document covering many aspects including the environmental side. Often this is missed out. Had one question. How do we take care of redundancies when the cooling shifts to the rack or server level? As we move ahead trying to improve efficiency, we should keep an eye on reliability too. Hence the question.

Tabish Syed

Critical Environment Engineering Manager

8 months ago

Good read.

Ramesh Rajendhiran - C.Eng(India) , CDCP, DCCA

Technical Excellence Center Leader - MEA at Schneider Electric

8 months ago

Thanks Naveed Bhai for providing such great insights on DLC for AI infrastructure.
