ASHRAE in Chicago
Chicago Beach in January

I decided not to go for a swim after ASHRAE, since the beach was solid ice. Everyone was talking about liquid cooling at the data center meetings there. I am a member of SSPC 127, Method of Test for Data Center Cooling Equipment. We are working on a method of test for CDUs so that it will be easier to compare different manufacturers' models. ASHRAE test methods feed into ASHRAE standards, which are often referenced in efficiency regulations. We discussed:

· Traditional data center cooling companies need to work with server vendors to ensure that systems and materials are compatible.

· Systems need to be concurrently maintainable. (Our systems, with N+1 CDUs with automatic transfer valves and/or backup air cooling, have met this goal since 2011.)

· Data center design and modeling need to be updated to deal with high-power direct-liquid-cooled racks that still put out lots of heated air.

· The heat capture ratio depends on the coolant and air temperatures, so be ready for servers that have coolant and air temperature specifications.

· Transient operation of liquid-cooled racks needs to be understood. Liquid cooling time constants are only seconds.

· As liquid cooling transitions from HPC to the mainstream, the balance between cost, efficiency, and uptime for different customers needs to be worked out.

· Putting all the chillers on the roof probably won't work for high-density data centers, as the hot air leaving one gets sucked into the intake of the next.

· A 100 kW rack with 60-80% heat capture still puts out 40-20 kW of hot air, so containment and rear doors may be necessary (the arithmetic is sketched after this list).

· The coolant chemistry is still not settled. IBM, Lenovo, and Chilldyne use water; CoolIT uses PG25. IBM/Lenovo have published their specs. Not all customers will follow directions. (Chilldyne has automated coolant quality control.)

· The water sometimes grows bacteria, and the PG25 attacks some materials. Automotive coolant lasts ~2,000 hours of operation; data center coolant needs to last ~40,000 hours.

· Heat reuse requires running the CPUs and GPUs near the throttling temperature to get the hottest water, but this reduces the ride-through time if something goes wrong.

· ASHRAE is coming out with new server coolant temperature standards covering 30-50 °C inlet coolant.

· Dell is recommending 1.5 lpm/kW coolant flow rates (10 °C temperature rise). We are using 1 lpm/kW (14 °C temperature rise) right now. This drives the cost of the liquid cooling system and the CDU rating. Warmer facility water, higher chip power, and/or lower chip temperatures will require more flow per kW (see the temperature-rise sketch after this list).

· GPU power will keep going up. Expect 2 kW in a few years.

· A standard server flow impedance would be nice. We recommend 2 psi for each cold plate, with all cold plates in parallel. Others like to use series cold plates, but then adding extra cold plates changes the impedance (see the sketch after this list).

· Some vendors may need 17 °C coolant, so leave room for a chiller in your data center design.

· Next-generation memory will probably need liquid cooling.

· Global warming is real, so include margins in your system design to account for it.

· HPC users are OK with some level of downtime; enterprise users are not. Failure and restart scenarios need to be determined, and resilience is needed.

· Putting hard drives into immersion cooling makes them last longer, due to low vibration (no fans) and steady temperature.
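
To make the air-side numbers concrete, here is a minimal sketch of the heat-capture arithmetic from the 100 kW rack bullet above. The rack power and capture ratios come from the list; the code itself is only an illustration.

```python
# Residual air-side heat for a direct-liquid-cooled rack.
# Numbers are illustrative, taken from the 100 kW / 60-80% capture bullet above.

def residual_air_heat_kw(rack_power_kw: float, heat_capture_ratio: float) -> float:
    """Heat rejected to room air: total power times the fraction NOT captured by liquid."""
    return rack_power_kw * (1.0 - heat_capture_ratio)

for capture in (0.60, 0.70, 0.80):
    air_kw = residual_air_heat_kw(100.0, capture)
    print(f"100 kW rack at {capture:.0%} capture -> {air_kw:.0f} kW of hot air")
# 60% capture leaves 40 kW in the air and 80% leaves 20 kW -- still enough
# to need containment or rear-door heat exchangers.
```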
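The flow-rate bullet is just the sensible-heat balance, ΔT = P / (ρ · V̇ · cp). A rough cross-check, assuming typical textbook property values for water and a 25% propylene glycol mix (the property numbers are my assumptions, not from the meeting):

```python
# Coolant temperature rise from flow rate: dT = P / (m_dot * cp), with m_dot = rho * V_dot.
# Property values are typical textbook figures (my assumption), not vendor specs.

def delta_t_c(power_kw: float, flow_lpm_per_kw: float,
              rho_kg_per_l: float, cp_j_per_kg_k: float) -> float:
    """Temperature rise in degrees C for a given heat load and volumetric flow."""
    mass_flow_kg_s = power_kw * flow_lpm_per_kw * rho_kg_per_l / 60.0  # L/min -> kg/s
    return power_kw * 1000.0 / (mass_flow_kg_s * cp_j_per_kg_k)

# Water (~1.00 kg/L, cp ~4186 J/kg-K) at 1.0 lpm/kW -> ~14 C rise.
print(f"water at 1.0 lpm/kW: {delta_t_c(1.0, 1.0, 1.00, 4186):.1f} C rise")
# PG25 (~1.03 kg/L, cp ~3900 J/kg-K) at 1.5 lpm/kW -> ~10 C rise.
print(f"PG25 at 1.5 lpm/kW:  {delta_t_c(1.0, 1.5, 1.03, 3900):.1f} C rise")
```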
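And on the flow-impedance bullet, a toy comparison of parallel versus series cold plates, assuming each plate is rated for a 2 psi drop at its design flow (the model is deliberately simplified):

```python
# Toy comparison of server flow impedance: parallel vs. series cold plates.
# Assumes each plate is rated for a 2 psi drop at its design flow (per the bullet above).

PSI_PER_PLATE = 2.0

def parallel_bank_drop(n_plates: int) -> float:
    """Plates in parallel: each sees its rated flow, so the bank drop stays at 2 psi
    no matter how many plates are added (total flow scales up instead)."""
    return PSI_PER_PLATE

def series_bank_drop(n_plates: int) -> float:
    """Plates in series: the same flow passes through every plate, so drops add up."""
    return n_plates * PSI_PER_PLATE

for n in (1, 2, 4):
    print(f"{n} plate(s): parallel {parallel_bank_drop(n):.0f} psi, "
          f"series {series_bank_drop(n):.0f} psi")
# A series layout means adding plates changes the server's impedance,
# which is what makes a 'standard impedance' spec hard to hold.
```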

For liquid cooling vendors, deploying liquid-cooled clusters at scale is the way to learn what is likely to go wrong, but the hardest part is finding someone willing to trust millions of dollars' worth of servers to your liquid cooling system. Everyone has heard of disasters, and with long lead times and high prices for new GPUs, leakage is not OK. (Chilldyne systems are resilient and failure-tolerant, based on experience in aerospace and critical medical devices.)

"Remember the words of Margaret Mead, 'Never doubt that a small group of thoughtful, committed citizens can change the world. Indeed, it's the only thing that ever has.' ?? Your notes from the ASHRAE winter meeting could be the catalyst for great environmental change! Speaking of change, have you heard about the upcoming Guinness World Record for Tree Planting event? Here's an opportunity for impactful collaboration! ?? https://bit.ly/TreeGuinnessWorldRecord"

回复

Absolutely love that you're diving deep into learning from the ASHRAE winter meeting! ?? As the inspirational Steve Jobs once said, "The only way to do great work is to love what you do." Your passion and commitment to expanding your knowledge in your field is the first step to achieving greatness. Keep pushing forward! ????

回复
Rich Lappenbusch

Senior Principal @ Supermicro

1y

Thanks, Steve, for the write-up. We are listening, learning, and improving together. It's a long road, but we made a lot of progress in 2023.

Tim Shedd

Thermal Strategy @ Dell Technologies

1y

Steve Harrington You are incorrect. Dell, along with many other people, has converged on 1.5 lpm/kW for PG25 mixtures. The 10 °C temperature rise is correct.
