How have we reached the End of Air Cooling in Servers? (1)

Senol Gurvit

Senior Data Center Infrastructure and Operation Consultant

Published: June 27, 2024

Understanding the basics behind the facts has always been a wise approach. It can help overcome operational obstacles and uncover innovative solutions.

Air-Cooled CPU and GPU Silicon

The CPU and GPU blocks generate the most heat in servers. The heat generated in the processor's core must be dissipated to the heat sink before the core temperature reaches its maximum limit. Heat first travels from the surface of the core to the heat sink by conduction across the interfaces.

Thermal interface material (TIM) is typically used to fill the gaps between components and the heat sink, thereby reducing temperature disparities across heat transfer surfaces. Generally, about 10 to 20% of the heat is transferred to the board.

The temperature values provided are indicative and may vary with the application and chip density. A standard guideline is to size the heat sink and airflow so that the junction temperature stays below 125°C for dependable operation. Exceeding this limit can lead to performance degradation and potential damage.
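The conduction path described above is often approximated as a chain of thermal resistances in series. The sketch below uses that common lumped model, which the article itself does not spell out. The function name, the resistance values, the 35 °C local air temperature, and the 15% board fraction are all illustrative assumptions, not datasheet figures.

```python
# Lumped series thermal-resistance estimate of junction temperature.
# All numeric inputs in the demo are illustrative assumptions.

T_J_MAX_C = 125.0  # junction limit quoted in the text


def junction_temperature(power_w, t_air_c, r_junction_case, r_tim, r_sink_air,
                         board_fraction=0.15):
    """Estimate the junction temperature in °C.

    power_w         total chip power (W)
    t_air_c         air temperature at the heat sink (°C)
    r_*             thermal resistances in °C/W (junction-case, TIM, sink-air)
    board_fraction  share of heat leaving through the board (text: ~10-20%)
    """
    heat_to_sink_w = power_w * (1.0 - board_fraction)
    r_total = r_junction_case + r_tim + r_sink_air
    return t_air_c + heat_to_sink_w * r_total


if __name__ == "__main__":
    t_j = junction_temperature(power_w=300.0, t_air_c=35.0,
                               r_junction_case=0.05, r_tim=0.03,
                               r_sink_air=0.12)
    print(f"Estimated junction temperature: {t_j:.1f} °C")
    print("Within limit" if t_j < T_J_MAX_C else "Exceeds limit")
```

In this model, a denser chip (higher power) or a smaller heat sink (higher sink-to-air resistance) pushes the estimate toward the 125 °C limit, which is why heat sink size and airflow are sized together.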

Heat Removal from Server

In regular servers, the fans inside the chassis remove the heat produced. As long as all of that heat is transferred to the data hall, the critical temperatures are not exceeded and the server will not shut down on thermal protection.

The heat removed by airflow is practically calculated with the formula

$$Q = \dot{m} \, c_p \, \Delta T$$

where:

$Q$ is the rate of heat removal (in watts, W).
$\dot{m}$ is the mass flow rate of the air (in kilograms per second, kg/s).
$c_p$ is the specific heat capacity of the air (in joules per kilogram per degree Celsius, J/(kg·°C)).
$\Delta T$ is the temperature difference between the inlet and outlet air (in degrees Celsius, °C).

Since the properties of air are known and do not change significantly over the operating range, the heat removal capacity of air cooling at a given airflow depends strongly on ΔT, the temperature difference between the inlet and outlet air.

A typical server with a normal configuration running at peak performance is designed to operate at ΔT = 11-12 °C, which corresponds to an airflow of 270-250 m³/h per kW. In other words, an air-cooled server consuming 1 kW needs 250-270 m³/h of airflow for healthy operation. A typical blade server under the same conditions can operate with a lower airflow (200-187 m³/h per kW) but at a higher ΔT (15-16 °C).
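The airflow figures above can be checked by rearranging the heat removal formula for volumetric flow. The sketch below is a minimal illustration; the air density of 1.2 kg/m³ and specific heat of 1005 J/(kg·°C) are assumed round values for typical data hall conditions, and the function name is hypothetical.

```python
# Volumetric airflow needed to remove a heat load at a given ΔT,
# from Q = m_dot * cp * ΔT with m_dot = rho * V_dot.
# Air properties are assumed round values, not figures from the article.

RHO_AIR = 1.2    # kg/m³, assumed air density
CP_AIR = 1005.0  # J/(kg·°C), assumed specific heat of air


def airflow_m3_per_h(heat_w, delta_t_c, rho=RHO_AIR, cp=CP_AIR):
    """Return the airflow in m³/h required to carry heat_w watts at delta_t_c."""
    if delta_t_c <= 0:
        raise ValueError("delta_t_c must be positive")
    mass_flow_kg_s = heat_w / (cp * delta_t_c)
    return mass_flow_kg_s / rho * 3600.0


if __name__ == "__main__":
    for delta_t in (11, 12, 15, 16):
        flow = airflow_m3_per_h(1000.0, delta_t)
        print(f"ΔT = {delta_t} °C -> {flow:.0f} m³/h per kW")
```

With these assumed air properties, the results land close to the 270-250 m³/h and 200-187 m³/h per kW ranges quoted in the text.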

What is the upper limit of ΔT?

Both typical and blade servers can be configured to operate at a higher ΔT, within a tolerance that keeps the CPU/GPU safe, at the cost of expected service life. If the server components are not densely installed and the heat sinks are generously sized, a high ΔT can still be maintained without premature aging. Blade servers have used such structural advantages to reach a higher ΔT than regular servers.

Besides concerns about the server's expected life, there is another practical limit. ΔT = 16 °C gives an outlet temperature of 38 °C when the inlet temperature is 22 °C, so the components installed at the back of the rack must operate at this temperature without degradation or risk. ASHRAE has already highlighted these practical limits.
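The outlet temperature in this example follows from the same heat removal formula, solved for ΔT. The sketch below is illustrative only; the air properties are assumed round values and the function name is hypothetical.

```python
# Outlet air temperature of a server from its power draw and airflow,
# rearranging Q = m_dot * cp * ΔT. Air properties are assumed round values.

RHO_AIR = 1.2    # kg/m³, assumed air density
CP_AIR = 1005.0  # J/(kg·°C), assumed specific heat of air


def outlet_temperature_c(power_w, airflow_m3_h, inlet_c,
                         rho=RHO_AIR, cp=CP_AIR):
    """Return the exhaust air temperature in °C."""
    if airflow_m3_h <= 0:
        raise ValueError("airflow_m3_h must be positive")
    mass_flow_kg_s = rho * airflow_m3_h / 3600.0
    delta_t = power_w / (mass_flow_kg_s * cp)
    return inlet_c + delta_t


if __name__ == "__main__":
    # 1 kW blade server at the low end of the airflow range quoted in the text.
    t_out = outlet_temperature_c(power_w=1000.0, airflow_m3_h=187.0, inlet_c=22.0)
    print(f"Outlet temperature: {t_out:.1f} °C")
```

Because ΔT is inversely proportional to airflow, halving the airflow at the same power doubles the temperature rise across the server.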

The giant internet technology providers use custom server hardware designs and rack configurations in which server outlet temperatures exceed these limits. While they benefit from a lower (better) PUE, personnel are prohibited from spending extended periods in hot rack aisles because of the potential harm high temperatures can cause to the human body.

Fan Airflow Capacity

Fans generate airflow by rotating blades, which create a pressure difference between the air intake and exhaust within the housing. The quantity of airflow depends on variables including fan dimensions, blade shape, rotational speed, and voltage. Fans are rated by their airflow and pressurization capabilities; however, these rated values are rarely attained in a real installation. The specified airflow is reached only when the fan operates in open air, and the specified pressure difference is measured only when the fan's airflow is entirely obstructed, as illustrated in the main article picture at the top.

The graphic below illustrates how the effective fan curve varies with different fan deployment strategies. Arranging fans in parallel boosts airflow capacity, whereas cascading them (placing them in series) increases the pressure difference. The operating point is the intersection of the fan curve and the air resistance curve, which rises parabolically with airflow. In a chassis packed with CPUs, GPUs, heat sinks, and other obstructive components, the design of the fan array becomes increasingly critical.
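The operating point described above can be sketched numerically. The example below assumes an idealized straight-line fan curve (real fan curves are not linear), identical fans with no interaction losses, and a resistance curve of the form ΔP = k·Q². The function name and all numeric values are hypothetical and serve only to show how parallel and series arrays shift the intersection.

```python
import math

# Operating point of a fan array against a parabolic resistance curve.
# Assumption: each fan follows the linear curve P = p_max * (1 - Q / q_max).


def operating_point(p_max, q_max, k, n_parallel=1, n_series=1):
    """Return (airflow, pressure) where the array curve meets k * Q**2.

    p_max  shut-off pressure of one fan (airflow fully blocked)
    q_max  free-air airflow of one fan (no resistance)
    k      resistance coefficient of the chassis, pressure = k * Q**2
    """
    if k <= 0:
        raise ValueError("k must be positive")
    p0 = n_series * p_max              # series fans add their pressures
    slope = p0 / (n_parallel * q_max)  # parallel fans add their airflows
    # Positive root of k*Q^2 + slope*Q - p0 = 0.
    q = (-slope + math.sqrt(slope ** 2 + 4.0 * k * p0)) / (2.0 * k)
    return q, k * q ** 2


if __name__ == "__main__":
    for label, k in (("open chassis", 0.001), ("dense chassis", 0.1)):
        single = operating_point(100.0, 100.0, k)
        parallel = operating_point(100.0, 100.0, k, n_parallel=2)
        series = operating_point(100.0, 100.0, k, n_series=2)
        print(f"{label}: single {single[0]:.1f}, "
              f"parallel {parallel[0]:.1f}, series {series[0]:.1f}")
```

Under these assumptions, two fans in parallel deliver more airflow than two in series when resistance is low, while the series pair wins when resistance is high, matching the trade-off described in the text.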

Using a server chassis frame with a cross-section designed for parallel fan arrays is optimal for maximizing airflow. When the chassis is filled with components, cascading the fans can increase air pressure to overcome air resistance.

With this background in mind, one might wonder how NVIDIA, the leading provider of artificial intelligence (AI) hardware and software, is transitioning from air cooling to liquid-cooled servers. The upcoming article will explain that analysis in detail.

Ercument Buyuksumnulu

5 months ago

Senol Bey, this is a fine piece of work; congratulations.

Jason Parmley

HVAC/Mechanical Certified Expert Witness Forensics Re-Creation National Technical Support Data Center Critical Environment Consultant Over 30 yrs multi-trade experience

5 months ago

Put together very well Senol! I’m currently working on a presentation at Carnegie Mellon on the same topic. I will reference your writings and credit you for your mass wisdom. Thank you Senol.

Mehmet Selcuk Karaca

5 months ago

I liked your article. Thank you very much. I think every parameter (delta T, heat capacity and airflow rate) is important in cooling. But I think we can get the biggest savings at the source of the problem, by reducing the heat dissipated by the chip. For this purpose, we can employ different chip architectures, identify ghost servers, and use more virtualization/containerization.

Oğuzhan Çilenk

Data Center Design and Operations Engineer, ATD, AOS

5 months ago

Great article! The technical details and explanations you provided do an excellent job of summarizing the limits of air cooling and why we need to look for new solutions. Your clear description of the challenges faced in server cooling is particularly valuable. Looking forward to the next part!
