How have we reached the End of Air Cooling in Servers? (1)

Understanding the fundamentals behind the facts is always a wise approach. It helps overcome operational obstacles and uncover innovative solutions.

Air-Cooled CPU and GPU Silicon

The CPU and GPU generate the most heat in a server. The heat generated in the processor core must be transferred to the heat sink before the core reaches its maximum temperature. Heat first travels from the surface of the core to the heat sink by conduction across the interfaces.

Thermal interface material (TIM) is typically used to fill the gaps between components and the heat sink, thereby reducing temperature disparities across heat transfer surfaces. Generally, about 10 to 20% of the heat is transferred to the board.

The provided temperature values are indicative and may vary with the application and chip density. A common guideline is to size the heat sink and airflow so that the junction temperature stays below 125°C for dependable operation. Exceeding this limit can lead to performance degradation and potential damage.
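The guideline above can be illustrated with a simple series thermal-resistance estimate. This is only a sketch: the one-dimensional resistance model, the function name, and every numeric value (300 W chip, 35 °C air, the three resistances, a 15% board share taken from the 10 to 20% range mentioned earlier) are assumptions chosen for illustration, not figures from a specific processor or heat sink.

```python
# Hypothetical steady-state estimate: junction temperature from a chain of
# thermal resistances (junction-to-case, TIM, heat sink-to-air) in series.
T_J_LIMIT_C = 125.0  # junction temperature guideline quoted in the article


def junction_temperature(power_w, air_temp_c, r_junction_to_case,
                         r_tim, r_sink_to_air, board_fraction=0.15):
    """Return the estimated junction temperature in °C.

    board_fraction is the share of heat assumed to leave through the board
    instead of the heat sink (the article quotes roughly 10 to 20%).
    Resistances are in °C/W.
    """
    heat_to_sink_w = power_w * (1.0 - board_fraction)
    total_resistance = r_junction_to_case + r_tim + r_sink_to_air
    return air_temp_c + heat_to_sink_w * total_resistance


# Illustrative values only (assumed, not measured).
t_j = junction_temperature(power_w=300.0, air_temp_c=35.0,
                           r_junction_to_case=0.05, r_tim=0.02,
                           r_sink_to_air=0.10)
print(f"Estimated junction temperature: {t_j:.1f} °C "
      f"(guideline limit {T_J_LIMIT_C:.0f} °C)")
```

In this simplified model, a denser chip or a smaller heat sink shows up as a larger resistance or a higher power, both of which push the estimate toward the limit.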

Heat Removal from Server

In regular servers, the fans inside the chassis remove the heat that is produced. If all of the heat is transferred to the data hall, the critical temperatures are not exceeded and the server will not be shut down by its thermal protection.

In practice, the heat removed by airflow is calculated with the formula

$$Q = \dot{m} \, c_p \, \Delta T$$

where:

  • Q is the rate of heat removal (in watts, W).
  • ṁ is the mass flow rate of the air (in kilograms per second, kg/s).
  • c_p is the specific heat capacity of the air (in joules per kilogram per degree Celsius, J/(kg·°C)).
  • ΔT is the temperature difference between the inlet and outlet air (in degrees Celsius, °C).

Since the air properties are known and do not change significantly in this range, the heat removed by air cooling at a given airflow depends directly on ΔT, the temperature difference between the inlet and outlet air.

A typical server with peak performance and a normal configuration is designed to operate at ΔT = 11-12 °C. These temperature differences correspond to 270 to 250 m³/h per kW. In other words, an air-cooled server consuming 1 kW needs 270 to 250 m³/h of airflow for healthy operation. A typical blade server with peak performance and a normal configuration can operate with lower airflow (200 to 187 m³/h per kW) but at a higher ΔT (15-16 °C).
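The airflow figures above follow from the heat removal formula once a volumetric flow is used in place of the mass flow. A minimal sketch, assuming air with a density of 1.2 kg/m³ and a specific heat of 1005 J/(kg·°C); these property values and the function name are assumptions, since the article does not state which values it used:

```python
# Assumed air properties (roughly sea-level air near 20 °C).
AIR_DENSITY = 1.2   # kg/m^3
AIR_CP = 1005.0     # J/(kg·°C)


def airflow_m3_per_hour(heat_w, delta_t_c):
    """Volumetric airflow (m^3/h) needed to carry heat_w watts at a given ΔT."""
    if delta_t_c <= 0:
        raise ValueError("delta_t_c must be positive")
    mass_flow = heat_w / (AIR_CP * delta_t_c)   # kg/s, from Q = m_dot * cp * ΔT
    return mass_flow / AIR_DENSITY * 3600.0     # kg/s -> m^3/s -> m^3/h


# Airflow per kW for the ΔT values quoted in the text.
for dt in (11, 12, 15, 16):
    print(f"ΔT = {dt} °C -> {airflow_m3_per_hour(1000.0, dt):.0f} m3/h per kW")
```

With these assumed properties the results land within about 1 m³/h of the quoted ranges (roughly 271 and 249 m³/h for 11 and 12 °C, and 199 and 187 m³/h for 15 and 16 °C).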

What is the upper limit of ΔT?

Both typical and blade servers can be configured to operate at a higher ΔT, within a tolerance that keeps the CPU/GPU safe, at the cost of expected service life. If the server components are not densely packed and the heat sinks are generously sized, a high ΔT can still be maintained without premature aging. Blade servers have used such structural advantages to run at a higher ΔT than regular servers.

Besides concerns about the server's expected life, there is another practical limit. ΔT = 16 °C gives an outlet temperature of 38 °C when the inlet temperature is 22 °C, so the components installed at the back of the rack must operate at this temperature without degradation or risk. ASHRAE has already highlighted these practical limits.

The giant internet technology providers use purpose-built server hardware designs and rack configurations, so their server outlet temperatures exceed these limits. While they benefit from a lower (better) PUE, personnel are prohibited from spending extended periods in hot rack aisles because of the harm high temperatures can cause to the human body.

Fan Airflow Capacity

Fans generate airflow by rotating blades, which create a pressure difference between the air intake and exhaust within the housing. The quantity of airflow is affected by variables including dimensions, shape, rotational speed, and voltage. Fans are rated by their airflow and pressurization capabilities; however, these rated values are rarely reached in practice. The specified airflow is reached only when the fan operates in open air, and the specified pressure difference is reached only when the fan's airflow is entirely obstructed, as illustrated in the main article picture at the top.

The graphic below illustrates how the effective fan curve varies with different fan deployment strategies. Arranging fans in parallel boosts airflow capacity, whereas cascading them augments the pressure difference. The operating point is determined by the intersection of the fan curve and the air resistance, which rises parabolically with increased airflow. In a chassis packed with CPUs, GPUs, heatsinks, and other obstructive components, the design of the fan array becomes increasingly critical.

Using a server chassis frame with a cross-section designed for parallel fan arrays is optimal for maximizing airflow. When the chassis is filled with components, cascading the fans can increase air pressure to overcome air resistance.
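The interaction between fan arrangement and chassis resistance can be sketched numerically. The example below assumes an idealized linear fan curve (real fan curves are not linear) and a parabolic system curve p = k·q²; the function name, the fan ratings (300 m³/h free-air flow, 500 Pa shut-off pressure), and the two resistance coefficients are hypothetical values chosen only to show the trend.

```python
import math


def operating_flow(q_max, p_max, k, n=1, arrangement="parallel"):
    """Airflow (m^3/h) where n identical fans meet the system curve p = k*q**2.

    q_max: free-air flow of one fan (m^3/h); p_max: shut-off pressure (Pa).
    Each fan is idealized as the straight line p = p_max * (1 - q / q_max).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if arrangement == "parallel":
        q_free, p_static = n * q_max, p_max   # flows add at equal pressure
    elif arrangement == "series":
        q_free, p_static = q_max, n * p_max   # pressures add at equal flow
    else:
        raise ValueError("arrangement must be 'parallel' or 'series'")
    # Intersection: k*q**2 + (p_static/q_free)*q - p_static = 0, positive root.
    a = p_static / q_free
    return (-a + math.sqrt(a * a + 4.0 * k * p_static)) / (2.0 * k)


# Hypothetical chassis resistances: an open chassis and a densely packed one.
for label, k in (("open chassis", 0.002), ("packed chassis", 0.05)):
    single = operating_flow(300.0, 500.0, k)
    parallel = operating_flow(300.0, 500.0, k, n=2, arrangement="parallel")
    series = operating_flow(300.0, 500.0, k, n=2, arrangement="series")
    print(f"{label}: single {single:.0f}, parallel {parallel:.0f}, "
          f"series {series:.0f} m3/h")
```

Under these assumptions, two fans in parallel deliver more airflow than two in series when resistance is low, and the ranking reverses in the densely packed case, which matches the guidance above.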

Following this crucial information, one might wonder how NVIDIA, the leading provider of artificial intelligence (AI) hardware and software, is transitioning from air cooling to introducing liquid-cooled servers. The analysis will be thoroughly explained in the upcoming article.


Mr. Şenol, this is a fine piece of work. Congratulations.

Jason Parmley

HVAC/Mechanical Certified Expert Witness Forensics Re-Creation National Technical Support Data Center Critical Environment Consultant Over 30 yrs multi-trade experience


Put together very well Senol! I’m currently working on a presentation at Carnegie Mellon on the same topic. I will reference your writings and credit you for your mass wisdom. Thank you Senol.


I liked your article. Thank you very much. I think every parameter (delta T, heat capacity and air flow rate) is important in cooling. But I think we can get the biggest savings at the source of the problem, by diminishing the heat dissipated by the chip. For this purpose, we can employ different chip architectures, identify ghost servers, and use more virtualization/containerization.

O?uzhan ?ilenk

Data Center Design and Operations Engineer, ATD, AOS


Great article! The technical details and explanations you provided do an excellent job of summarizing the limits of air cooling and why we need to look for new solutions. Your clear description of the challenges faced in server cooling is particularly valuable. Looking forward to the next part!
