Why is it just a Power Grab?

A special thanks to Marc Cram

Sitting in a conference, listening to everyone talk about the unknown, emerging technologies, and the future of AI (Artificial Intelligence), I find myself perplexed. As I listen to some brilliant people, I hear a similar message: the need for more power and greater density in the rack. I found myself chuckling as we go from 20-30 kW per rack up to 300 kW per rack. All of a sudden, I picture a large colo data hall built for 300 kW rack density but holding only 10 racks in 300,000 sq ft of floor space.
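To make that picture concrete, here is a small back-of-the-envelope sketch. The 3,000 kW (3 MW) hall power budget is a hypothetical figure, chosen only so that 300 kW racks work out to the 10 racks described above. The point is simply that, for a fixed power budget, the number of racks falls in direct proportion to rack density.

```python
# Back-of-the-envelope sketch: how many racks a fixed power budget can feed.
# HALL_POWER_KW is a hypothetical budget, not a figure from any real facility.
HALL_POWER_KW = 3_000  # assumed 3 MW of IT power for the data hall


def racks_supported(hall_power_kw: float, rack_density_kw: float) -> int:
    """Return the whole number of racks the budget can power at a given density."""
    return int(hall_power_kw // rack_density_kw)


for density_kw in (20, 30, 300):
    print(f"{density_kw:>3} kW/rack -> {racks_supported(HALL_POWER_KW, density_kw)} racks")
```

Under this assumed budget, 20-30 kW racks would fill the hall with 100 to 150 racks. At 300 kW per rack, the same power feeds only 10, leaving most of the floor empty.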

Why is the physical layer typically the first thing we look at?

I know it is not that simple, but this conversation feels a little like déjà vu. Flash back more than a decade to the birth of the data center boom, when every data center supposedly needed four or five nines (99.99% or 99.999%) of uptime and reliability. That changed when eBay ran a large analysis of its stored data and found that only about 20% of it was business critical; the other 80% was cold or non-critical data. This analysis allowed eBay to build a better storage and archiving solution. It also allowed eBay to address its true data center needs and the level of reliability each workload actually required. It paved the way for eBay to move away from Tier 4 data centers toward Tier 1 and Tier 2 facilities, resulting in huge cost savings and more sustainable data center growth.
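For readers less familiar with the shorthand, "nines" translate directly into allowed downtime. The sketch below is plain arithmetic (a 365-day year, no planned maintenance windows) rather than any formal Tier definition. It shows why each extra nine is so expensive to engineer for: every additional nine cuts the allowed outage time by a factor of ten.

```python
# Convert an availability target expressed as "nines" into allowed downtime.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Allowed downtime per year for `nines` nines of availability (e.g. 4 -> 99.99%)."""
    unavailability = 10 ** -nines  # 4 nines -> 0.0001
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    availability = 100 * (1 - 10 ** -n)
    print(f"{n} nines ({availability:.3f}%): {allowed_downtime_minutes(n):.2f} min/year")
```

Four nines allow roughly 52.6 minutes of downtime a year, and five nines only about 5.3 minutes. Engineering for that is hard to justify when 80% of the data is cold.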

I bring this story up to point out a simple fact: sometimes we react too quickly when making complex decisions that require a long-term strategy. So, let's get back to the topic at hand: AI. To date, most of the focus has been on increasing power density at the rack level to support massive parallel processing through GPUs (Graphics Processing Units). While GPUs bring more processing power to the rack, they also create a need for more power, with a sprinkling of new cooling technologies such as liquid-to-the-rack, direct-to-chip, or immersion cooling.

First, I would like to address the tech elephant in the room: GPUs will not always be the answer. They are a perfect fit for most LLM (Large Language Model) environments used for large hyperscale applications like ChatGPT, Copilot, and other large AI systems, but they are not a one-size-fits-all solution. That is also why we see the hyperscalers taking different approaches to AI inferencing. GPUs use parallel processing, which is needed when we are dealing with many inputs and an enormous number of variables feeding the computed outputs. Simply put, parallel processing means running many computations or calculations simultaneously inside the server. The NVIDIA H100, for example, offers 14,592 CUDA cores in its PCIe version. GPUs use many smaller cores, enabling them to run more parallel threads (or sequences) of computation. However, parallel processing is not always the answer. Smaller, purpose-built AI systems or specific applications might not require it. This is where CPUs (Central Processing Units) become a better fit, as they use fewer, larger cores built for sequential processing. Sequential processing is when a core works through its threads in order; CPU cores can also compute in parallel, but usually across only four threads or fewer. This approach is still a good fit for applications that are not self-aware and that handle fewer inputs with limited outputs.
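A toy sketch can make the parallel-versus-sequential distinction concrete. Python worker processes stand in for GPU cores here, and the `score` and `running_state` functions are hypothetical stand-ins for real AI kernels; this illustrates the idea, not how GPUs are programmed. Independent inputs can be split across workers and still give the same answer. A computation where each step needs the previous step's result must run in order no matter how many cores are available.

```python
# Sketch: independent work can be spread across cores; dependent work cannot.
# The workloads below are hypothetical toy stand-ins, not real AI kernels.
from concurrent.futures import ProcessPoolExecutor


def score(x: int) -> int:
    """Independent task: each result depends only on its own input."""
    return x * x + 1


def running_state(inputs: list[int]) -> int:
    """Dependent task: each step needs the previous step's result, so it runs in order."""
    state = 0
    for x in inputs:
        state = (state * 31 + x) % 1_000_003
    return state


def score_sequential(inputs: list[int]) -> list[int]:
    return [score(x) for x in inputs]


def score_parallel(inputs: list[int], workers: int = 4) -> list[int]:
    # Inputs are handed out to worker processes; results come back in input order.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score, inputs, chunksize=64))


if __name__ == "__main__":
    data = list(range(10_000))
    assert score_parallel(data) == score_sequential(data)  # same answer either way
    print("parallel and sequential scores match")
    print("running state:", running_state(data))
```

No speedup is claimed here; for work this cheap, process overhead dominates. The design point is structural. Only the independent `score` work benefits from more cores, which is why the workload, not the hardware trend, should decide between GPUs and CPUs.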

Second, how much I/O (input/output) do we need?

In recent research, one can build an IT system in a single rack with 32 x86 servers and 64 CPUs running at 3.1 GHz across 576 cores. It includes dual fabric feeds and 256 GB of memory and draws only 14 kW. The memory can go higher, of course, but the Linux and Windows operating systems limit how much memory can be addressed. Even so, that's a lot of memory. RAM (Random Access Memory) is where application data is held while it is being processed. That IT system, in a single rack, can process over 2,100 GT/s (gigatransfers per second) and over 187,000 MT/s (megatransfers per second). GT/s represents the number of data transfers, or data samples, that occur in a second. MT/s is similar to the MHz figure we use to rate RAM, but it measures the RAM's effective speed in terms of how many data transfers it can make per second. So, with this level of computation, a typical server row of 36 racks can process over 6.7 million data points per second with only 20 kW of power consumption per rack.
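The 6.7 million figure follows from multiplying the per-rack number by the number of racks in the row. The sketch below reproduces that arithmetic with the figures stated above. As an added comparison, it expresses the row's total power as an equivalent number of 300 kW racks. It only multiplies the stated numbers and does not model real system performance.

```python
# Row-level arithmetic using the per-rack figures quoted in the text.
PER_RACK_MT_S = 187_000  # megatransfers per second per rack (from the text)
RACKS_PER_ROW = 36       # "typical server row" size (from the text)
PER_RACK_KW = 20         # power per rack used in the row estimate (from the text)
AI_RACK_KW = 300         # high-density AI rack figure discussed earlier

row_mt_s = PER_RACK_MT_S * RACKS_PER_ROW   # 6,732,000 -> "over 6.7 million"
row_kw = PER_RACK_KW * RACKS_PER_ROW       # total power for the whole row
equivalent_ai_racks = row_kw / AI_RACK_KW  # number of 300 kW racks drawing the same power

print(f"row throughput: {row_mt_s:,} MT/s")
print(f"row power: {row_kw} kW (~{equivalent_ai_racks:.1f} racks at {AI_RACK_KW} kW)")
```

By this arithmetic, a full 36-rack row at 20 kW per rack draws about 720 kW. That is roughly the same as two and a half of the 300 kW racks discussed at the start.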

Third, where are my propeller heads from Google, Microsoft, and IBM?

I can toot my own horn and claim to be a reasonably intelligent person most days, but I understand there are millions of people much smarter than I am. So where are their voices? Why are we not talking more about the logical layer and taking a deeper look at a more efficient computing model, one that does not require as much power or physical infrastructure? Where is virtualization 2.0? CDN 3.0? When will the software become as efficient as the hardware?

Why are we not looking at the logical layer?

Have we forgotten the huge impact server virtualization had on server farms in earlier times? Why are we not trying to figure out a more efficient and denser compute model that balances the need for parallel and sequential processing while offsetting the need for insane rack densities? Perhaps the combination of local, cloud, and edge technologies will allow us to usher in a new era of extreme automation and massive computation while keeping latency low. Clearly, the algorithms, software, and automation layers of AI are still lagging behind the vast increase in power density at the rack level. Only when we find ways to minimize cache misses, instruction branches, and memory calls can we balance the logical layer with the physical layer. That balance would reduce the need for power, wait states, and cache refreshes and optimize AI deployments.

Citations and References:

Gutschmidt, C. (2023, August 28). CyrusOne offers 300kW-per-rack #AI #datacenter design! https://www.dhirubhai.net/pulse/cyrusone-offers-300kw-per-rack-ai-datacenter-design-gutschmidt/

NVIDIA H100 Tensor Core GPU Datasheet. (n.d.). NVIDIA. https://resources.nvidia.com/en-ustensor-core/nvidia-tensor-core-gpu-datasheet?ncid=no-ncid

GPU vs CPU - Difference Between Processing Units - AWS. (n.d.). Amazon Web Services, Inc. https://aws.amazon.com/compare/the-difference-between-gpus-cpus/#:~:text=GPU%20cores%20are%20less%20powerful,important%20role%20in%20parallel%20computing

Erickson, J. (2024, April 2). What Is AI Inference? https://www.oracle.com/artificial-intelligence/ai-inference/

Dabhi Vishal

Av senior and Rack box & RLC design technical engineer

3 months

I agree!

Don Wiggins

Principal Solutions Architect, Global Public Sector

3 months

Great post Jayson, an ongoing conversation that needs to be had by all in our industry! I think that gains we’ve made with dynamically adaptive networks (SDN, high-capacity Ethernet proliferation, etc.) will enable a more distributed deployment model that eases power constraints at both the municipality and data center level. Sustained, prolific multi-megawatt deployments are both untenable and unnecessary. Will be interesting to see how things develop over the 5 years or so.
