The Hidden Trap of GPU-Based AI: The Von Neumann Bottleneck
Dr. Eric Woodell
World's #1 expert in data center resilience. I audit and certify colocation facilities, ensuring secure, continuous operations—insured by Lloyd's of London.
At the risk of sounding provocative, I will begin by stating for the record that GPU-based AI systems are doomed to failure.
I’ve already explained why this is true in “AI Boom to Doom: The Dark Side of Data Center Proliferation.” It boils down to the finite amounts of power and water available, simple as that. Data centers are notoriously expensive in terms of power and water consumption, and GPU-based AI systems consume 5-10 times more for a given footprint. Worse, the power density that GPU-based AI requires - the amount of power consumed in a single computer rack - jumps from a maximum of roughly 20 kilowatts (kW) per rack up to 100 kW per rack.
Put in perspective, the average American household requires 1.2 kW. A computer rack has roughly the same footprint as a double-door refrigerator that runs all the way up to the ceiling, yet at 20 kW it consumes the power of roughly 17 homes. The waste byproduct of all that computing power is heat. From experience I can tell you that when a computer rack is loaded with 20 kW of IT equipment, you’re at the ragged edge of being able to cool that equipment adequately so it won’t overheat and die.
GPU-based AI systems are at least five times worse than typical IT equipment, due to how they’re made and how they operate. In this case, a single computer rack consumes more power than 80 American homes, and (again) the waste byproduct of that power usage is heat. FAR more heat than air cooling systems can remove, necessitating new cooling strategies based on direct liquid cooling, or DLC. These systems have a variety of problems of their own, and as this brilliant analysis explains, their effectiveness in real-world applications is far from certain.
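To put the rack arithmetic in one place, here is a quick back-of-envelope sketch in Python using only the figures already cited above (1.2 kW per household, 20 kW for a conventional rack, 100 kW for a GPU-based AI rack); nothing here is a new measurement.

```python
# Back-of-envelope rack-vs-household arithmetic, using only the figures above.

household_kw = 1.2       # average American household demand (cited above)
legacy_rack_kw = 20      # practical upper limit for a conventional air-cooled rack
gpu_ai_rack_kw = 100     # high-density GPU-based AI rack

print(f"Conventional rack ~= {legacy_rack_kw / household_kw:.0f} homes")   # ~17 homes
print(f"GPU AI rack       ~= {gpu_ai_rack_kw / household_kw:.0f} homes")   # ~83 homes

# Essentially all of that electrical power leaves the rack as heat, which is
# why 100 kW racks push past air cooling into direct liquid cooling (DLC).
```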
If you recognize the absurdity of this situation - at a time when grid reliability is no longer a certainty - then let me say “congratulations,” common sense is still alive and well.
The Von Neumann Architecture
Now that we know that IT assets - and AI assets in particular - are energy hogs, we need to ask WHY.
The glib answer would be “physics,” but there’s far more to it than that. WHY is physics the limiting factor? The answer to that question goes all the way back to 1945, when John von Neumann developed the protocols for how computer processors handle data.
Forgive me for copying and pasting from the Wikipedia description, but it explains the situation overall:
The von Neumann architecture—also known as the von Neumann model or Princeton architecture—is a computer architecture based on a 1945 description by John von Neumann, and by others, in the First Draft of a Report on the EDVAC.[1] The document describes a design architecture for an electronic digital computer with these components:
· A processing unit with both an arithmetic logic unit and processor registers
· A control unit that includes an instruction register and a program counter
· Memory that stores data and instructions
· External mass storage
· Input and output mechanisms[1][2]
The term "von Neumann architecture" has evolved to refer to any stored-program computer in which an instruction fetch and a data operation cannot occur at the same time (since they share a common bus). This is referred to as the von Neumann bottleneck, which often limits the performance of the corresponding system.[3]
That all sounds fine and well, but what does it really mean? The Von Neumann architecture was designed to make it easier for programmers to write code, at the expense of efficiency in the processor. At the time, a processor was predicted to perform up to 20,000 operations per second, a phenomenal speed for the era! So there was no worry about the efficiency of the processor; it was so fast it wouldn’t matter, especially for the limited amount of data being fed to it. But now… Now the sheer volume of data is choking the whole system.
To use an analogy, imagine you’re at an intersection in a small town with minimal traffic, controlled by a police officer holding a stop sign and signaling drivers to stop or go as circumstances require. It’s not ideal by any means, but it’s doable.
Now apply that method in a large city, say Chicago. The intersection now has multiple lanes going in each direction, there are LOTS of cars, and the traffic is controlled by the same police officer with the same stop sign. And he’s not terribly bright, so he applies the same rules he used in the small town, allowing only one car to proceed through the intersection at a time. What would be the result? A massive traffic jam.
Similarly, every cycle of the processor is that policeman letting through a single piece of information: fetching an instruction, then pulling data from the memory unit, then sending a single piece of data to the output device (such as your monitor), then fetching another instruction, pulling more data, sending another piece to the output, and so on.
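To make the analogy concrete, here is a toy sketch in Python of a von Neumann-style machine; it is my own illustration, not any real instruction set. Instructions and data live in one memory and cross one shared bus, so every instruction fetch and every data access is its own serialized trip through the intersection.

```python
# Toy sketch of a von Neumann-style machine (illustrative only, not a real ISA).
# Instructions and data share one memory and one bus, so every instruction
# fetch and every data access is a separate, serialized bus transaction.

memory = {
    0: ("LOAD", 100),    # load the word at address 100 into the accumulator
    1: ("ADD", 101),     # add the word at address 101
    2: ("STORE", 102),   # write the result to address 102
    3: ("HALT", None),
    100: 7, 101: 35, 102: 0,
}

bus_transactions = 0

def bus_read(addr):
    """One word per trip over the shared bus -- the 'stop sign' in the analogy."""
    global bus_transactions
    bus_transactions += 1
    return memory[addr]

def bus_write(addr, value):
    global bus_transactions
    bus_transactions += 1
    memory[addr] = value

acc, pc = 0, 0
while True:
    op, operand = bus_read(pc)      # fetching the instruction uses the bus...
    pc += 1
    if op == "LOAD":
        acc = bus_read(operand)     # ...and so does every data access
    elif op == "ADD":
        acc += bus_read(operand)
    elif op == "STORE":
        bus_write(operand, acc)
    elif op == "HALT":
        break

print(f"result = {memory[102]}, bus transactions = {bus_transactions}")
# Adding two numbers took 7 serialized trips across the shared bus.
```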
Moore’s Law Is Indeed Dead, But NOT For the Reasons You Think
Up until recently, the limitations of the Von Neumann architecture haven’t been a significant problem. We could keep increasing the speed of processors, as demonstrated by Moore’s Law, which states that the number of transistors in an integrated circuit doubles roughly every two years, effectively doubling processor speed every two years. But Moore’s Law is now dead, as (ironically) pronounced by Jensen Huang, cofounder and CEO of Nvidia, in September 2022.
That is to say, doubling the number of transistors every two years no longer doubles the speed, and it is not keeping up with computing demands.
Moore’s Law is indeed dead, but not because of the physical limitations of integrated circuits; it died because of the timing constraints inherent to the Von Neumann bottleneck.
The bottleneck is the constraint that killed Moore’s Law; it’s really that simple.
Let me restate this: you can keep increasing the speed of the processor, but fetch commands, input commands, output commands, and individual pieces of data still have to be handled one at a time. The time required to move them all across a common bus becomes a physical limitation that cannot be overcome within the Von Neumann architecture.
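To give a feel for why faster transistors alone don’t help, here is a rough back-of-envelope sketch. The compute rate, bus bandwidth, and operand size below are round numbers I have assumed purely for illustration; they are not the specifications of any particular chip.

```python
# Rough "memory wall" illustration. All numbers below are assumed round figures
# for illustration only -- they are not the specs of any particular processor.

compute_rate = 1e12          # operations per second the silicon could retire (assumed)
bus_bandwidth = 100e9        # bytes per second across a shared memory bus (assumed)
bytes_per_operand = 8        # one 64-bit word per operand

# If every operation needs one operand fetched over the shared bus,
# the bus, not the transistors, sets the ceiling:
bus_limited_rate = bus_bandwidth / bytes_per_operand    # 12.5 billion ops/s

utilization = bus_limited_rate / compute_rate
print(f"Bus-limited rate: {bus_limited_rate:.2e} ops/s "
      f"({utilization:.1%} of what the silicon could do)")

# Doubling transistor counts raises compute_rate, but the shared bus still
# hands over one word at a time -- which is why the bottleneck, not the
# transistor, is what killed Moore's Law in practice.
```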
Ironically, it is Nvidia that now leads in increasing processing speed, by simply cramming multiple Graphics Processing Units (GPUs) onto computer circuit boards and dividing the computational load between them on multiple buses, a ham-fisted solution for delivering enough processing power for Artificial Intelligence (AI).
In essence, where Nvidia couldn’t smash through the limitations imposed by the Von Neumann architecture with a single sledgehammer, they decided to smash with EIGHT sledgehammers instead. The result is massive power consumption, heat generation and, correspondingly, cooling requirements. It’s a crude approach that maximizes short-term monetary gains, but it’s doomed to failure because of those massive power and cooling requirements.
Rethinking Antiquated Conventions
EVERY manufacturer, every builder, every creator of anything meaningful abides by certain conventions of what’s acceptable within their particular space.
In the same way, computer chip manufacturers assume the Von Neumann protocol is the standard (if they are even aware of it), and they work within other industry standards for size, supply voltage, maximum temperature, and so on. They’re not in the business of changing the Von Neumann protocol.
Similarly, software engineers operate within the established confines of their industry, IT end-users operate within theirs, and so on.
In other words, nobody inside the industry has seriously considered challenging the Von Neumann architecture, much less actually worked on it. There are attempts at quantum computing, where the environmental requirements are even more absurd than the Nvidia solutions... And then there is neuromorphic computing.
Neuromorphic computing attempts to mimic how the organic brain functions, instead of following the one-bit-at-a-time approach of the Von Neumann architecture. And this is where the real solution to viable AI lies.
Enter I/ONX
I/ONX is a company that reached out to me for my input.
Their approach was to throw out the Von Neumann architecture and build a completely new instruction set for computer processors called Kore, which eliminates the time limitations imposed by the one-bit-at-a-time approach.
To go back to the analogy of the traffic officer in Chicago, they’ve replaced the single stop sign with a normal traffic-light setup. With each change of the light, tens or hundreds of cars can move through the intersection, and there’s no more traffic jam. Similarly, each cycle of the processor now lets through a stream of data, the next cycle a stream of fetch commands, the next a stream of output data, and so on.
THIS is the solution to the Von Neumann bottleneck.
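As a purely illustrative sketch of that traffic-light idea (this is my own toy Python example, not the Kore instruction set, whose internals are not described here), compare moving operands one at a time against moving them as batched streams per cycle:

```python
# Illustration of the traffic-light analogy only -- this is NOT the Kore
# instruction set (its internals aren't described here). It just contrasts
# moving operands one at a time versus moving them as batched streams.

data = list(range(1_000_000))

def one_at_a_time(values):
    """Stop-sign model: every operand is its own trip across the intersection."""
    transactions, total = 0, 0
    for v in values:
        total += v
        transactions += 1
    return total, transactions

def streamed(values, batch=4096):
    """Traffic-light model: each 'green light' moves a whole stream of operands."""
    transactions, total = 0, 0
    for i in range(0, len(values), batch):
        total += sum(values[i:i + batch])
        transactions += 1
    return total, transactions

print(one_at_a_time(data)[1])   # 1,000,000 trips for the answer
print(streamed(data)[1])        # 245 trips -- same work, far fewer "green lights"
```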
But then I/ONX combines the Kore software with hardware specifically designed to capitalize on the efficiencies gained by that software, resulting in an AI computing solution that matches the speed of the best in the industry while consuming ~10% of the power and generating around 2-3% of the waste byproduct, heat.
The end result: while Nvidia stuffs 40 cores into a computer rack weighing some 1500 pounds, drawing 50 kW of power and requiring direct liquid cooling, I/ONX appears poised to deliver an alternative that matches that compute capability on 2 cards the size of HP Blades, consuming a mere 200 watts of power and generating essentially no heat.
Put another way, a computer rack fully populated with I/ONX hardware should be able to deliver 20x more compute capability than a fully populated Nvidia rack while consuming <6 kW of power and creating (again) essentially no heat. In other words, the efficiency of their approach appears so good that they’ve eliminated the need for any cooling system. And it would weigh in at around 1200 pounds, 20% less!
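For what it’s worth, here is the arithmetic behind that comparison, using only the figures claimed above (50 kW and 40 cores per Nvidia rack versus <6 kW and 20x the compute for an I/ONX rack); these are the article’s stated numbers, not independent measurements.

```python
# Compute-per-kilowatt comparison using only the figures claimed above --
# these are the article's stated numbers, not independent measurements.

nvidia_rack_kw = 50        # claimed power draw of a fully populated GPU rack
ionx_rack_kw = 6           # claimed upper bound for a fully populated I/ONX rack
relative_compute = 20      # claimed compute advantage of the I/ONX rack

# Normalize with the Nvidia rack as 1 unit of compute.
nvidia_compute_per_kw = 1 / nvidia_rack_kw
ionx_compute_per_kw = relative_compute / ionx_rack_kw

advantage = ionx_compute_per_kw / nvidia_compute_per_kw
print(f"Claimed efficiency advantage: ~{advantage:.0f}x more compute per kW")
# ~167x, if (and only if) the claimed figures hold up in practice.
```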
Can This Be Real?
At this point, I know many others in the industry will say this is complete nonsense; such things as I describe are impossible!
Umm... Yeah.
That’s EXACTLY what I thought when I/ONX first approached me. In effect, the things they were claiming simply weren’t credible, couldn’t POSSIBLY be true...
What I’ve learned over the past few months has completely upended everything I’ve seen in the critical facilities space and forced me to re-evaluate every rule we’ve always accepted as fact. I’ve done the calculations myself and examined their approach from every angle I can… and the numbers track.
The implications of this cannot be overstated: a sustainable approach to AI, to enterprise IT computing, even to consumer electronics such as cell phones and laptops, all made possible by throwing out the antiquated standard that has held the industry back.
Let me say it plainly: GPU-based AI is already a zombie ecosystem; it’s dead, it just doesn't know it yet.
What’s coming, I believe, is as big a leap forward for IT as going from horses and buggy whips to modern cars.
My Predictions for the Data Center Industry
As this plays out, I predict the following:
Nobody can put the AI genie back in the bottle; it’s here, and in its current form it’s big, bad, and UGLY. And GPU-based AI is not sustainable.
The only viable alternative that I have seen so far is the technology being developed by I/ONX.
Keep an eye on them!