Why Data Centers Are Crashing Like Planes—And How to Fix It
Dr. Eric Woodell
World's #1 expert in data center resilience. I audit and certify colocation facilities, ensuring secure, continuous operations—insured by Lloyd's of London.
Yesterday, the Uptime Institute Global Data Center Survey 2024 dropped, and while at first blush it appears to say “not much has changed,” a detailed analysis of the report shows that the industry is still in very deep trouble.
PERFORMANCE
In the past, I’ve tried applying the data center world’s failure rates to the commercial aviation industry, and the numbers were very alarming. Unfortunately, my math was wrong; it’s far worse than I had estimated.
Here’s the question I plugged into ChatGPT-4o:
“From the 2024 Uptime Institute Global Data Center Survey, "Outage frequency and severity," compare this performance to the commercial aviation industry. If the aviation industry had similar rates of failure, and a "severe" outage from a data center was the equivalent of a catastrophic plane crash, how often would planes crash while delivering the same performance?”
To compare the performance of data centers to the commercial aviation industry using the analogy that a "severe" data center outage equates to a catastrophic plane crash, let's first establish the equivalent failure rates:
Commercial Aviation Analogy
In the aviation context:
Data Center Outage Statistics (2024 Uptime Institute Survey)
Calculating the Equivalent Aviation Crash Rate
Punch-Line:
I don’t know about you, but with those odds, I’d never fly again; I’d drive. Heck, I’d walk if I had to!
WHY IS THIS SO?
Power Systems Failures:
From the report, 54% of unplanned outages
And the Uptime report helpfully states “As discussed in previous Uptime Intelligence reports, challenges with electrical grids
This shouldn’t be news to anyone; I talked to the media about this in November of 2023, and wrote a detailed analysis of the power grid situation, “The Impending Storm: Keeping Data Centers Alive As The Electric Grid Fails.”
The growing risk of electric grid instability puts all enterprise IT organizations
If you bother to read the latest Uptime report, you'll notice that it talks about network and cooling, but does NOT address the #1 cause of data center outages: power.
Why do you think that is?
I’ve written a full analysis of what’s really happening, “Shattering the Illusion: Maintenance MIS-Management is the #1 Cause of Data Center Outages.” I strongly recommend you check it out.
In short, colocation providers are fully aware of when deficiencies occur, but they choose to NOT repair them, because any losses which they (the colo provider) may incur due to an outage are merely credits for future use by the client.
They suffer no ACTUAL financial penalties if their client experiences an outage. It’s a classic “heads I win, tails you lose” scenario.
Colo companies have a financial cost associated with maintaining the critical facilities
So the colo companies divert the money that should be used for maintenance to other activities which artificially boost their stock, and the management can cash out hundreds of millions of dollars in bonuses.
Their IT clients are left utterly exposed to unplanned outages with absolutely zero agency.
“But WAIT a minute!” you say, “Aren’t the colo companies adhering to some sort of oversight, like SOC-2?!?”
Of course they are. But the SOC-2 certificates aren’t worth the paper they’re not written on.
SOC-2 certificates, like their European ISO equivalents, merely provide a veneer of legitimacy to an otherwise crooked game.
Want proof? OK, what’s the only requirement to be a SOC-2 auditor?
Answer: you have to be a CPA, a Certified Public Accountant. That's IT.
When I tell IT professionals and executives this, they’re always stunned. Then they ask, “What does a CPA know about engineering systems?”
The answer, of course, is nothing.
That’s the point!
Imagine you’re about to board a jetliner to fly across the ocean, and a CPA comes running up to you and says “don’t worry, I audited the airline, THIS plane is safe!”
Would you bet on that with your LIFE? What about your business, your retirement, your future?
Because that’s what it boils down to.
RESULTS
The poor performance cited at the beginning is the aggregate result of all the crooked, sleazy games being played by colocation data centers (and many service providers serving enterprise IT organizations). The games can be hidden from the individual client, but the final numbers cannot.
Tier-III sites are designed to have a statistical probability of 1:5,555 of having an outage in a given year. The actual probability of a data center having an outage in any given year is ~1:5.5.
Put simply, data centers are delivering a product 1,000x WORSE than what they promise.
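As a quick check of that arithmetic, here is a minimal sketch in Python. It uses only the two odds quoted above (1:5,555 by design, ~1:5.5 in practice); the constant and function names are illustrative assumptions and do not come from the Uptime report.

```python
# Odds quoted in the article, expressed as "1 in N" per year.
DESIGN_ODDS = 5555.0  # Tier-III design figure cited above (1:5,555)
ACTUAL_ODDS = 5.5     # observed figure cited above (~1:5.5)


def annual_probability(one_in_n: float) -> float:
    """Convert '1 in N' odds into a probability between 0 and 1."""
    return 1.0 / one_in_n


design_p = annual_probability(DESIGN_ODDS)
actual_p = annual_probability(ACTUAL_ODDS)

# How many times more likely an outage is in practice than by design.
ratio = actual_p / design_p

print(f"Designed annual outage probability: {design_p:.4%}")
print(f"Actual annual outage probability:   {actual_p:.1%}")
print(f"Actual vs. designed:                {ratio:.0f}x")
```

With these inputs the ratio comes out at roughly 1,010, which is where the "1,000x" figure comes from.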
WHAT’S THE SOLUTION?
In contrast to the appalling performance of the data center industry at large, the Amerruss Resilience Program delivered a flawless operational record
In effect, while the rest of the industry delivered an actual uptime of ~83%, our program delivered 100%, with no excuses.
We can deliver the same results for your company, whether you own your facilities, lease buildings with 3rd-party vendors maintaining your systems, or depend on colocation facilities for your IT needs (whether a full-on enterprise IT presence, POP or DRaaS sites). We have the only proven solution to help you regain agency for your IT presence, protect your IT operations and remove the crippling costs of unplanned outages due to power and/or cooling system failures. Our program is scalable, replicable, vendor-neutral, fully transparent and easily auditable.
The costs (especially compared to unplanned outages) are relatively minimal, and SLA results are insurable by Lloyd’s of London. NO other “audit” program, whether ISO, SOC, or anything else you can reference, can match this, much less beat it. An additional benefit is a decreased need for additional fail-over sites, reducing IT operational costs for personnel and equipment that are simply no longer necessary.
Contact us at www.amerruss.com. We can help your IT organization be more reliable, AND cost-effective.
Datacenter Infrastructure Operations Engineer | Mission Critical Load Handling at WebWerks & Iron Mountain Datacenter, APAC Region
3 months ago: Great insights. The average unplanned data center outage costs anywhere from a minimum of $9,000 to a maximum of $2,409,991, and the cost of downtime continues to increase year on year.
Infrastructure Services| Datacenter | MBA | Naval Nuclear Power -Veteran
5 months ago: I propose that DC providers develop an internal audit program separate from Operations. In the US Navy nuclear power program, this exists as an annual Operational Reactors Safeguards Exam (ORSE), which covers training reports, level-of-knowledge interviews, maintenance reviews, observed evolutions, and so forth.