
Why Data Centers Are Crashing Like Planes—And How to Fix It

Dr. Eric Woodell

World's #1 expert in data center resilience. I audit and certify colocation facilities, ensuring secure, continuous operations—insured by Lloyd's of London.

Published: September 10, 2024

Yesterday, the Uptime Institute Global Data Center Survey 2024 dropped, and while it would appear at first blush to be stating “not much has changed,” a detailed analysis of the report shows that the industry is still in very deep trouble.

PERFORMANCE

In the past, I’ve tried to apply the data center world’s failure rates to the commercial aviation industry, and the numbers were very alarming. Unfortunately, my math was wrong; it’s far worse than I had estimated.

Here’s the question I plugged into ChatGPT-4o:

“From the 2024 Uptime Institute Global Data Center Survey, "Outage frequency and severity,” compare this performance to the commercial aviation industry. If the aviation industry had similar rates of failure, and a "severe" outage from a data center was the equivalent of a catastrophic plane crash, how often would planes crash, while delivering the same performance?”

To compare the performance of data centers to the commercial aviation industry using the analogy that a "severe" data center outage equates to a catastrophic plane crash, let's first establish the equivalent failure rates:

Commercial Aviation Analogy

In the aviation context:

A "severe" data center outage is equivalent to a catastrophic plane crash.
The failure rate is analyzed over a three-year period.

Data Center Outage Statistics (2024 Uptime Institute Survey)

53% of data centers experienced an outage in the past three years.
3% of those outages were considered "severe" (catastrophic).
6% were "serious."
The remaining outages were classified as significant, minor, or negligible.

Calculating the Equivalent Aviation Crash Rate

Total Data Centers Experiencing an Outage: 53% over three years. Annually, this would be approximately 53% / 3 ≈ 17.67% per year.
Severe Outages (Catastrophic Events): 3% of 53% over three years. Annually, this would be approximately [3% × 53%] / 3 ≈ 0.53% per year.
Commercial Flights: Assume there are roughly 100,000 flights per day globally (a standard estimate for pre-pandemic levels).
Annual Flights: 100,000 flights/day × 365 days/year = 36,500,000 flights/year.
Catastrophic Crash Rate: If 0.53% of flights were to experience a catastrophic event (equivalent to severe outages): [0.53% × 36,500,000] ≈ 193,450 catastrophic crashes per year. Daily, this equates to [193,450 / 365] ≈ 530 crashes per day.
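
The arithmetic in the steps above can be reproduced with a short Python sketch. Every input is a figure quoted in the text (including the assumed 100,000 flights per day), and the function names are illustrative only. Like the quoted answer, it assumes the three-year figure spreads evenly across the years and treats the annual per-facility severe-outage rate as a per-flight crash rate.

```python
# Reproduces the arithmetic quoted above. All inputs are the figures
# stated in the text; nothing here is independently sourced.

OUTAGE_SHARE_3YR = 0.53    # share of data centers with an outage in three years
SEVERE_SHARE = 0.03        # share of those outages classed as "severe"
YEARS = 3                  # length of the survey window
FLIGHTS_PER_DAY = 100_000  # assumed global figure from the quoted answer
DAYS_PER_YEAR = 365


def annual_severe_rate(outage_share, severe_share, years):
    """Annualized severe-outage rate, spreading the window evenly over the years."""
    return outage_share * severe_share / years


def equivalent_crashes_per_day(rate, flights_per_day):
    """Crashes per day if every flight failed at the given rate."""
    return rate * flights_per_day


rate = annual_severe_rate(OUTAGE_SHARE_3YR, SEVERE_SHARE, YEARS)
per_day = equivalent_crashes_per_day(rate, FLIGHTS_PER_DAY)
per_year = per_day * DAYS_PER_YEAR

print(f"Annual severe-outage rate: {rate:.2%}")
print(f"Equivalent crashes per year: {per_year:,.0f}")
print(f"Equivalent crashes per day: {per_day:,.0f}")
```

Run as written, this prints 0.53%, 193,450 and 530, matching the figures in the quoted answer. Changing FLIGHTS_PER_DAY or SEVERE_SHARE shows how sensitive the headline number is to those two inputs.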

Punch-Line:

Aviation Industry Equivalent: If commercial aviation experienced failures at the same rate as data centers, with "severe" outages equating to catastrophic crashes, there would be approximately 530 plane crashes per day.

I don’t know about you, but with those odds, I’d never fly again; I’d drive. Heck, I’d walk if I had to!

WHY IS THIS SO?

Power Systems Failures:

From the report, 54% of unplanned outages are due to power issues.

And the Uptime report helpfully states, “As discussed in previous Uptime Intelligence reports, challenges with electrical grids, exacerbated by aging infrastructure, rising demand, severe weather events and a reliance on intermittent renewable energy sources, may worsen this trend.”

This shouldn’t be news to anyone; I’ve been talking to the media about this since November of 2023, and wrote a detailed analysis of the power grid situation, “The Impending Storm: Keeping Data Centers Alive As The Electric Grid Fails.”

The growing risk of electric grid instability puts the enterprise IT organization of EVERY major corporation in the world at risk.

If you bother to read the latest Uptime report, you'll notice how it talks about network and cooling, but does NOT address the #1 cause of data center outages: power.

Why do you think that is?

I’ve written a full analysis of what’s really happening, “Shattering the Illusion: Maintenance MIS-Management is the #1 Cause of Data Center Outages.” I strongly recommend you check it out.

In short, colocation providers are fully aware of when deficiencies occur, but they choose to NOT repair them, because any losses which they (the colo provider) may incur due to an outage are merely credits for future use by the client.

They suffer no ACTUAL financial penalties if their client experiences an outage. It’s a classic “heads I win, tails you lose” scenario.

Colo companies have a financial cost associated with maintaining the critical facilities, yet they receive no direct benefit. But they do enjoy a financial benefit (cost avoidance) if they do NOT maintain their critical facilities, and there is no penalty for NOT maintaining their facilities.

So the colo companies divert the money that should be used for maintenance to other activities which artificially boost their stock, and the management can cash out hundreds of millions of dollars in bonuses.

Their IT clients are left utterly exposed to unplanned outages with absolutely zero agency.

“But WAIT a minute!” you say, “Aren’t the colo companies adhering to some sort of oversight, like SOC-2?!?”

Of course they do. But the SOC-2 certificates aren’t worth the paper they’re not written on.

SOC-2 certificates, like their European ISO equivalents, merely provide a veneer of legitimacy to an otherwise crooked game.

Want proof? Ok, what’s the only requirement to be a SOC-2 auditor?

Answer: you have to be a CPA, a Certified Public Accountant. That's IT.

When I tell IT professionals and executives this, they’re always stunned. Then they ask, “What does a CPA know about engineering systems?”

The answer, of course, is nothing.

That’s the point!

Imagine you’re about to board a jetliner to fly across the ocean, and a CPA comes running up to you and says, “Don’t worry, I audited the airline, THIS plane is safe!”

"Don't worry, I audited the airline. This plane is SAFE!!!"

Would you bet on that with your LIFE? What about your business, your retirement, your future?

Because that’s what it boils down to.

RESULTS

The poor performance cited at the beginning is the aggregate result of all the crooked, sleazy games being played by colocation data centers (and many service providers serving enterprise IT organizations). The games can be hidden from the individual client, but the final numbers cannot.

Tier-III sites are designed to have a statistical probability of 1:5,555 of having an outage in a given year. The actual probability of a data center having an outage in any given year is ~1:5.5.

Put simply, data centers are delivering a product 1,000x WORSE than what they promise.
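
The “1,000x” figure follows directly from the two odds quoted above, as this short Python sketch shows. Two things in it are my assumptions and are not stated in the article: that the 1:5,555 design figure corresponds to the commonly cited 99.982% Tier-III availability number, and that the ~1:5.5 observed figure comes from annualizing the survey's 53% over three years. The function names are illustrative.

```python
# Compares the design odds and the observed odds quoted in the text.

DESIGN_ODDS = 5_555   # "1:5,555" as stated for Tier-III sites
OBSERVED_ODDS = 5.5   # "~1:5.5" as stated for actual performance

# Assumption (not from the article): commonly cited Tier-III availability.
ASSUMED_TIER3_AVAILABILITY = 0.99982


def odds_from_availability(availability):
    """Express an availability fraction as 'one in N' odds of unavailability."""
    return 1 / (1 - availability)


def odds_from_survey(outage_share, years):
    """Express a multi-year outage share as annual 'one in N' odds."""
    return years / outage_share


def shortfall_factor(design_odds, observed_odds):
    """How many times worse the observed odds are than the design odds."""
    return design_odds / observed_odds


print(f"Odds implied by 99.982%: 1:{odds_from_availability(ASSUMED_TIER3_AVAILABILITY):,.0f}")
print(f"Odds implied by 53% over 3 years: 1:{odds_from_survey(0.53, 3):.1f}")
print(f"Shortfall factor: {shortfall_factor(DESIGN_ODDS, OBSERVED_ODDS):,.0f}x")
```

Under those assumptions, the survey figure works out to roughly 1:5.7, slightly better than the rounded ~1:5.5, and the shortfall factor comes out at about 1,010x.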

WHAT’S THE SOLUTION?

In contrast to the appalling performance of the data center industry at large, the Amerruss Resilience Program delivered a flawless operational record for more than 6 years across >60 sites scattered around the globe. That portfolio included owned, leased and colocated spaces, in a 50-50 mix of Tier-III and Tier-II sites!

In effect, while the rest of the industry delivered an actual uptime of ~83%, our program delivered 100%, with no excuses.

We can deliver the same results for your company, whether you have owned facilities, lease buildings with 3rd-party vendors maintaining your systems, or depend on colocation facilities for your IT needs (whether a full-on enterprise IT presence, POP or DRaaS sites). We have the only proven solution to help you regain agency for your IT presence, protect your IT operations and remove the crippling costs of unplanned outages due to power and/or cooling system failures. Our program is scalable, replicable, vendor-neutral, fully transparent and easily auditable.

The costs (especially compared to unplanned outages) are relatively minimal, and SLA results are insurable by Lloyd’s of London. NO other “audit” program (whether ISO, SOC, or anything else you can reference) can match this, much less beat it. An additional benefit is a decreased need for fail-over sites, which reduces IT operational costs for personnel and equipment that are simply no longer necessary.

Contact us at www.amerruss.com. We can help your IT organization be more reliable AND cost-effective, without stressing your budget or your nerves!


Ajay Varma (CDCP)

Datacenter Infrastructure Operations Engineer | Mission Critical Load Handling at WebWerks & Iron Mountain Datacenter, APAC Region

3 months ago

Great insights. The average unplanned data center outage costs anywhere from a minimum of $9,000 to a maximum of $2,409,991, and the cost of downtime continues to increase year on year.

Chris Hale, MBA

Infrastructure Services| Datacenter | MBA | Naval Nuclear Power -Veteran

5 months ago

I propose that DC providers develop an internal audit program separate from Operations. In the US Navy nuclear power program, this is present as an annual Operational Reactors Safeguards Exam (ORSE): training reports, level-of-knowledge interviews, maintenance reviews, observed evolutions, and so forth.


