Managing Risks in Software Upgrades: Lessons from the CrowdStrike Outage
Steve Butler
Technical Programme Manager | IT Transformation Director | Cloud Migration | Data Centre | Application Modernisation | Cost Optimisation - Delivered £30M in Savings | Author | Speaker
Last week’s CrowdStrike outages were hard to miss, even if you were living on a proverbial desert island. These incidents highlight the inherent risks involved in upgrading software, a challenge that's becoming increasingly complex with the rise of cloud services.
The CrowdStrike scenario underscores the critical need for robust service maintenance. A typical software stack, which includes all the components necessary to deliver services to users and clients, comprises various layers. Here, we focus on management tools. These tools, often installed as drivers, sit atop operating systems and have access to the most sensitive parts of the system and hardware.
The notorious Blue Screen of Death (BSOD) is often caused by faulty drivers because they have privileged access to critical server components. While most applications are shielded from causing such catastrophic failures, security-related software is an exception. These components are essential for protecting services from threats like malware, viruses, and hackers. In this case, it was an antivirus/endpoint protection application that led to the issue.
Security applications are updated in two distinct ways:
1. Real-Time Updates: These include new virus signatures and heuristics or rule sets that detect suspicious behaviour. These updates occur frequently, often more than once a day, reflecting the relentless efforts of cybercriminals to compromise data and disrupt services.
2. Periodic Updates: These involve adding new capabilities to the software and occur less frequently. They often require machine reboots, contributing to the risk.
In the recent outage, the update process failed, deploying a defective update that crashed machines upon reboot. While such risks have always existed, good vendor engineering practices typically make these events rare. However, this incident raises questions about whether the patching processes of many affected organisations were sufficiently robust.
While perfect security and resilience are unattainable, practical steps can minimise the risk of large-scale outages:
1. Staggered Updates: Security product updates should first be tested in a dev or test environment. If issues like blue screens occur, the update must be halted before it reaches production. Staggered updates are achievable; they simply require more planning and testing.
2. Effective Restore Mechanisms: Discussions about backups often overlook their sole purpose: enabling the restoration of a server or service. Ensure your restore mechanisms work flawlessly by regularly testing your ability to revert to previous backups or snapshots. This is your ultimate safety net.
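The staggered-update practice described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration (ring names, sizes, and the health check are invented for the example, not taken from any vendor's tooling): deploy ring by ring, and halt the rollout the moment a ring reports unhealthy machines.

```python
# Hypothetical rollout rings: start in a small test environment and widen
# only if the previous ring stays healthy. Names and sizes are illustrative.
RINGS = [
    ("dev", 5),       # test environment first
    ("canary", 50),   # small slice of production
    ("broad", 1000),  # the remainder of the fleet
]


def deploy_to_host(host_id):
    """Stand-in for pushing the update to one machine.

    Returns True if the host stays healthy after the update. Here we
    simulate success; a real check would probe the host after
    installation (heartbeat, successful boot, service status).
    """
    return True


def staggered_rollout(rings, deploy=deploy_to_host):
    """Deploy ring by ring; halt the whole rollout on the first failure."""
    for ring_name, size in rings:
        failures = sum(1 for host in range(size) if not deploy(host))
        if failures > 0:
            return f"halted at ring '{ring_name}': {failures} unhealthy hosts"
    return "rollout complete"


print(staggered_rollout(RINGS))  # rollout complete
```

The design point is that the blast radius of a bad update is capped at the size of the ring it was caught in, rather than the entire estate.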
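The restore-testing practice can likewise be exercised as a routine drill. Below is a minimal sketch (file names and helper functions are invented for the example): back a file up, restore it to a fresh location, and verify the restored copy byte-for-byte against a checksum taken at backup time.

```python
import hashlib
import os
import tempfile


def checksum(path):
    """SHA-256 digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def backup(src, backup_path):
    """Copy src to backup_path and return the digest to verify against later."""
    with open(src, "rb") as f_in, open(backup_path, "wb") as f_out:
        f_out.write(f_in.read())
    return checksum(backup_path)


def restore_and_verify(backup_path, restore_path, expected_digest):
    """Restore from the backup and confirm the copy matches the original."""
    with open(backup_path, "rb") as f_in, open(restore_path, "wb") as f_out:
        f_out.write(f_in.read())
    return checksum(restore_path) == expected_digest


# The drill: the point is not the copy itself but proving, on a schedule,
# that restoration works before you actually need it.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "server-config.dat")
    with open(src, "wb") as f:
        f.write(b"critical service state")
    digest = backup(src, os.path.join(tmp, "backup.dat"))
    ok = restore_and_verify(os.path.join(tmp, "backup.dat"),
                            os.path.join(tmp, "restored.dat"), digest)
    print("restore drill passed:", ok)  # restore drill passed: True
```

A real drill would restore a full server image or snapshot rather than a single file, but the discipline is the same: an untested backup is only a hope, not a safety net.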
These practices apply to in-house as well as cloud hosting, but fundamentally, success hinges on adhering to basic, sound practices.