My First-Hand Account of the CrowdStrike Outage

As an IT (information technology) professional, I recently found myself in the middle of a significant event: the CrowdStrike outage, caused by a faulty software update that crashed Microsoft Windows systems around the world. According to initial Reuters reports, experts speculated that the update had not gone through routine patch management procedures, such as testing in a sandbox, that would have caught the problem before release.

The incident, described by CNN as the largest IT outage in history, is expected to cost Fortune 500 companies over $5 billion in direct losses, according to an insurer's analysis published on Wednesday. Some sources suggest that global financial damages could be around $10 billion. At the time of the incident, CrowdStrike reported having more than 24,000 customers, including 60% of Fortune 500 companies and over half of the Fortune 1000 ("2024 CrowdStrike incident - Wikipedia"). On July 20, Microsoft estimated that 8.5 million devices had been affected by the update. Although the situation was unforeseen, CrowdStrike's and Microsoft's responsiveness and their ability to assist clients were impressive and worth commending! That said, this experience holds a lesson for all IT professionals.

My Experience

Due to weather delays and flight cancellations, I ended up caught in the chaos of the CrowdStrike/Microsoft update that caused a global systems outage. I was traveling from North Carolina to Rochester, NY, and had been rerouted through Atlanta because of weather; my flight from ATL to ROC was then delayed until 12:30 a.m. After an hour on the tarmac on the flight from NC, I was grateful for access to the ATL Delta Club lounge, thanks to my AWS (Amazon Web Services) Partner Sales Manager, Lindsey Crowe. Around 1:30 a.m., boarding for my ATL to ROC flight began. Remarkably, everyone was seated in 18 minutes, earning applause from the crew. However, ten minutes later we were told to deplane because the systems were down and the plane could not take off. We were told the flight would leave at 9:00 a.m. Around 3:00 a.m., I woke up from a quick nap in an airport chair and checked the news. It became apparent that this was a global systems outage triggered by a CrowdStrike update, and some passengers recalled seeing the infamous blue screen of death. At this point, the uncertainty started to creep in: would I ever get out of this airport?

I spent Thursday night and Friday morning at the airport while the 9 a.m. flight was repeatedly delayed and eventually canceled. I was then ticketed on an 11:30 flight. At one point that flight had a plane and a pilot but no crew, so it was canceled due to poor personnel management. Most of my fellow passengers didn't fly out until Sunday or Monday. I had the good fortune of meeting two people, Mike and Justin, who were also traveling to Rochester, and we quickly struck up a friendship. On Friday afternoon, we took an Uber to Knoxville, TN, where we met up with a friend who had seen my LinkedIn post. From there we drove back to Rochester, arriving on Saturday morning after an all-night drive. It was quite an adventure, much like the 1987 classic Planes, Trains and Automobiles (we did not take any trains).

Reflections on the Outage

Events like the CrowdStrike outage test the resilience of a company's technology and procedures. This outage underscored how heavily major companies rely on SaaS and cloud-based infrastructure, and it highlighted the critical need for robust disaster recovery plans and competent management strategies. In my experience, not all airlines faced the same level of disruption. Some were operational by 8:30 a.m. with enough personnel to handle the crisis, while others struggled because of system outages or a lack of business continuity policies.

This incident disrupted daily life and businesses across nearly every sector: airlines, hotels, car rental companies, retail, hospitals, and government agencies. An estimated 8.5 million systems crashed and could not restart normally. The lack of adequate disaster recovery plans and insufficient personnel made the situation worse for many organizations in these verticals.

How ePlus Managed the Crisis

When the outage hit, our clients began reaching out at 5 a.m. After confirming it was a global event, our team sprang into action and split into two work streams, one of them focused on troubleshooting and fixing the issue. By 8 a.m., core production services were back online, secondary services were recovering, and we had verified that all active customers were only minimally affected.

Client Feedback

Our clients appreciated our swift response. One client shared: "I'm receiving texts from friends at other businesses just discovering the problem. Our stores will open this morning, and most things are working fine. That's why you need a strong IT team."

Conclusion

This experience has reinforced the importance of preparedness and of having a strong IT team to manage unforeseen disruptions. At ePlus, we remain committed to providing top-notch Managed Services solutions so that our clients can navigate any challenge.

Andrew J. Federico, Jr.

Senior Applications Architect at Harris Beach PLLC

2 months

Sorry to hear you got caught at the airport during this. I completely agree that this situation revealed the true state of many corporations' preparedness, or lack of it. Disaster recovery is far too often ignored, and even if a well-designed recovery model is off the table, some form of remediation plan and a deep analysis of current infrastructure need to take place. I was amazed at how many corporations were sneaker-netting their recovery efforts, with technicians running from system to system with thumb-drive recovery media. In a large community of like-minded professionals, I encouraged many to simply script it if they were fortunate enough to be using PXE boot. A simple system reboot would pull the BitLocker key, decrypt the drive, mount a WinPE image, auto-delete the corrupt .sys file, and reboot, saving a significant number of man-hours. Hopefully more corporations will take precautions more seriously going forward, as I believe this was a wake-up call for many, and certainly not the last event of this significance we will see.
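To illustrate the "auto-delete the corrupt sys file" step described in the comment above, here is a minimal sketch. It assumes the affected Windows volume has already been unlocked and mounted (BitLocker key retrieval, decryption, and the WinPE/PXE boot are not shown). The `C-00000291*.sys` file pattern and the `Windows\System32\drivers\CrowdStrike` location come from CrowdStrike's published workaround; the function name `remove_faulty_channel_files` is hypothetical. Python is used only to show the logic, since a stock WinPE image does not ship with a Python interpreter.

```python
from pathlib import Path

# File pattern from CrowdStrike's published workaround for the July 2024 incident.
FAULTY_PATTERN = "C-00000291*.sys"


def remove_faulty_channel_files(drivers_dir: Path) -> list[str]:
    """Hypothetical helper: delete files in drivers_dir that match FAULTY_PATTERN.

    Returns the names of the deleted files (empty if the directory is missing).
    """
    removed: list[str] = []
    if not drivers_dir.is_dir():
        # Nothing to do if the CrowdStrike drivers directory is absent.
        return removed
    for path in sorted(drivers_dir.glob(FAULTY_PATTERN)):
        if path.is_file():
            path.unlink()  # delete the faulty channel file
            removed.append(path.name)
    return removed
```

For example, `remove_faulty_channel_files(Path(r"D:\Windows\System32\drivers\CrowdStrike"))` would target a volume mounted as `D:` (the drive letter is an assumption); after the deletion, the machine would be rebooted normally.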

Bill Kerr

Founder, Tickers

3 months

Wow, what an adventure that turned into! That road trip is one of those memorable experiences you get out of an unpleasant situation. I remember renting a car to drive back to Florida after a re:Invent, when my airplane's engine cover opened up and the plane had to land immediately. The road trip turned out to be one of those "cherry on top" treats.
