Solving Meta's top 4 outage causes: 1/4 Unexpected Dependencies

In 2022 I watched Francois Richard deliver an excellent talk at SRECon EMEA about how Meta drained every backbone router simultaneously (you remember, it was *that* outage). Here were their top 4 trending root causes of outages:

Figure: Top 4 trending outage causes

At Overmind we've had our heads down building and gathering feedback over the past few months, and I think we're pretty close to solving one of the four: Unexpected Dependencies.

Here's what we've built since my last update.

How? Blast Radius.

Use our GitHub action (or do it manually) to go from Terraform Plan --> Blast Radius. The blast radius is based on your live AWS state, not Terraform, which lets you see what might break:

  • Includes resources not managed by Terraform
  • Discovers dependencies even if they were created manually
  • Shows live data, not out-of-date CMDB data
  • Does all of this with read-only access, no agents, no telemetry, and no input from you. If you had to tell us how your apps are architected, we're hardly going to find unexpected dependencies, are we?
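To make the idea concrete, here is a minimal sketch of a blast radius as a graph traversal. It is not Overmind's implementation: the `blast_radius` function, the `links` list, and every resource name are hypothetical, and it assumes the links between live resources have already been discovered and can be followed in either direction.

```python
from collections import deque


def blast_radius(links, changed, max_depth=None):
    """Return every resource reachable from the changed ones.

    links:     iterable of (resource, resource) pairs discovered in the
               live environment (hypothetical data, treated as undirected)
    changed:   resources touched by the Terraform plan
    max_depth: optional limit on how many links to follow
    """
    # Build an adjacency map so each link can be followed both ways.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search outwards from the changed resources.
    seen = set(changed)
    queue = deque((resource, 0) for resource in changed)
    while queue:
        resource, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for nxt in neighbours.get(resource, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))

    # The blast radius is everything found, minus the change itself.
    return seen - set(changed)


# Hypothetical example: a Terraform-managed security group attached to a
# manually created instance that sits behind a legacy load balancer.
links = [
    ("security-group/web", "ec2/manual-instance"),
    ("ec2/manual-instance", "elb/legacy"),
    ("rds/orders", "lambda/report"),
]
radius = blast_radius(links, ["security-group/web"])
```

In this made-up example the plan only touches the security group, yet the traversal also surfaces the manually created instance and the load balancer in front of it, while the unrelated database and function stay out of the result.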

Figure: Calculating a blast radius

What's next? (2/4) Configuration Updates

We're not stopping with just blast radius. Once you've decided to apply your changes, track them with Overmind. Since we've already worked out all the dependencies, we can tell you if your changes have broken something downstream, even if you didn't know it existed.
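One way to picture this kind of tracking is a comparison of snapshots taken before and after the apply. The sketch below is an assumption for illustration only, not how Overmind works: `changed_downstream`, the snapshot dictionaries, and the resource names are all hypothetical.

```python
def changed_downstream(before, after, applied):
    """Return resources that differ between two snapshots but were not
    part of the applied change itself.

    before, after: dicts mapping resource name -> observed state
                   (hypothetical snapshots of the blast radius)
    applied:       names of the resources the change was meant to touch
    """
    diffs = {}
    for name in before.keys() | after.keys():
        if name in applied:
            continue  # expected to change, so not a surprise
        old, new = before.get(name), after.get(name)
        if old != new:
            diffs[name] = (old, new)
    return diffs


# Hypothetical snapshots of the resources in the blast radius.
before = {
    "security-group/web": {"ingress": [80, 443]},
    "elb/legacy": {"healthy_targets": 2},
    "ec2/manual-instance": {"state": "running"},
}
after = {
    "security-group/web": {"ingress": [443]},
    "elb/legacy": {"healthy_targets": 0},
    "ec2/manual-instance": {"state": "running"},
}
surprises = changed_downstream(before, after, applied={"security-group/web"})
```

Here the security group change is expected and ignored, so the only thing reported is the load balancer losing its healthy targets.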


Want to try it?

Sign up and start calculating blast radius now! We’d love any feedback, whether by email, Discord, or by booking a meeting.

Note: This is beta software. It's free for individuals; if you're interested in a team plan, contact me.
