Solving Meta's top 4 outage causes: 1/4 Unexpected Dependencies
Dylan Ratcliffe
Founder @ Overmind | Building the only automated pre-mortem platform
In 2022 I watched Francois Richard deliver an excellent talk at SRECon EMEA about how Meta drained every backbone router simultaneously (you remember, it was *that* outage). Here were their top 4 trending root causes of outages:
At Overmind we've had our heads down building and gathering feedback over the past few months, and I think we're pretty close to solving one of the four: Unexpected Dependencies.
Here's what we've built since my last update.
How? Blast Radius.
Use our GitHub action (or do it manually) to go from Terraform Plan --> Blast Radius. The blast radius is based on your live AWS state, not Terraform, which lets you see what might break:
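To make the idea concrete, here is a minimal sketch of a blast radius calculation as a graph walk. It is not Overmind's implementation. It assumes the plan has been exported with `terraform show -json` (whose output lists `resource_changes`), and that a `dependents` map of "who depends on this resource" has already been discovered from live state. That map, the function names, and the resource identifiers are all hypothetical.

```python
from collections import deque

def changed_addresses(plan: dict) -> set:
    """Addresses of resources the plan will actually change (skips no-ops)."""
    return {
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if rc["change"]["actions"] != ["no-op"]
    }

def blast_radius(changed, dependents, max_depth=None):
    """Breadth-first walk from the changed resources to everything downstream.

    `dependents` maps a resource to the resources that depend on it.
    `max_depth` optionally limits how many hops away we look.
    """
    seen = set(changed)
    queue = deque((c, 0) for c in changed)
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for nxt in dependents.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen - set(changed)

# Hypothetical plan excerpt and live dependency map, for illustration only.
plan = {
    "resource_changes": [
        {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}},
    ]
}
dependents = {
    "aws_security_group.web": ["ec2:i-0abc", "elb:public"],
    "elb:public": ["route53:www"],
}

changed = changed_addresses(plan)
radius = blast_radius(changed, dependents)
```

In this toy example the security group is the only real change, but the walk also surfaces the instance, the load balancer, and the DNS record behind it, none of which would appear in the Terraform plan itself.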
What's next? (2/4) Configuration Updates
We're not stopping with just blast radius. Once you've decided to apply your changes, track them with Overmind. Since we've already worked out all the dependencies, we can tell you if your changes have broken something downstream, even if you didn't know it existed.
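One way to picture this is as a before/after comparison limited to the blast radius. The sketch below is an assumption about the general shape of such a check, not a description of how Overmind does it. The `"healthy"` and `"unhealthy"` status values, the snapshot dictionaries, and the resource names are hypothetical.

```python
def newly_broken(before, after, radius):
    """Resources in the blast radius that were healthy before the change
    and are no longer healthy after it."""
    return sorted(
        r for r in radius
        if before.get(r) == "healthy" and after.get(r) != "healthy"
    )

# Hypothetical snapshots taken before and after applying a change.
radius = {"ec2:i-0abc", "elb:public", "route53:www"}
before = {"ec2:i-0abc": "healthy", "elb:public": "healthy", "route53:www": "healthy"}
after = {"ec2:i-0abc": "healthy", "elb:public": "unhealthy", "route53:www": "healthy"}

broken = newly_broken(before, after, radius)
```

Scoping the comparison to the blast radius keeps the check focused on resources the change could plausibly have affected, rather than everything in the account.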
Want to try it?
Sign up and start calculating blast radius now! We’d love any feedback, whether by email, Discord, or a booked meeting.
Note: This is beta software. It's free for individuals; if you're interested in a team plan, contact me.