Why you will fail without Chaos Engineering, with Kolton Andrus - HS#25
Miko Pawlikowski
I help technical leaders achieve HockeyStick growth | Head SRE | Co-founder SREday.com, Conf42.com & 5 more
Sign up for the weekly newsletter: https://hockeystick.show/welcome
Introduction
Welcome to episode 25 of the HockeyStick podcast, where we delve into breakthroughs in tech, business, and performance.
In today's episode, Miko Pawlikowski sits down with Kolton Andrus, a well-known figure in the SRE and chaos engineering space. As the founder of Gremlin and a seasoned engineer, Kolton shares his insights into the evolution of chaos engineering, the challenges it faces, and his thoughts on the future of the industry.
The Journey of Chaos Engineering
Kolton Andrus begins by discussing the foundational ideas of chaos engineering. "It's about taming the chaos," he explains. The primary goal is to find system edges and handle them efficiently, ensuring reliability. Kolton emphasizes that organizations should invest in reliability as it is often a multimillion-dollar problem.
Shifting Roles at Gremlin
Kolton moved from being the CEO to the CTO of Gremlin. "It's been a journey," he reflects, noting that he felt his talents were best served in a technical role. This shift allowed him to work on product development and address the problems within chaos engineering more thoroughly.
The Importance of Chaos Engineering
Chaos engineering is an emotional topic for many SREs, like Miko Pawlikowski. It deals with intentionally injecting failures to test system resilience. Kolton highlights that the engineering part is crucial, "because whenever you tell someone I do chaos engineering, they think you're the joker… And that's the mistake."
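To make "intentionally injecting failures" concrete, here is a minimal sketch in Python. It is not Gremlin's product or API. The names FaultInjector, call_with_retry, fetch_price and run_experiment are hypothetical, invented for this illustration. The idea is to make a controlled fraction of calls fail and then measure whether the caller's retry logic absorbs the failures.

```python
import random


class FaultInjector:
    """Wraps a callable and makes a chosen fraction of calls fail on purpose.

    Hypothetical helper for illustration only.
    """

    def __init__(self, func, failure_rate, seed=None):
        if not 0.0 <= failure_rate <= 1.0:
            raise ValueError("failure_rate must be between 0 and 1")
        self.func = func
        self.failure_rate = failure_rate
        # A seeded generator keeps the experiment repeatable.
        self.rng = random.Random(seed)

    def __call__(self, *args, **kwargs):
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return self.func(*args, **kwargs)


def call_with_retry(func, attempts=3):
    """Call func up to `attempts` times, re-raising the last injected error."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except ConnectionError as err:
            last_error = err
    raise last_error


def fetch_price():
    """Stand-in for a real dependency call (assumed for this sketch)."""
    return 42


def run_experiment(failure_rate, calls=1000, attempts=3, seed=0):
    """Return the fraction of calls that succeed despite injected failures."""
    flaky = FaultInjector(fetch_price, failure_rate, seed=seed)
    succeeded = 0
    for _ in range(calls):
        try:
            call_with_retry(flaky, attempts)
            succeeded += 1
        except ConnectionError:
            pass
    return succeeded / calls
```

Running run_experiment at increasing failure rates shows where the retry budget stops being enough, which is the "finding the edges" part of the practice, done under controlled conditions.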
The Branding Dilemma
While the concept and technique of chaos engineering are sound, its branding remains a challenge. The term "chaos" doesn't sit well with corporate executives. Kolton shares that although they leaned into the fun branding with Gremlin, it sometimes backfired. Executives want maturity and reliability, not something perceived as "immature."
Marketing and Acceptance
Marketing has always played a significant role in the adoption of chaos engineering. Many organizations found the name off-putting. Kolton notes that "reliability engineering" or "resilience engineering" might be better terms. The focus is on explaining the benefits and necessity of these practices to stakeholders.
Gamification in Engineering
One of the challenges in chaos engineering is getting organizations to adopt it systematically. Kolton mentions creating a rubric and scoring system for services, helping teams see their progress. "If you want people to do the right thing, you need to make it easy," he asserts.
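A rubric like the one Kolton describes could be sketched as a weighted checklist. The criteria names and weights below are assumptions made up for this example. The episode summary does not say how Gremlin's actual scoring works.

```python
# Hypothetical rubric: each check a service can pass, with an assumed weight.
RUBRIC = {
    "survives_dependency_outage": 40,
    "survives_host_loss": 30,
    "has_health_check": 20,
    "has_runbook": 10,
}


def reliability_score(passed_checks):
    """Return a 0-100 score from the set of rubric checks a service passed."""
    unknown = set(passed_checks) - set(RUBRIC)
    if unknown:
        raise ValueError(f"unknown checks: {sorted(unknown)}")
    return sum(weight for name, weight in RUBRIC.items() if name in passed_checks)


def missing_checks(passed_checks):
    """List the checks a team still has to pass, highest weight first."""
    remaining = [name for name in RUBRIC if name not in passed_checks]
    return sorted(remaining, key=lambda name: RUBRIC[name], reverse=True)
```

A single number per service, plus an ordered list of what is still missing, gives teams a visible next step, which fits the "make it easy" point above.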
The Evolving Landscape
Kolton acknowledges that the gaming industry, despite its need for reliable systems, often lags in adopting such practices. He points out that people generally resist change, especially when it seems complex or unnecessary.
Lessons Learned and Future Prospects
Over the eight years of Gremlin's journey, Kolton has faced numerous ups and downs. From being told they had product-market fit to being told they did not during the pandemic, it has been a learning experience. "It's super hard when it's your baby," Kolton admits, but the key is to keep iterating and improving.
Intelligent Health Checks
Gremlin's latest features focus on intelligent health checks, enabling even those without robust monitoring systems to understand their system's health. "How do we take the expertise that me and a lot of the engineers on my team have learned…and embed it into the product?" Kolton asks.
AI in Reliability
The conversation also touches on the role of AI in reliability engineering. Kolton is skeptical about current AI capabilities. He believes AI can assist with guidance and analysis but cannot replace the need for deterministic solutions in complex distributed systems.
Kolton's Philosophy
Kolton's closing thoughts are reflective and grounded. He advocates for incremental improvement: "do a little better every day." This philosophy, he believes, applies not only to engineering but also to personal development.
Conclusion
Kolton Andrus's journey through chaos engineering and reliability offers valuable insights for anyone in the tech industry. His experiences underscore the importance of resilience, not just in systems but also in navigating the challenges of innovation and acceptance. Tune in to the full episode for an in-depth discussion on the future of chaos engineering and much more.