Why you will fail without Chaos Engineering, with Kolton Andrus - HS#25

Sign up for the weekly newsletter: https://hockeystick.show/welcome

Introduction

Welcome to episode 25 of the HockeyStick podcast, where we delve into breakthroughs in tech, business, and performance.

In today's episode, Miko Pawlikowski sits down with Kolton Andrus, a well-known figure in the SRE and chaos engineering space. As the founder of Gremlin and a seasoned engineer, Kolton shares his insights into the evolution of chaos engineering, the challenges it faces, and his thoughts on the future of the industry.

The Journey of Chaos Engineering

Kolton Andrus begins by discussing the foundational ideas of chaos engineering. "It's about taming the chaos," he explains. The primary goal is to find system edges and handle them efficiently, ensuring reliability. Kolton emphasizes that organizations should invest in reliability as it is often a multimillion-dollar problem.

Shifting Roles at Gremlin

Kolton moved from being the CEO to the CTO of Gremlin. "It's been a journey," he reflects, noting that he felt his talents were best served in a technical role. This shift allowed him to work on product development and address the problems within chaos engineering more thoroughly.

The Importance of Chaos Engineering

Chaos engineering is an emotional topic for many SREs, like Miko Pawlikowski. It deals with intentionally injecting failures to test system resilience. Kolton highlights that the engineering part is crucial, "because whenever you tell someone I do chaos engineering, they think you're the joker… And that's the mistake."
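
To make "intentionally injecting failures" concrete, here is a minimal Python sketch of a controlled experiment. It is not Gremlin's product or API; every name in it (fetch_price, with_fault_injection, run_experiment) is hypothetical and invented for illustration. It wraps a dependency call so that a chosen fraction of calls fail on purpose, then counts how often the calling code degrades gracefully and how often it crashes.

```python
import random


class InjectedFailure(Exception):
    """Raised by the injector in place of a real dependency error."""


def with_fault_injection(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of calls fail on purpose."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFailure("injected failure")
        return func(*args, **kwargs)
    return wrapper


def fetch_price(item):
    # Hypothetical dependency; a real system would call a remote service.
    return {"widget": 9.99}[item]


def naive_handler(fetch, item):
    # No error handling: any injected failure propagates to the caller.
    return fetch(item)


def fallback_handler(fetch, item):
    # Degrades gracefully: returns None when the dependency fails.
    try:
        return fetch(item)
    except InjectedFailure:
        return None


def run_experiment(handler, trials, failure_rate, seed=0):
    """Run the handler against a flaky dependency and tally the outcomes."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    flaky_fetch = with_fault_injection(fetch_price, failure_rate, rng)
    outcome = {"ok": 0, "degraded": 0, "crashed": 0}
    for _ in range(trials):
        try:
            result = handler(flaky_fetch, "widget")
        except InjectedFailure:
            outcome["crashed"] += 1
        else:
            outcome["degraded" if result is None else "ok"] += 1
    return outcome


if __name__ == "__main__":
    print("naive:   ", run_experiment(naive_handler, 1000, 0.3))
    print("fallback:", run_experiment(fallback_handler, 1000, 0.3))
```

The failure rate is fixed and the random source is seeded, so the experiment is deliberate and repeatable rather than random mischief, which is the "engineering" half of the name.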

The Branding Dilemma

While the concept and technique of chaos engineering are sound, its branding remains a challenge. The term "chaos" doesn't sit well with corporate executives. Kolton shares that although they leaned into the fun branding with Gremlin, it sometimes backfired. Executives want maturity and reliability, not something perceived as "immature."

Marketing and Acceptance

Marketing has always played a significant role in the adoption of chaos engineering. Many organizations found the name off-putting. Kolton notes that "reliability engineering" or "resilience engineering" might be better terms. The focus is on explaining the benefits and necessity of these practices to stakeholders.

Gamification in Engineering

One of the challenges in chaos engineering is getting organizations to adopt it systematically. Kolton mentions creating a rubric and scoring system for services, helping teams see their progress. "If you want people to do the right thing, you need to make it easy," he asserts.
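
As a rough illustration of what a rubric and scoring system for services could look like, here is a small Python sketch. The episode summary does not describe Gremlin's actual scoring, so the criteria, weights, and function names below are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    weight: int


# Hypothetical rubric: both the criteria and the weights are illustrative.
RUBRIC = [
    Check("has_health_check", 2),
    Check("survives_dependency_outage", 3),
    Check("survives_host_loss", 3),
    Check("tested_in_last_30_days", 2),
]


def score_service(passed):
    """Return a 0-100 score from the set of check names a service passed."""
    total = sum(check.weight for check in RUBRIC)
    earned = sum(check.weight for check in RUBRIC if check.name in passed)
    return round(100 * earned / total)


def missing_checks(passed):
    """List the rubric checks a service has not passed, in rubric order."""
    return [check.name for check in RUBRIC if check.name not in passed]


if __name__ == "__main__":
    passed = {"has_health_check", "survives_host_loss"}
    print(score_service(passed))
    print(missing_checks(passed))
```

A single number per service, plus a short list of what is still missing, gives teams a concrete next step, which is one way to "make it easy" to do the right thing.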

The Evolving Landscape

Kolton acknowledges that the gaming industry, despite its need for reliable systems, often lags in adopting such practices. He points out that people generally resist change, especially when it seems complex or unnecessary.

Lessons Learned and Future Prospects

Over the eight years of Gremlin's journey, Kolton has faced numerous ups and downs. From being told they had product-market fit to being told they did not during the pandemic, it has been a learning experience. "It's super hard when it's your baby," Kolton admits, but the key is to keep iterating and improving.

Intelligent Health Checks

Gremlin's latest features focus on intelligent health checks, enabling even those without robust monitoring systems to understand their system's health. "How do we take the expertise that me and a lot of the engineers on my team have learned…and embed it into the product?" Kolton asks.

AI in Reliability

The conversation also touches on the role of AI in reliability engineering. Kolton is skeptical about the current AI capabilities. He believes AI can assist in guidance and analysis but cannot replace the need for deterministic solutions in complex distributed systems.

Kolton's Philosophy

Kolton's closing thoughts are reflective and grounded. He advocates for incremental improvement: "do a little better every day." This philosophy, he believes, applies not only to engineering but also to personal development.

Conclusion

Kolton Andrus's journey through chaos engineering and reliability offers valuable insights for anyone in the tech industry. His experiences underscore the importance of resilience, not just in systems but also in navigating the challenges of innovation and acceptance. Tune in to the full episode for an in-depth discussion on the future of chaos engineering and much more.
