The DevOps Digest: 2022-03-18

Scott Prugh

Published: March 18, 2022

This week, we cover Inclusion and Working Together, Mood Booster Visuals, Learning from Outages, /dev/null, Breaking Changes, and COVID-19 and Wastewater.

Enjoy!

Quote: Inclusion and Working Together

“We need to understand that if we all work on inclusion together, it’s going to be faster, broader, better, and more thorough than anything we can do on our own.”

Ellen Pao

15 Quotes From Women in Tech That Will Inspire You | by KaylaMatthews | Code Like A Girl

Tweet: Mood Booster Visuals

álex - Visual illustrator on Twitter: "10 mood booster visuals. 1. It's all a matter of perspective. https://t.co/d6QeQZs3XV" / Twitter

Technical Article/Presentation: How We Turned Our Company’s Worst Outage into a Powerful Learning Opportunity (London 2020)

How We Turned Our Company’s Worst Outage into a Powerful Learning Opportunity - CSG | Devops Enterprise Summit London 2020 (itrevolution.com)

This was a great presentation by CSG's Erica Morrison about how we took one of our worst incidents ever and used it to get better.

LinkedIn: Note that this video is only available by subscribing to the DevOps Enterprise Summit Video Library. A free membership (10 videos/month) is available, as well as individual and corporate memberships.

FYI: IT Revolution announced its 2022 conference dates. I'm happy to say that the flagship event will be back in Las Vegas this year and in person! Additionally, registration and CFPs for the May Europe event are now open!

2022 Conference Dates

DevOps Enterprise Summit Virtual - Europe

10-12 May 2022 | Registration Open | CFP Open

DevOps Enterprise Summit Virtual - US

August 2-4, 2022

DevOps Enterprise Summit US Flagship Event

The Cosmopolitan of Las Vegas

October 18-20, 2022

Podcast: /dev/null

I'm still catching up from last week's offsite and my podcast listening was on hold.

Books: Kill It with Fire / 9: BREAKING CHANGES

We build our computer systems the way we build our cities: over time, without a plan, on top of ruins. —Ellen Ullman

Amazon.com: Kill It with Fire: Manage Aging Computer Systems (and Future Proof Modern Ones) (Audible Audio Edition): Marianne Bellotti, Katie Koster, Random House Audio: Books

In this chapter, Marianne discusses designing breaking changes and selling those changes while being honest about the risks. I find this topic very pertinent, and it is one I'm passionate about. I always feel more comfortable "running towards the risk/problem" vs. waiting for "the problem to run over you." I also look at these problems as opportunities to make dramatic improvements and release untapped potential. I often quote my dear friend Mauricio Zamora, saying: "You can't possibly make it worse, right?"

In this chapter, Marianne hits on the following:

Inertia is real and prevents organizations from moving forward.
It is impossible to improve legacy systems without breaking them.
"Air cover" from leaders and creating psychological safety is critical, but to be successful, you need to alter the organization's perception of risk.
Understanding "how people get seen" and behaviors that get noticed.
Positive reinforcement in the form of social recognition tends to be a more effective motivator than traditional methods (bonuses, rewards, promotions).
Creating incremental social rewards that show progress can be a great motivator. Use incremental "kudos" to recognize small wins.
Celebrating failures is a great way to build just cultures. Blameless postmortems are a good place to start.
The closer you can push accountability to the people maintaining systems, the greater the resilience. Allow operators to exercise discretion to modify procedures.
"The highest probability of success comes from having as many people engaged and empowered to execute as possible."
Breaking something proactively is generally uncomfortable, but should be embraced in modernization and other operational contexts (my emphasis).
Systems that are too reliable can be taken for granted and fail to rack up "observations of resilience."
Perfectly running systems create a false sense of security that leads to a lack of continued improvement.
"Occasional system problems that are resolved quickly can actually boost the user’s trust and confidence. The technical term for this effect is the service recovery paradox."
Being fast, professional and transparent about outages and the resolution improves relationships with stakeholders.
Having a system no one understands is a weakness, and breaking a system to understand behavior is a powerful mechanism to learn and build resilience.
Waiting for something to fail is relying on "hope," but planning a timed failure lets you bring the right resources, preparation and timing to it.
To investigate failures, look to system logs. If there aren't any, add telemetry to understand behavior.
For planned failures, have a quick rollback or "kill switch" to revert to the previous state. Communicate and level set this plan with stakeholders.
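
The last two points (telemetry plus a "kill switch") can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not something taken from the book: the `KillSwitch` class, the `legacy_lookup`/`new_lookup` functions and the account ID are all hypothetical names, and a real system would likely keep the flag in shared configuration rather than in memory.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("planned_failure")


class KillSwitch:
    """Hypothetical in-memory flag that gates the new code path."""

    def __init__(self) -> None:
        self.new_path_enabled = False

    def enable(self) -> None:
        self.new_path_enabled = True
        log.info("kill switch: new path ENABLED")

    def kill(self) -> None:
        # Reverting is a single state change, so rollback is fast.
        self.new_path_enabled = False
        log.info("kill switch: new path DISABLED, back to previous state")


def legacy_lookup(account_id: str) -> str:
    """Stand-in for the existing (legacy) behavior."""
    return f"legacy:{account_id}"


def new_lookup(account_id: str) -> str:
    """Stand-in for the modernized behavior being rolled out."""
    return f"new:{account_id}"


def lookup(switch: KillSwitch, account_id: str) -> str:
    """Route to the new path only while the switch is on; log each decision."""
    path = new_lookup if switch.new_path_enabled else legacy_lookup
    log.info("lookup account=%s path=%s", account_id, path.__name__)
    return path(account_id)


switch = KillSwitch()
before = lookup(switch, "42")   # legacy path
switch.enable()
during = lookup(switch, "42")   # new path during the planned change
switch.kill()
after = lookup(switch, "42")    # legacy path again after the kill switch
print(before, during, after)
```

Every routing decision is logged, so if the planned change misbehaves there is telemetry to investigate, and reverting does not require a redeploy.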

I thought this was a great chapter for bringing forward some key DevOps ideas, specifically Psychological Safety, Resilience Engineering, Failures as Opportunities, and Planning and Practicing Failures. These ideas are useful and powerful not only for modernization, but also for improving software systems and the socio-technical environments that surround them. At CSG, we implemented several of these techniques through:

Incident Swarming, Team Incident Retros (Local Learning) and Group Retros (Global Learning). Swarming brings the right knowledge and expertise to the problem as quickly as possible (run towards the problem). Retros at multiple levels change the culture of how we view failure and build both learning and resilience.
Implementing the Incident Management System (IMS). After a large failure in 2019, we dug in, embraced the failure and came out stronger. See Erica's great video above: How We Turned Our Company’s Worst Outage into a Powerful Learning Opportunity (London 2020). We also wrote a paper about improving Incident Response: A Framework for Incident Response (itrevolution.com)

Feature/kill-switches and planned "outages". We learned several years ago that "big batch migrations/modernizations" were dangerous, and we started treating many software and operational activities as incremental steps that were likely to fail in some way. During many of our ports, we planned and communicated switchovers during the day, when folks were fresh and we could monitor as well as roll back quickly. Given the complexity of our systems and integrations, it was not possible to design or code away all edge cases. We needed to have safe ways to fail quickly, roll back, fix and do it again. This practice built great system understanding, resilience and credibility with stakeholders.
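
To make the "fail quickly, roll back, fix and do it again" loop concrete, here is a rough Python sketch. It is not CSG's actual tooling: `planned_switchover`, the `state` dictionary and the health check are hypothetical stand-ins I made up for illustration, with the first attempt simulating an edge case that only shows up in production.

```python
from typing import Callable


def planned_switchover(
    switch_to_new: Callable[[], None],
    roll_back: Callable[[], None],
    is_healthy: Callable[[], bool],
    checks: int = 3,
) -> bool:
    """Cut over, check health a few times, and roll back on the first failure.

    Returns True if the new system stayed healthy, False if we rolled back.
    """
    switch_to_new()
    for _ in range(checks):
        if not is_healthy():
            roll_back()
            return False
    return True


# Hypothetical system state used only for this illustration.
state = {"active": "old", "attempt": 0}


def switch_to_new() -> None:
    state["active"] = "new"
    state["attempt"] += 1


def roll_back() -> None:
    state["active"] = "old"


def is_healthy() -> bool:
    # Simulated: the first attempt hits an edge case, the second (after a fix) passes.
    return state["attempt"] >= 2


first = planned_switchover(switch_to_new, roll_back, is_healthy)
active_after_first = state["active"]
second = planned_switchover(switch_to_new, roll_back, is_healthy)
print(first, active_after_first, second, state["active"])
```

The first attempt fails its health check and leaves the old system active; the second attempt sticks. Each failed attempt is cheap because the rollback path is part of the plan from the start.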

Something Else: COVID-19 levels detected in Illinois Wastewater Plants

https://www.axios.com/newsletters/axios-chicago-e6b1b1e8-7529-40ca-9b38-d969539997c1.html

This week's Axios Local highlighted potential trouble ahead. COVID-19 levels in wastewater have dramatically increased (1000%)… Yikes. I'm wary of panicking over these numbers, as there could be other things at play, like getting better at testing, immunity, etc. But this trend will be important and interesting to watch.

Also, see the US tracker here: CDC COVID Data Tracker: SARS-CoV-2 RNA Levels in Wastewater in the United States
