DevOps - Culture

A recent DevOps training I attended is the genesis for this post. The trainer focused on listing the names of different tools for each phase of the CI/CD DevOps pipeline. A beginner might mistake DevOps for just another set of tools. Tools are essential in inculcating good habits and behavior, i.e. culture, but DevOps is more than just tools. Continuing from my previous post, I want to highlight how DevOps cultural practices differ.

Hiring

  • DevOps is not just about making Dev and Ops teams sit together and asking them to collaborate. Real collaboration happens only when there is mutual respect and both sides speak and understand the same vocabulary. In general, companies treat Dev as strategic and Ops as something that can be outsourced. With this mindset, hiring practices do not scrutinize Ops candidates at the same level. DevOps requires a shift in this thinking: companies need to hire people with the capability and the intention to automate Ops activities. Hiring the best minds in Ops will invariably shatter the pecking order among Dev, Test and Ops, which is another invisible wall that DevOps breaks down.
  • In this interview, Ben Treynor of Google's SRE (Site Reliability Engineering) team explains how Google hires SREs who come close to passing the SWE (software engineer) hiring bar.
  • Instead of always hiring specialists, companies should be ready to hire generalists who can pick up new skills as the need arises. Practices like market-oriented teams (instead of purely functional teams) and teams sized by the Two-Pizza Rule are all related to this.

Improvement Cycles

  • Product owners should always reserve at least 20% of each cycle for improving and optimizing things, including paying down technical debt.
  • Processes do not stay the same over time. They degrade due to entropy. Finding ways to improve daily work is more important than just doing daily work. This mindset should be ingrained in every developer, test engineer and Ops engineer so that we continuously improve and develop a culture of innovation.
  • For teams and products with serious issues and more technical debt, 30% or even 40% can be reserved for improvement. This ensures that defects are found and fixed early, when they are cheap, rather than blowing up in production.
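To make these percentages concrete, here is a minimal Python sketch of how a product owner might split a cycle's capacity. The debt levels and the `split_capacity` helper are hypothetical. The 20%, 30% and 40% shares come from the text above. Rounding up is an assumption, chosen so the reservation never drops below the agreed share.

```python
from __future__ import annotations

# Hypothetical mapping from a team's technical-debt level to the percentage
# of each cycle reserved for improvement work (shares taken from the text).
IMPROVEMENT_PERCENT = {"normal": 20, "high": 30, "severe": 40}


def split_capacity(capacity_points: int, debt_level: str = "normal") -> tuple[int, int]:
    """Split a cycle's capacity into (feature_points, improvement_points).

    Integer arithmetic avoids floating-point surprises, and the improvement
    share is rounded up so it never falls below the agreed percentage.
    """
    percent = IMPROVEMENT_PERCENT[debt_level]
    improvement = (capacity_points * percent + 99) // 100  # ceiling division
    return capacity_points - improvement, improvement
```

For a 50-point cycle, `split_capacity(50)` leaves 40 points for features and reserves 10 for improvement. `split_capacity(50, "severe")` reserves 20.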

Quality, Operations and Security are everyone's job

  • User stories are not 'DONE' the moment Dev checks the code in to the trunk. Stories are DONE only when they are running successfully in production without breaking any other functionality and the customer realizes the value. To achieve this, everyone in the value stream is equally responsible. Quality and security are part of everyone's daily job, not something that is taken care of only at the end of a release.

Empathy and Optimization for downstream teams

  • In addition to meeting customer (external) requirements, everyone should focus on optimizing work for their immediate downstream team members (i.e. internal customers).
  • When the design team does its job, it should consider how easy the solution will be to test, deploy and operate. When the development team writes code, it should consider how easy it is to automate testing and how easy it is to configure or toggle features at run time. When work is optimized for downstream teams in this way, rework is avoided, and non-functional requirements like testability, deployability, security, operability and maintainability are always taken care of by all streams.
  • Metrics like Percent Complete and Accurate (%C/A) can be obtained by asking downstream customers what percentage of the time they receive work that is usable without requiring any corrections or clarifications.
  • If upstream teams are not optimizing for downstream teams, Ops team members should be allowed to move off an "Ops disaster" project, as in the Google SRE case from the interview above. When the Ops team falls below a certain size, Dev is tasked with managing Ops as well, which gives the Dev team an incentive to design and build something that runs smoothly in operations. Shared pain is a way to inculcate shared goals.
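%C/A is simple arithmetic on survey answers. The sketch below uses hypothetical function names. It computes %C/A for a single hand-off, then a "rolled" %C/A for a whole value stream by multiplying the per-step fractions, which is the usual value-stream-mapping convention.

```python
from __future__ import annotations


def percent_complete_accurate(usable: int, received: int) -> float:
    """%C/A for one hand-off: the share of work items the downstream team
    could use without corrections or clarifications, as a percentage."""
    if received == 0:
        raise ValueError("no work items received")
    return 100.0 * usable / received


def rolled_percent_complete_accurate(step_values: list[float]) -> float:
    """Rolled %C/A across a value stream: the product of each step's
    %C/A expressed as a fraction, converted back to a percentage."""
    rolled = 1.0
    for pct in step_values:
        rolled *= pct / 100.0
    return 100.0 * rolled
```

Take some made-up survey numbers. Design-to-dev hand-offs are usable 45 times out of 50 (90%), dev-to-test 40 out of 50 (80%) and test-to-ops 19 out of 20 (95%). The rolled %C/A is then roughly 68%, so fewer than seven in ten work items flow through the whole stream without rework, even though no single step looks bad on its own.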

Telemetry and Instrumentation

  • If you don't measure it, you can't manage it, and you can't improve it. Metrics should be defined and coded for everything that can be influenced and improved. This requires a budget for infrastructure and easy-to-use libraries so that developers can integrate metrics and monitoring into their code effortlessly.
  • Metrics should be all-encompassing: business level, application level and infrastructure (DB, OS, network) level. A business-level metric lets product owners understand feature usage. Similarly, application and infrastructure metrics let problems be seen in real time as they occur or build up, so teams can act before the customer even sees the problem.
  • Deciding which metrics to add, and how they will be used, should be part of activities like peer review. When troubleshooting production issues, teams should ask what metric could have been added to warn about the issue or to show the trend and buildup of the problem.
  • Even automated deployment techniques like canary deployment require us to identify, measure and monitor the key metrics after deploying changes.
  • Metrics collected should be accessible to all stakeholders in the value stream. Information radiators can make them visible to everyone. This increases trust among teams and with the customer as well.
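The Python sketch below is a rough illustration of two ideas: coding metrics into the application, and using those metrics to decide whether a canary can be promoted. It keeps counters in memory. The `Metrics` class, the metric names, the 1% tolerance and the 100-request minimum are all hypothetical. In practice, a metrics library such as a Prometheus or StatsD client, backed by a real monitoring system, would play this role. The canary check simply compares the canary's error rate with the stable version's.

```python
from __future__ import annotations

from collections import Counter


class Metrics:
    """Minimal in-memory counter registry (hypothetical stand-in for a metrics library)."""

    def __init__(self) -> None:
        self.counters: Counter = Counter()

    def inc(self, name: str, amount: int = 1) -> None:
        self.counters[name] += amount

    def get(self, name: str) -> int:
        return self.counters[name]  # Counter returns 0 for unseen names


def handle_request(metrics: Metrics, version: str, succeeded: bool) -> None:
    """Instrumentation point: record each request outcome, tagged by deployed version."""
    metrics.inc(f"requests.{version}")
    if not succeeded:
        metrics.inc(f"errors.{version}")


def error_rate(metrics: Metrics, version: str) -> float:
    requests = metrics.get(f"requests.{version}")
    return metrics.get(f"errors.{version}") / requests if requests else 0.0


def canary_is_healthy(metrics: Metrics, baseline: str = "stable", canary: str = "canary",
                      tolerance: float = 0.01, min_requests: int = 100) -> bool:
    """Promote only if the canary saw enough traffic and its error rate stays
    within `tolerance` of the baseline's error rate."""
    if metrics.get(f"requests.{canary}") < min_requests:
        return False  # not enough evidence yet to judge the canary
    return error_rate(metrics, canary) <= error_rate(metrics, baseline) + tolerance


if __name__ == "__main__":
    m = Metrics()
    for i in range(1000):  # stable version: 1 failure in every 100 requests
        handle_request(m, "stable", succeeded=(i % 100 != 0))
    for i in range(100):   # canary: 1 failure in every 10 requests
        handle_request(m, "canary", succeeded=(i % 10 != 0))
    print(canary_is_healthy(m))  # False: 10% errors vs. a 1% baseline
```

Requiring a minimum number of canary requests avoids promoting a version that simply has not seen enough traffic to fail yet.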

Swarm and Solve problems

  • Whenever a problem occurs, the team should swarm it and solve it immediately. This prevents problems from being put on the back burner while new tasks are started, thus reducing work in progress. Solving the problem immediately also means memories have not faded and no context is lost. In addition, technical debt does not grow, and defects are fixed earlier, when they are cheaper.
  • Developers working on trunk, or checking in to trunk at least once a day instead of using long-lived feature branches, is related to this practice: the team's productivity is prioritized over individual productivity. Borrowing from Lean practice and the Toyota Way, a team can agree on an Andon cord and pull it when work gets stuck.
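One way to picture an agreed Andon cord on a shared trunk is as a simple merge gate. Once someone pulls the cord, for example because the trunk build is red, new feature work stops and only fixes get in until the problem is solved. The `TrunkGate` class below is a hypothetical sketch of that policy, not a feature of any particular CI tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrunkGate:
    """Hypothetical 'stop the line' policy for a shared trunk.

    While the line is stopped, only changes marked as fixes may be merged,
    so the team swarms on the problem instead of piling on new work.
    """
    line_stopped: bool = False
    reasons: list[str] = field(default_factory=list)

    def pull_andon_cord(self, reason: str) -> None:
        # Anyone on the team may stop the line when work gets stuck.
        self.line_stopped = True
        self.reasons.append(reason)

    def resolve(self) -> None:
        # The problem is fixed: the line is green again.
        self.line_stopped = False
        self.reasons.clear()

    def can_merge(self, is_fix: bool) -> bool:
        return is_fix or not self.line_stopped
```

For example, after `gate.pull_andon_cord("trunk build failed")`, `gate.can_merge(is_fix=False)` returns `False` while `gate.can_merge(is_fix=True)` still returns `True`. Real teams usually enforce such a rule through their CI or source-control settings. The sketch only captures the rule itself: the team's flow matters more than any one person's next commit.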

Humane work conditions and happier work force

  • Though this sounds like an altruistic objective, the whole of DevOps revolves around this simple point. All of the following are tied to this single cultural practice:
    • building safe systems that can be deployed routinely and frequently, instead of in a weekend or graveyard-shift maintenance window;
    • reducing rework by optimizing for downstream teams;
    • automating as much as possible to prevent the manual, repetitive and boring work that causes burnout and mistakes;
    • encouraging a culture of innovation.

Reference: The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations, by Gene Kim, Jez Humble, Patrick Debois, and John Willis

Author: Phani Kiran R