Continuous Monitoring #1 How are we doing?

Continuous monitoring is rarely a real practice. Let's start with the "How are we doing?" question:

- Fine.
- How?
- Dunno…

To avoid these awkward answers, just present some data.

- How?
- This is how…

Just for starters, let's focus on tracking the following KPIs:

  • Lead Time
  • Deployment Frequency
  • Change Failure Ratio
  • Defect Detect Ratio (or Defect Escape Ratio)

Establishing the metrics above gives you a broader and clearer view, as it is always a matter of working culture. Consider it step one of continuous monitoring.

Lead Time

The lead time KPI mainly reflects the team's velocity. The average delivery time for user stories, features, bug fixes, and tasks tells you clearly how you are doing:

  • Does our velocity fit our Sprint length?
  • Is our backlog composition and planning focused on fast and safe delivery, or does it blur our view of delivery?
  • Finally - why do we always spill over our stories to the next Sprint?

For example:

How are we doing?

  • Two-week Sprints, with an average of 33 days per User Story? What went wrong?

With the questions set, try to find the answers.

Discuss during the Sprint Retro what went wrong, try to identify the cause, and during the next Sprint Planning set more granular (and thus more achievable within two weeks) User Stories. Then measure again in the next period and adapt accordingly to bring it below 10 days.

Of course, you can measure lead time on any work item you want: Epics, Features, User Stories, Tasks, and Bugs.
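As a minimal sketch of how lead time can be computed, assume you have exported created/closed dates for work items from your tracker (the dates below are illustrative, not from any real project):

```python
from datetime import date

# Hypothetical work items exported from a tracker: (created, closed) dates.
work_items = [
    (date(2023, 1, 2), date(2023, 1, 22)),
    (date(2023, 1, 5), date(2023, 2, 9)),
    (date(2023, 1, 9), date(2023, 2, 22)),
]

# Lead time per item = calendar days from creation to closure.
lead_times = [(closed - created).days for created, closed in work_items]
avg_lead_time = sum(lead_times) / len(lead_times)

print(f"Average lead time: {avg_lead_time:.1f} days")  # → 33.0 days
```

With these sample dates the average lands at exactly 33 days, matching the problematic example above: two-week Sprints, 33-day stories.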

Sounds simple? It may, but since everything is connected, remember to check the other KPIs.

Deployment Frequency

Irregular frequency, but it can still be measured

Deployment frequency is the tricky part. It depends on the team's repository structure and the maturity of the deployment process: to what extent is the delivery process automated? And what quality mechanics are in place? CI/CD is a good start.
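Even with an irregular cadence, deployment frequency is easy to measure once you have deployment timestamps from your pipeline. A sketch, assuming a hypothetical list of deployment dates (the data is illustrative):

```python
from collections import Counter
from datetime import date

# Hypothetical deployment dates exported from a CI/CD pipeline.
deployments = [
    date(2023, 3, 1), date(2023, 3, 3), date(2023, 3, 8),
    date(2023, 3, 9), date(2023, 3, 10), date(2023, 3, 21),
]

# Group by ISO week number to see deploys per week, irregular or not.
per_week = Counter(d.isocalendar()[1] for d in deployments)

for week, count in sorted(per_week.items()):
    print(f"Week {week}: {count} deployment(s)")
```

Gaps in the output (here, nothing in week 11) are themselves a signal worth discussing in the Retro.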

Change Failure Rate & Defect Detection Rate

The other two metrics reveal the true delivery speed and its quality:

  • Change Failure Rate - how good are we?
  • Defect Detection Rate (or Defect Escape Rate) - how good is our quality check process?

When teams start tracking Change Failure Rate and Defect Detection/Escape Rates in their KPIs, it can change their Software Delivery Life Cycle (SDLC) by:

  • increasing the visibility of the team's Quality Assurance in code and solution quality, deployment quality, and PROD environment stability
  • applying better testing techniques
  • applying better monitoring
  • applying automation to remove manual tasks

Not a good state, but at least measured. Then we can change!

High-performing teams have change failure rates in the 0-15 percent range. How does it look in our case?
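Change failure rate is a simple ratio: failed changes divided by total changes. A minimal sketch, assuming a hypothetical deployment log where each entry records whether the deployment caused a failure (hotfix, rollback, or incident):

```python
# Hypothetical deployment outcomes for one month:
# True = the deployment caused a failure (hotfix, rollback, incident).
deployment_outcomes = [False, False, True, False, False,
                       False, True, False, False, False]

failures = sum(deployment_outcomes)  # True counts as 1
change_failure_rate = 100 * failures / len(deployment_outcomes)

print(f"Change failure rate: {change_failure_rate:.0f}%")  # → 20%
```

In this sample, 2 failures out of 10 deployments give 20 percent: above the 0-15 percent range of high-performing teams, so there is something to discuss.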

Measuring month by month gives the team feedback about their rate and raises questions:

  • What to do to decrease our defect rate?
  • How to ensure better quality in the deployments?
  • Or maybe we're going too fast? (when Lead Times correlate with Defect/Change Failure Rates)

The same goes for the Defect Detection Rate (which has to be higher than 0%) or the Defect Escape Rate (which should be lower than 100%). These KPIs are simple:

  • Measure errors in system integration test environments (SIT, the first one after DEV, where we expect a lot of dynamics in quality ;))
  • Then measure in User Acceptance Tests (UAT), pre-PROD, or a Production Release Candidate. Whatever you call it, it's the signoff environment for all your changes.
  • And finally, measure on the Production environment.
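The three measurement points above can be combined into the two rates. A sketch with hypothetical per-environment defect counts for one release cycle (the numbers are illustrative):

```python
# Hypothetical defect counts per environment for one release cycle.
defects = {"SIT": 40, "UAT": 12, "PROD": 3}

total = sum(defects.values())

# Detection rate: share of all defects caught before Production.
detection_rate = 100 * (total - defects["PROD"]) / total
# Escape rate: share of defects that reached Production.
escape_rate = 100 * defects["PROD"] / total

print(f"Defect detection rate: {detection_rate:.1f}%")
print(f"Defect escape rate: {escape_rate:.1f}%")
```

Note the two rates always sum to 100%; tracking one implies the other, so pick whichever framing motivates your team more.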

More or less, rate trends should look like the above

  • The more errors you catch, the more controlled the environment
  • Shift-left approach: fix earlier, measure everywhere
  • The fewer errors (in total), the more stable the environment

The ideal view: you can catch every abnormal system behavior, and your production is error-free.

Summary

The same practices that enable shorter lead times, such as test automation, trunk-based development, and working in small batches, correlate with a reduction in change failure rates.

All these practices make defects much easier to identify and remediate.

By tracking the defect detection rate, teams can identify areas where they need to improve their testing processes. This can lead to more effective testing and fewer defects in production.

By tracking the change failure rate, teams can identify areas where they need to improve their deployment processes. This can lead to more stable deployments and fewer incidents.

In the end, being more aware and more efficient means a lot more time for other activities, or at least real free time after work ;)
