The CTO Playbook
Chapter 4 - Accelerate pt. I - Institute a Metrics Driven Culture
This is an area into which I believe every CTO should pour a significant amount of time and effort, as it has the greatest impact on overall performance. We're going into some detail here, so buckle up.
My notes for this post start with "describe the problem...", which I'll attempt to do now rather than just jumping into the solution, tempting as that may be.
The problem is hugely complex systems. In a previous post I talked about multiple interacting (and interfering) complex systems comprising people, teams, technology stacks, business domains, delivery environments, coding languages, incidents, operational processes, weather, pandemics and solar flares...
For the sake of brevity within this post (ha!), let's pull this back to your (as CTO) domain of influence - people, process, teams and technology stacks.
Even within these areas there are (literally) millions of things that the individuals who make up your department need to get right, or mostly right, most of the time, in order for everything to work, even at a basic level.
Perhaps if you - as a highly talented individual working alone - had enough time, and built everything from scratch yourself, you might be able to build the perfect system, and put in place all the right infrastructure, code, processes and practices - but let's get back to reality.
Organisations exist for this very purpose: to amplify the ability of a leader to achieve a goal. The likelihood of the leader succeeding depends on how well their vision is transmitted to the people within the organisation.
Unfortunately you might find that your current reality is one where your department looks more like an unruly mob, munging out another large production release across a decaying blob of legacy infrastructure that suffers from frequent service outages, and you know the next few weeks will be an onslaught of release-related issues.
Is that a bit bleak? Does it make your gut wrench because you've been there and done that? (If so, you'll get the same PTSD-like onslaught of emotions upon reading "The Phoenix Project", written in part by Gene Kim and highly recommended.)
Here's how to fix that.
Those five words, "here's how to fix that", perhaps make it sound like the fix might be easy or simple (and those are very different things: just ask somebody who's been given the simple task of giving up cigarettes).
Unfortunately no; this is not a simple or easy fix, but it is a fix - and it works.
The solution, put succinctly, is a joint vision, regularly discussed, made up of a set of principles that continually guide us in the right direction.
We're going to communicate that vision using a set of metrics. Attention and focus on the right metrics is a hugely powerful way to drive the outcomes we're looking for.
The following is based largely on a book with the long name of "Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations" by Nicole Forsgren PhD, Jez Humble and Gene Kim.
The authors are luminaries in the DevOps world - actually, if memory serves, one or more of them were instrumental in shaping the DevOps movement in its early days.
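Through many years of research, they found that the consistently highest-performing technical teams shared some key attributes, which they tracked as four metrics: deployment frequency, lead time for changes (discussed below as Cycle Time and Lead Time), change failure rate (CFR) and mean time to restore (MTTR).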
That's all very well, you might think - nice correlation.
But no - they found that businesses that started implementing these metrics (as in, tracking them and making them visible within the organisation) went on to see large improvements in overall technology performance.
Technology performance, as defined here, means getting more software delivered, more quickly, with higher quality and more resilience, as measured against other teams.
So, it's not correlation, it's causation (even if somewhat indirect in many cases).
But why? How does merely tracking and increasing visibility of these metrics improve the performance of potentially an entire department?
Well, it's easy to say "implement tracking, increase visibility" - but that's not representative of the effort (and understanding) required to do this effectively.
Let's start by outlining the areas that each metric impacts:
Deployment frequency - increasing the frequency of your deployments means that effort must go into improving deployment automation, ensuring there are pipelines and processes in place for every change that makes it out to production. It's still very common to see businesses performing monthly releases when they should be moving towards multiple production deployments every day.
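As a rough illustration of what tracking deployment frequency can look like, here is a minimal sketch in Python. It assumes you can export a list of production deployment timestamps from your pipeline tooling; the function names and the sample data are hypothetical, not taken from any particular tool.

```python
from collections import Counter
from datetime import date, datetime


def deployments_per_day(deploy_times: list[datetime]) -> dict[date, int]:
    """Count production deployments for each calendar day that had at least one."""
    return dict(Counter(t.date() for t in deploy_times))


def average_daily_frequency(deploy_times: list[datetime]) -> float:
    """Average deployments per day over the span from first to last deployment."""
    if not deploy_times:
        return 0.0
    span_days = (max(deploy_times).date() - min(deploy_times).date()).days + 1
    return len(deploy_times) / span_days


# Hypothetical sample data: four deployments spread over three days.
sample_deploys = [
    datetime(2024, 3, 4, 10, 15),
    datetime(2024, 3, 4, 16, 40),
    datetime(2024, 3, 5, 11, 5),
    datetime(2024, 3, 6, 9, 30),
]
```

Plotting the per-day counts on a dashboard the whole department can see is one way to make the metric visible; the average is there to show the trend from monthly releases towards multiple deployments a day.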
This also ties in closely with Cycle Time (or Lead Time, see note below), which requires making the deployments smaller, and has impacts right through to the agile/product requirement specification processes. It requires teams to keep stories small, and to split larger pieces of work into separately deployable components, perhaps managed through feature switches.
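For readers who haven't used feature switches, the idea can be sketched in a few lines of Python. This is a deliberately simplified illustration: the in-memory flag store, the flag name and the checkout function are all hypothetical, and a real system would more likely read flags from configuration or a dedicated flag service.

```python
# Hypothetical in-memory flag store, used here only for illustration.
FEATURE_FLAGS = {"new_checkout_flow": False}


def is_enabled(flag_name: str) -> bool:
    """Unknown flags default to off, so unfinished work stays dark in production."""
    return FEATURE_FLAGS.get(flag_name, False)


def checkout(cart_total: float) -> str:
    """Route to the new code path only when its switch is turned on."""
    if is_enabled("new_checkout_flow"):
        return f"new flow: charged {cart_total:.2f}"
    return f"existing flow: charged {cart_total:.2f}"
```

The point is that the half-finished new flow can be deployed in small pieces behind the switch, while customers keep seeing the existing behaviour until the switch is flipped.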
Both of these metrics drive changes in how development teams work - how well they focus together and do code reviews and fixes - because this change is going to production in a few hours! Naturally this pulls in the product, QA and SRE/DevOps teams as well, which it should. All of these teams and individuals need to work in concert for this to be successful.
Importantly, they will find that they must change their processes, and possibly their tooling, in order to make this work - and that's EXACTLY what we want. More importantly, they must discover this for themselves and implement the changes themselves - it can't be you as CTO directing every little change.
These first two are the most useful metrics that measure and monitor 'flow' through the hugely complex systems I've discussed above.
This is the flow of value to your customer. Complex systems absolutely must have consistently-sized work items flowing through in order to maintain delivery consistency - the more you work towards this the more it will become apparent that it's the large, irregular changes that clog up systems and cause significant knock-on delays and additional work effort.
(A note about Cycle Time: originally, Cycle Time in the DevOps context was how much time it took to deploy code to production, and was primarily concerned with pipeline automation. Expanding this to include the development process (which is a must) makes Cycle Time cover the period from the start of coding through to the production deployment, and that's how it is defined in Atlassian's tools. Expanding this again to include the creation of the agile story (also a must, in my opinion), so that you're also measuring the performance of the Product team, gives what is generally defined as "Lead Time", which should be tracked alongside Cycle Time.)
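To make the distinction in that note concrete, here is a small Python sketch that computes both measures from three timestamps per work item. The field names and sample data are assumptions for illustration; your ticketing and pipeline tools will have their own names for these events.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class WorkItem:
    story_created: datetime   # agile story written (start of Lead Time)
    coding_started: datetime  # development begins (start of Cycle Time)
    deployed: datetime        # change live in production (end of both)

    @property
    def cycle_time(self) -> timedelta:
        return self.deployed - self.coding_started

    @property
    def lead_time(self) -> timedelta:
        return self.deployed - self.story_created


def median_hours(durations: list[timedelta]) -> float:
    """Median duration in hours; the median is less skewed by one huge item than the mean."""
    return median(d.total_seconds() / 3600 for d in durations)


# Hypothetical sample data: three small stories.
sample_items = [
    WorkItem(datetime(2024, 3, 1, 9), datetime(2024, 3, 4, 9), datetime(2024, 3, 4, 17)),
    WorkItem(datetime(2024, 3, 4, 9), datetime(2024, 3, 5, 9), datetime(2024, 3, 5, 21)),
    WorkItem(datetime(2024, 3, 5, 9), datetime(2024, 3, 5, 13), datetime(2024, 3, 6, 9)),
]
```

With this sample data the median Cycle Time is 12 hours while the median Lead Time is 36 hours; the gap between the two is the time stories spend waiting before development starts.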
CFR and MTTR are next up...
... but I'm going to have to leave that to Accelerate Part 2 in order to be able to cover it at a useful level of detail - bet you didn't expect a cliffhanger! - See you next week.