Musings on Metrics
Introduction

Data-driven decision making has been on the rise for many years, especially in the digital product space. From customer analytics and business metrics to tuning software performance and optimizing marketing campaigns, numbers are widely used to inform decisions on a daily basis. The idea of management by numbers is also neither surprising nor new. However, as far as I can tell, it is still far from the norm.

This is interesting. Wouldn't it be desirable to base management decisions on facts and data rather than on individual subjective perspectives that can be influenced by a variety of biases? Curiously, many engineers who would never attempt to optimize performance or user value without measuring first are often reluctant to be measured themselves. And I honestly cannot blame them, given how often metrics still seem to be misused by management with outdated views of engineering productivity or an oversimplified idea of what it means to be data-driven. This is particularly annoying because people like Patrick Kua already pointed out the pitfalls a decade ago(1).

To be fair, this is not an easy problem to solve. Software development is not factory work, so we can't just use activity and output as a measure of how well our teams are doing. Tracking the amount of thinking, communication, and problem solving that goes into building a product is really hard. And even if we could, it still wouldn't tell us how successful our work is.

"The nature of software development means most work is knowledge work, and is therefore hard to observe. It is easy to monitor activity (how much time they sit at their computer) yet it is hard to observe the value they produce (useful software that meets a real need)." - Patrick Kua

So what do we do? Get rid of the metrics and go back to relying solely on the subjective judgment of individual managers? That doesn't seem satisfactory to me.

Why use metrics?

When it comes to engineering management, I see two main use cases for metrics:

1. As a feedback tool

For me, a big part of management is about establishing and working with feedback loops. Not just the interpersonal feedback between people, but also the information that engineers get from their tools, from telemetry data, and ultimately from business results. By the same token, I think teams and their managers should make sure they get feedback on how well they operate as quickly and transparently as possible. In my experience, discussions about work processes in particular are often dominated by opinions, personal preferences, or simply a reluctance to change habits (because there are so many other, seemingly more important things to do). I think carefully selected metrics can go a long way toward removing subjectivity from these conversations. They help to identify potential problems early, to recognize facts and patterns that might otherwise be missed, and to assess the impact of important decisions or the outcome of an experiment.

In this way, I see metrics as similar to the instruments on your car dashboard: you can certainly drive without a speedometer, a gauge for remaining fuel or battery life, or information about whether the lights are on. But I'd argue that most people would prefer to have them available - I certainly would.

2. For goal setting

The second way you can use metrics is the much more delicate one, because it's so easy to misuse them and do more harm than good. But used wisely, they can be really helpful in clarifying goals and making progress toward them measurable. In my experience, this kind of alignment between management and teams is one of the key enablers of team autonomy. Unfortunately, it is surprisingly easy for the two sides to misunderstand each other and interpret the same written goal in very different ways. Agreeing with the team on how to measure progress toward the goal is a really powerful way to avoid these kinds of misunderstandings. And if you choose metrics that allow you to track your progress along the way, you'll also gain quicker insight into whether the solution you envisioned is actually going to fly.

So what's the problem?

The main problem with management by numbers, and the main reason why metrics get such a bad reputation in this context, is that all too often metrics and targets are set in the ivory tower. Without the involvement of the people who actually have to achieve those goals, context is lost on both sides, and the metrics easily become the goals themselves.

This is problematic on a number of levels. As Goodhart's Law states: "When a measure becomes a target, it ceases to be a good measure."(2) Without understanding the actual goal, teams are incentivized to meet the set number in any way they can. This opens the door to prioritizing short-term gains over sustainability, or even actively gaming the metrics. A target that doesn't seem to make sense or is virtually impossible to achieve is likely to lead to frustration and, ironically, less motivation to work toward it. Focusing on the metric instead of the goal also discourages discussion about whether the metric actually reflects the desired objective. As a result, teams may end up hitting their targets and still fail to generate significant business value, which in turn will lead to a loss of confidence in metrics in general.

I don't think it has to be this way.

What is important to keep in mind when working with metrics: a particular value for a metric is not in itself good or bad. Coming back to the car metaphor: the question "Is 60 km/h the right speed?" cannot be answered generically. If you're on a wet, dangerous, curvy road, it might be too high; within a city, it almost certainly is - unless you're driving an ambulance in an emergency. On a highway, 60 km/h is probably a bit too slow, but that depends on the weather conditions, traffic and a lot of other factors. An almost empty fuel tank or battery is no reason for concern if you're a few hundred meters from the next fuel or charging stop; in the middle of nowhere, it's a different story. The point is: context matters. A lot.

That's why I would never treat metrics by themselves as direct performance indicators or alarm bells. Anomalies or unexpected values can be a signal that something might be worth taking a closer look at, but that closer look needs to be taken together with the people who have the most context - the team. Maybe there's a simple explanation that requires no intervention at all. Maybe there is an issue, but it's outside of the team's scope (and thus on you to take care of). Or maybe there's indeed something that needs to be addressed within the team, but again: the action shouldn't be imposed from the outside, but worked out together. The same goes for goal setting. There should always be a conversation and an agreement with the team about the target, and they should have the opportunity to suggest alternative ways of measuring. In both cases, I see the conversations as the far more important part; the metrics are just the catalyst that gets them started and informs them.

What to measure?

Which leaves the question: What exactly should we be measuring? Fortunately, we now have at least some research on the subject. The Accelerate / DORA research(3), Google's re:Work study(4), and the SPACE framework(5), to name a few, are all worth a look.

I think the SPACE framework in particular is a good place to start because it looks at the issue holistically and suggests a variety of metrics, making it easy to adapt to your specific environment. I also like its recommendation to implement metrics that are in tension with each other, thus avoiding over-optimizing one aspect at the expense of another. If you also take care to mix qualitative assessments with quantitative technical metrics and business numbers, you should get a pretty solid foundation that you can then iteratively improve over time.

Because it's so context dependent, it's hard to suggest concrete metrics that will work most of the time. That said, here are a few that I find valuable:

1. To measure team workflows and well-being:

Both SPACE and re:Work suggest that personal happiness, psychological safety and team interactions are important factors for team productivity. To cover these, as well as to get a subjective assessment of the overall quality of work, I think regular team health check surveys(6) are a really good tool.
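To make this a bit more tangible, here is a minimal sketch of how the answers from such a survey could be summarized before discussing them with the team. It assumes a traffic-light style rating (green / yellow / red) per aspect; the aspect names, the sample answers and the numeric scores are all made up for illustration and are not taken from any specific health check model.

```python
from collections import Counter

# Hypothetical survey answers: one dict per team member, rating each
# aspect as "green", "yellow" or "red".
ANSWERS = [
    {"teamwork": "green", "code quality": "yellow", "fun": "green"},
    {"teamwork": "green", "code quality": "red", "fun": "yellow"},
    {"teamwork": "yellow", "code quality": "red", "fun": "green"},
]

# Assumed mapping from rating to a number, only used to rank aspects.
SCORES = {"red": 0, "yellow": 1, "green": 2}


def summarize_health_check(answers):
    """Count the votes per aspect and compute an average score for each."""
    votes = {}
    for answer in answers:
        for aspect, rating in answer.items():
            votes.setdefault(aspect, Counter())[rating] += 1

    summary = {}
    for aspect, counts in votes.items():
        total = sum(counts.values())
        average = sum(SCORES[r] * n for r, n in counts.items()) / total
        summary[aspect] = {"votes": dict(counts), "average": average}
    return summary


health = summarize_health_check(ANSWERS)

# The lowest-rated aspect is a candidate topic for the next team discussion.
lowest = min(health, key=lambda aspect: health[aspect]["average"])
print(lowest, health[lowest])
```

The average is only there to decide what to talk about first. The vote distribution is usually the more interesting part, because a split between green and red tells a different story than everyone voting yellow.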

2. Agility metrics:

As mentioned above, productivity is difficult to measure because we can't simply rely on activity metrics. Instead, I suggest focusing on metrics that reflect working in small batches, as this gives you an assessment of how quickly you can generate business value (assuming you're working on the right things) and how quickly you can adapt in case you realize that you're on the wrong track. Metrics that cover this aspect for me are:

- Lead time / story completion time

- Average lifetime of branches

- Number of key learnings per cycle(7) (I really like this one)
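As a rough illustration of the first two metrics, the following sketch computes durations from a pair of timestamps per item. The sample data is invented; in practice the timestamps would come from your issue tracker (for lead time) or your version control system (for branch lifetime), and what exactly counts as "started" and "done" is something to agree on with the team.

```python
from datetime import datetime
from statistics import median

# Hypothetical sample data: (work started, delivered to production) per story.
# The same shape works for branches: (branch created, branch merged).
STORIES = [
    (datetime(2023, 3, 1, 9), datetime(2023, 3, 3, 9)),
    (datetime(2023, 3, 2, 9), datetime(2023, 3, 3, 21)),
    (datetime(2023, 3, 6, 9), datetime(2023, 3, 13, 9)),
]


def durations_in_days(items):
    """Return the elapsed time between start and end of each item, in days."""
    return [(done - started).total_seconds() / 86400 for started, done in items]


def summarize(durations):
    """Report the median and the maximum instead of a single average."""
    return {"median": median(durations), "max": max(durations)}


lead_time = summarize(durations_in_days(STORIES))
print(lead_time)
```

I deliberately use the median and the maximum here rather than the mean. A single long-running story can distort an average considerably, and it is usually exactly that outlier that is worth a conversation.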

3. Quality metrics:

I truly believe Dave Farley when he claims that there is no inherent trade-off between quality and speed(8). Nevertheless, I think that less mature teams in particular benefit from measuring the quality of their work to balance out the metrics that incentivize agility. One obvious metric here is the number of defects found in production. If possible, I suggest you complement this with some technical quality metrics, e.g.

- cyclomatic / cognitive(9) code complexity

- warning counts

- build / deploy times

- test coverage

My colleague Evgenia recently made an interesting suggestion in this regard. If you measure the perceived quality of code reviews, you get a metric that reflects the quality of new code added to the codebase. Since this isn't too much affected by the overall state of the existing code, it might be a better quality metric for large legacy codebases than some of the above.

4. For measuring business value

Since the financial outcome is an extremely laggy and inaccurate metric (which part of your work contributed how much?), I suggest measuring some form of customer engagement, e.g.

- number of weekly / monthly active users (ideally measured by cohort(10))

- average session length
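To show what "measured by cohort" can look like in practice, here is a small sketch that counts weekly active users grouped by the week in which they signed up. The event log and its format are assumptions for illustration only; real data would come from your analytics or telemetry pipeline.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log: (user id, signup date, date of activity).
EVENTS = [
    ("ann", date(2023, 1, 2), date(2023, 1, 3)),
    ("ann", date(2023, 1, 2), date(2023, 1, 10)),
    ("bob", date(2023, 1, 4), date(2023, 1, 5)),
    ("cat", date(2023, 1, 9), date(2023, 1, 11)),
    ("cat", date(2023, 1, 9), date(2023, 1, 12)),
]


def iso_week(day):
    """Return (ISO year, ISO week number) for a date."""
    year, week, _ = day.isocalendar()
    return (year, week)


def weekly_active_users_by_cohort(events):
    """Count distinct active users per (signup week, activity week)."""
    active = defaultdict(set)
    for user, signed_up, active_on in events:
        active[(iso_week(signed_up), iso_week(active_on))].add(user)
    return {key: len(users) for key, users in active.items()}


wau = weekly_active_users_by_cohort(EVENTS)
for (cohort, week), count in sorted(wau.items()):
    print(f"cohort {cohort} active in week {week}: {count}")
```

Grouping by signup week keeps a wave of new users from hiding the fact that older cohorts are becoming less active, which a single total number would not show.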

One piece of advice I've heard from data experts is to first understand the question you're trying to answer, and then choose the things you want to measure accordingly. I think this is sound advice in terms of the effort required to implement and analyze the data. On the other hand, I've been in a situation several times where we agreed that we'd expect to see an improvement in a specific metric. But unfortunately, we weren't tracking that metric yet, and so we didn't have a baseline and couldn't use it.

So don't overthink this. If there's a good reason to believe that a certain metric will reflect a particular aspect well enough, then go ahead and implement it. You can always improve it later.

How to implement

Moving to a more metrics-driven style of engineering management is a cultural change and therefore inherently difficult. If you haven't used metrics for this purpose before, you face a chicken-and-egg problem: many of your engineers may never have seen the benefits and may be very skeptical about being measured. So they resist the idea of metrics, which in turn makes it difficult to implement them and demonstrate their value.

A relatively low-effort approach that has a good chance of being well received is to start by conducting regular team health checks, especially if you use the results to have meaningful discussions with the team about how to address aspects that were rated low. Then look at your existing tools and see what metrics they already provide (almost) out of the box. Hopefully the product managers will also be able to give you access to usage and business data that they track anyway. All of this combined should give you a pretty good starting point.

Once you have a few data points available, make it a habit to review them regularly with the team. This may feel a little uncomfortable or even useless at first, so come prepared for those first conversations. Chances are that you will already be able to spot patterns or quirks that deserve a closer look. Point these out to the team and engage in a conversation. Many people are visual, so creating a dashboard with nice graphs can also go a long way to getting them interested.

After a while, you should also start asking what the team thinks would be useful things to track in addition. But stay away from the activity-related metrics in the beginning. They are so easily misinterpreted (especially by colleagues from other functions) and could send the message of 'more rat-race'. In the same spirit, I'd also recommend waiting until the metrics have become 'normalized' before starting any discussions about specific targets.

Also keep in mind that many of the metrics you track will still be somewhat lagging. For example, team morale will only improve gradually after the root cause has been addressed. So don't fall into the trap of acting for action's sake, and give your mitigations a bit of time to show an effect.

Summary

Management by numbers is not a silver bullet. Just as you can't A/B test your product to become something as innovative and iconic as the iPhone, I don't think you can create a healthy and productive engineering organization based on numbers alone. You need human empathy and intelligence to do that. And yes, in theory, you and your teams could observe most issues without using metrics. In reality, you are usually busy and often also 'operationally blind', so you may not notice a problem until it becomes significant. I believe that having metrics as one of your regular inputs to discussions can help a lot here. They can also help you and your teams clarify goals and track progress in a meaningful way. As long as you use them in a thoughtful and collaborative way, I think metrics can be an immense help in creating a better working environment for engineers. I suggest you give them a try.


  1. https://martinfowler.com/articles/useOfMetrics.html
  2. https://en.wikipedia.org/wiki/Goodhart%27s_law
  3. https://cloud.google.com/blog/products/devops-sre/dora-2022-accelerate-state-of-devops-report-now-out
  4. https://rework.withgoogle.com/blog/five-keys-to-a-successful-google-team/
  5. https://queue.acm.org/detail.cfm?id=3454124
  6. https://engineering.atspotify.com/2014/09/squad-health-check-model/
  7. Tom Chi - Rapid Prototyping & Product Management
  8. https://twitter.com/davefarley77/status/1615030951330840579?s=20
  9. https://www.sonarsource.com/docs/CognitiveComplexity.pdf
  10. https://gopractice.io/product/how-engagement-metrics-can-be-misleading/
