How to measure your service

We often hear that metrics are very important for services. But why is it really necessary to check them? We monitor metrics to improve our service: based on them, we can form hypotheses for improvement. It is very important to build this into your continuous improvement process, PDCA cycle, or whatever you choose to call it.

Kanban suggests basic metrics that show the effectiveness of a service and ways to measure them. I’ll explain which metrics I check as a delivery manager to monitor the quality of my service.

As a product manager, I develop a Jira-integrated system that displays basic metrics in different reports to assess the health of the service. Flexible filters and settings help us focus on the most important points of view. For example, we can choose between the full technical department or separate teams, set the time interval, or use labels to include or exclude certain features.

In my opinion, one of the most important metrics is lead time, and we focus on it in many charts. We start measuring lead time when anybody on the team "touches" the feature and stop measuring when our customer receives the full value. So our lead time includes Discovery (4 different statuses), Delivery (4 different statuses), AB-test (3 different statuses), and even PostProduction.
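To make this concrete, here is a minimal sketch of how such a lead time could be computed from a feature's status history exported from Jira. The status names and the data format are hypothetical, just an illustration of the "first touch to full value" idea:

```python
from datetime import datetime

# Hypothetical status names; your workflow (Discovery, Delivery,
# AB-test, PostProduction) will have its own.
TRACKED_STATUSES = {
    "Discovery: In Progress", "Delivery: In Progress",
    "AB-test: Running", "PostProduction",
}

def lead_time_days(transitions):
    """transitions: list of (status, entered_at) tuples, sorted by time.

    Lead time starts when the feature first enters any tracked status
    (someone "touches" it) and ends at the last transition, when the
    customer has received the full value.
    """
    touched = [t for status, t in transitions if status in TRACKED_STATUSES]
    if not touched:
        return None
    start = min(touched)
    end = max(t for _, t in transitions)
    return (end - start).days

# Usage with made-up data:
history = [
    ("Discovery: In Progress", datetime(2024, 1, 10)),
    ("Delivery: In Progress", datetime(2024, 1, 20)),
    ("AB-test: Running", datetime(2024, 2, 1)),
    ("Released", datetime(2024, 2, 5)),
]
print(lead_time_days(history))  # 26
```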

Lead time histogram:

So the first chart we created was the lead time histogram. Its main goal is to show how many features we release within a given period of time. We can use this histogram in different ways. First of all, if you see several peaks in the picture, it may indicate different classes of service. Secondly, this chart shows how well your system handles deviations. If you see a long or thick "tail", it means that your system is unpredictable and deals with deviations badly. On the other hand, it is normal to have a modest tail, because in certain cases fighting deviations can cost more resources (time or money) than the deviations are actually worth.

Moreover, on this chart we can find different percentiles of lead time. In our case the 85th percentile is the most important because it separates out the influence of unwanted tails and shows an honest picture of our service. A big difference between percentiles indicates weak predictability of the service, so we should monitor it too.
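For illustration, here is a small sketch of how such a histogram and its percentiles can be built; the lead times below are purely made up:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical lead times (in days) for released features, e.g. pulled from Jira.
lead_times = np.array([5, 7, 8, 8, 9, 10, 11, 12, 12, 14, 15, 18, 21, 35, 60])

p50, p85, p95 = np.percentile(lead_times, [50, 85, 95])

plt.hist(lead_times, bins=range(0, 70, 5), edgecolor="black")
plt.axvline(p85, color="red", linestyle="--",
            label=f"85th percentile = {p85:.0f} days")
plt.xlabel("Lead time, days")
plt.ylabel("Number of features")
plt.legend()
plt.show()

# A large gap between the 50th and 85th/95th percentiles hints at a long tail
# and weak predictability.
print(f"p50={p50:.0f}, p85={p85:.0f}, p95={p95:.0f}")
```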

Picture 1. Lead time histogram (team #1)


Picture 2. Lead time histogram (team #2)

For example, in Picture 1 we can see that the first team (team #1) deals well with its "tails" and doesn’t have any strong deviations. The team in Picture 2 (team #2) doesn’t have an obvious peak but has a long tail, so it is harder to predict its lead time. What does this mean? It means that, using this chart, we can’t confidently tell our business how much time the second team will spend on the next feature. But for the first team we can predict the lead time of future tasks with 85% confidence. The next pictures show the second team’s results in different charts.

Lead time BoxPlot:

Picture 3. BoxPlot

After we’ve analyzed lead time, we can focus on the statuses that influence it. The chart in Picture 3 shows the 1st, 25th, 50th, 75th, 99th, and 100th percentiles for each status. It is valuable in two ways. First of all, you can discover that you spend more time in some statuses than you expect. This is a typical situation for a buffer status: often you don’t notice how long it takes, yet it has a strong influence on lead time. It doesn’t matter whether you are actively working on the problem or just waiting for something; your customer is still awaiting the feature. A long buffer status can be explained by the next status, which may be a bottleneck in your system that you need to optimize. I can’t say that Picture 3 shows huge buffer statuses; "Development: done" is the largest, but it isn’t excessively long, which indicates good effectiveness.

Secondly, you can pay attention to the ratio between different statuses and make management decisions based on it. For example, if you see that you spend more time on testing than on developing, it may push you to rebalance resources.

And last but not least: as with lead time, the difference between percentiles shows predictability. If you see a huge difference between percentiles, it means that there is a long "tail" in this status and you can’t predict its cycle time. In our case, there are unpredictable statuses such as "Solution Discovery: In Progress," "Development: In Progress," and even "Feedback" (which isn’t included in lead time). Looking at Picture 3, we can suppose that in 75% of cases we discover, develop, or give feedback really fast. However, the last 25% of features may take significantly longer.

As a bonus, this chart allows you to identify statuses that you no longer use, prompting you to consider workflow optimization.
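As a sketch of the underlying calculation, this is roughly how time-in-status percentiles and a box plot per status could be produced from an export of feature histories; the column names and numbers are invented for the example:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export: one row per feature per status, with days spent there.
data = pd.DataFrame({
    "feature": ["A", "A", "B", "B", "C", "C"],
    "status":  ["Development: In Progress", "Development: Done"] * 3,
    "days":    [6, 2, 20, 1, 9, 14],
})

# Percentiles of time spent in each status: compare working statuses
# against buffer statuses such as "Development: Done".
summary = (data.groupby("status")["days"]
               .quantile([0.25, 0.50, 0.75, 0.99])
               .unstack())
print(summary)

# Box plot of time-in-status, one box per status.
data.boxplot(column="days", by="status")
plt.ylabel("Days in status")
plt.show()
```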

Quarter report

Fine! We’ve checked the current lead time value. But how can we determine whether it’s good or bad? Maybe we should compare different teams? That’s a bad idea because different teams have different stakeholders, features, contexts, and so on. Instead, we can compare our service with… our service in the past. We assume that quarter-over-quarter dynamics can show the trend. However, this timeframe works well only for our typical feature development duration; if your tasks are smaller or larger, you might need to adjust the timeframe accordingly.

We can check cycle times in the quarter report too.

Picture 4. Quarter report

In Picture 4, we see an increase in lead time (the 85th percentile, to be precise) related to an increase in development time. Indeed, if we refer to Picture 2, we can identify several large tasks that contributed to this increase in lead time.
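A minimal sketch of such a quarter report, assuming we have a list of released features with their lead times (the column names and numbers are hypothetical):

```python
import pandas as pd

# Hypothetical feature list with release dates and lead times in days.
features = pd.DataFrame({
    "released": pd.to_datetime(["2024-01-15", "2024-02-20", "2024-04-03",
                                "2024-05-11", "2024-07-02", "2024-08-30"]),
    "lead_time_days": [12, 30, 18, 45, 22, 60],
})

# Compare the service with itself: 85th percentile of lead time per quarter.
quarterly = (features
             .groupby(features["released"].dt.to_period("Q"))["lead_time_days"]
             .quantile(0.85))
print(quarterly)
```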

Control chart

Picture 5. Control chart

Another way to measure dynamics is by using a control chart. It allows us to assess the predictability of our service and spot signals indicating increases or decreases in system entropy. In Picture 5, the blue line shows the average value in each period of time, and the pink area shows the standard deviation. The thicker the pink area, the worse the predictability. We can see that the third quarter is less predictable than the previous period.
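Roughly, such a control chart can be approximated with a rolling mean and a rolling standard deviation band; the sketch below uses invented lead times just to show the mechanics:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical lead times of features, ordered by release date (weekly).
df = pd.DataFrame({
    "released": pd.date_range("2024-01-01", periods=30, freq="W"),
    "lead_time_days": [10, 12, 9, 14, 11, 13, 10, 15, 12, 11,
                       16, 14, 13, 18, 20, 17, 22, 19, 25, 21,
                       30, 28, 35, 26, 40, 33, 29, 45, 38, 42],
}).set_index("released")

window = 5
mean = df["lead_time_days"].rolling(window).mean()   # the "blue line"
std = df["lead_time_days"].rolling(window).std()     # width of the "pink area"

plt.plot(df.index, df["lead_time_days"], "k.", alpha=0.5, label="features")
plt.plot(df.index, mean, color="blue", label="rolling mean")
plt.fill_between(df.index, mean - std, mean + std, color="pink", alpha=0.6,
                 label="± 1 std dev")
plt.ylabel("Lead time, days")
plt.legend()
plt.show()
```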

However, it's important to approach this chart with caution, as it is most useful for services that produce very similar products, such as in factories.

Cumulative Flow Diagrams

Picture 6. CFD

Besides the quarter report, the Cumulative Flow Diagram also shows dynamics. I’m pretty sure that this kind of report is one of the most important, but it would be too long a story for this article. Maybe one day I’ll write an article focused on this diagram alone.

Table view

Picture 7. Table view

After reviewing the global metrics, we should dive deeper into local problems.

To do this, we need to focus on each problematic feature. Typically, I check features that have been in any status for more than 25 days OR where we’ve made a prediction error greater than 30%. This can be identified on the observation chart.
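As a sketch, this filter could look roughly like the following; the thresholds match the ones above, while the column names are hypothetical:

```python
import pandas as pd

# Hypothetical feature table: time in current status, estimate vs actual duration.
features = pd.DataFrame({
    "key": ["FEAT-1", "FEAT-2", "FEAT-3"],
    "days_in_current_status": [4, 31, 10],
    "estimated_days": [20, 15, 10],
    "actual_days": [22, 16, 18],
})

# Stuck: more than 25 days in any status.
stuck = features["days_in_current_status"] > 25

# Badly predicted: prediction error greater than 30%.
prediction_error = ((features["actual_days"] - features["estimated_days"]).abs()
                    / features["estimated_days"])
badly_predicted = prediction_error > 0.30

problematic = features[stuck | badly_predicted]
print(problematic[["key", "days_in_current_status"]])
# FEAT-2 is stuck in a status; FEAT-3 has an 80% prediction error.
```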

For each problematic feature, I concentrate on the tasks within it, their workflows, the overall feature workflow, blockers, and so on. Additionally, I utilize blocker statistics to analyze how frequently we encounter blocks, the reasons behind them, and their duration.

Features stacked bar

Picture 8. Stacked bar

I can’t say that this is the most important chart. It is another way to find the same local anomalies as in the previous view, but sometimes it is a bit more convenient.

How to compare Story points and Days?

We experimented with measuring story points and comparing them to the time spent. In my opinion, there is no value in it. We tried to find a correlation between:

  • Duration and estimation in story points
  • The number of days used per story point and the overall feature estimation
  • Estimation errors and estimates
  • The frequency of deadline changes and feature estimation

The resulting charts indicate no (or weak) correlation among these measurements. I believe this is due to errors in estimation and the complexity of the tasks and environment we operate in.
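For reference, here is a minimal sketch of such a correlation check with invented numbers; a coefficient close to zero means the story point estimate tells us little about the real duration:

```python
import pandas as pd

# Hypothetical per-feature data: story point estimate and actual duration in days.
df = pd.DataFrame({
    "story_points":  [3, 5, 8, 13, 5, 8, 3, 13],
    "duration_days": [10, 7, 30, 12, 25, 9, 20, 40],
})
df["days_per_point"] = df["duration_days"] / df["story_points"]

# Pearson correlation close to 0 means the estimate tells us little about duration.
print(df["story_points"].corr(df["duration_days"]))   # duration vs estimate
print(df["story_points"].corr(df["days_per_point"]))  # days per point vs estimate
```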

Throughput

Picture 9. Throughput

I rarely analyze low-level tasks, but the throughput of tasks (not features) is one of the health metrics I monitor. The chart in Picture 9 shows how many tasks our team completes each week. If we notice any anomalies, it signals the need to investigate the underlying reasons. Sometimes I also check the tasks’ status BoxPlot (similar to Picture 3). We have other charts as well, but in this article I tried to tell you about the most useful of them.
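A weekly throughput count like this is easy to reproduce from a list of task completion dates; the sketch below uses made-up dates:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical task completion dates exported from Jira.
done = pd.Series(pd.to_datetime([
    "2024-03-04", "2024-03-05", "2024-03-07",
    "2024-03-12", "2024-03-13", "2024-03-14", "2024-03-15",
    "2024-03-20",
]))

# Throughput: number of tasks completed per calendar week.
throughput = done.dt.to_period("W").value_counts().sort_index()
print(throughput)

throughput.plot(kind="bar")
plt.ylabel("Tasks completed")
plt.show()
```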

In conclusion, I would say that there are many different ways to measure your service, and each report and metric can lead to different conclusions in different cases. But it is very important to start measuring your service at all: without it, you are blind and make decisions based only on intuition and experience, which may not fit every case.
