Creating and Using Quality Metrics in Software Delivery - Part III

In the previous articles, I wrote about some challenges associated with poorly chosen metrics and an over-reliance on outcome metrics. The latter tend to provide feedback far too late in the development cycle. Ideally, we also want metrics to tell us what is happening well enough in advance of a release to afford us the opportunity to take corrective action. The advantage is simple: if we can identify potential problems in CI or testing, we can hopefully prevent deploying defective software to our customers. Finding issues in the development stage not only lets us fix problems but also gives us the chance to identify and address weaknesses in our architecture. These two benefits are reason enough to adopt a meaningful, if not comprehensive, set of input metrics.

Technical Debt

The primary reason we want to monitor our development processes is to discover activities which may impact the quality of our released product or service: high-risk commits, rising code or architectural complexity, or emerging hotspots. Further, we want to find these things as early as possible - preferably in real-time. The above are all examples of accumulating technical debt. This is our real enemy. Long-term deferment of paying down technical debt is the scourge of modern software development. Dan Radigan of Atlassian called it a black hole.

Techopedia says technical debt is "the implied cost of additional rework caused by choosing an easy solution now instead of using a better approach that would take longer." Technical debt is an actual cost, and deferring repayment accumulates interest. It is not some abstract concept which may or may not impact your product. It is real. In some ways, you can think of it like a bank loan: the longer it takes to repay, the more interest accrues. At some point, the interest payments - the effort needed to repair the code - become so large that they endanger your business. With a bank loan, though, no single individual in a company can borrow on its behalf without coordinating with the board. Here is where the similarity ends: with technical debt, almost anyone involved in the development process can, and usually does, assume debt on behalf of the company.

But not all technical debt is equal. Having technical debt is, by itself, not particularly problematic. Think about buying something on your credit card: if you pay it off in 30 days, all is good. What is dangerous, however, is to keep putting off fixing something in your code. If you leave problematic or risky code in your codebase, there is a good chance you'll build new code (a module or service, for example) that relies on the risky code. You'll be building good code on top of bad. It could be that this is a risk you are willing to take. But, over time, you might forget the risk is there, and debugging your code will become a nightmare. Historian Michael Howard wrote, "all we believe about the present depends on what we believe about the past." If we apply this idea to software and ignore the accumulated technical debt in our code base, we will probably believe our code is more stable and secure than it is. Better to find and address the risks early, before we forget them.

So, how does one go about choosing and tracking meaningful input metrics? The correct ones for your organisation will vary depending on your application, infrastructure and architecture. Luckily, we can track some fundamental metrics to help us, and if we act on them, we will pay down our most dangerous technical debt.

Monitoring the development processes

Input metrics can be used to find and prioritise work to reduce risks to the delivered product or service. They can also be used to identify weak areas - code that works fine for now but has a chance of becoming a severe problem later - so that you can pay down the above-mentioned technical debt. For the former, it should be evident that there are real benefits to knowing exactly where you stand in software quality and delivery before the end of a project or delivery to production. For the latter, knowing which refactoring to prioritise is key.

There are myriad ways you might go about monitoring the quality of your code, but here are a few suggestions of things to track:

  • Percentage of new or reworked code covered by unit and integration tests. If you adopt a practice that encourages or requires tests for any new code written or old code rewritten, you will go a long way toward finding problems now and catching regressions later. For new code, test-driven development implies that you write your objectives as tests and then write your code to meet the objectives, i.e. to pass the tests. For code that is reworked to fix a bug, you have, by definition, a problem. What better time to ensure the problem does not recur than to write tests for it? Be careful not to create only happy-path tests or only unit tests. These may help you sleep better, but they won't necessarily make debugging any easier when another issue appears. (A sketch of measuring changed-line coverage follows this list.)
  • Code and author churn. Both code and author churn are measurements that help identify potential risks in your code. Highly volatile code is often a warning sign. Essentially, you assume that the more code is changed, the higher the risk that something will go wrong. This is the opposite of "don't touch a running system." Any file being changed regularly or aggressively should be considered a risk: the more often, the higher the risk. And when more than one author works on the same areas of the code, the risk increases even more. (A churn and hotspot sketch also follows this list.)
  • Code complexity and hotspots. Cyclomatic complexity is a well-defined concept and can be visualised using tools like Codescene, SonarQube, Coverity, or others. Each addresses the challenge in different ways, and though, in some cases, they complement each other, look closely at what each provides before you choose. Each of these tools measures complexity. In Codescene, for example, complexity is measured and superimposed on the change activity. This is meant to identify potential hotspots in your code. Hotspots are calculated by measuring the changes to a file over time and correlating them with a McCabe complexity score. This allows development teams to identify the highest-risk, most volatile code and insert it into the backlog. In this way, refactoring becomes an ongoing task.
  • Code toxicity. Toxicity is a measure of code that has poor internal quality and is hard to maintain or extend. Set thresholds for common coding problems you wish to monitor, for example, file length or the depth of nested if or try statements. These are relatively easy to find and track with standard static analysis tools (more on this below).
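
For the first measure, here is a minimal sketch of how changed-line coverage could be computed in a CI step. It assumes coverage.py's JSON report (produced by `coverage json`) with repository-relative paths and a diff against origin/main; the report name and base branch are placeholders for whatever your own pipeline uses.

```python
# Minimal sketch: percentage of new or changed lines that are covered by tests.
# Assumes a coverage.json report from coverage.py and a diff against origin/main;
# both names are placeholders, not part of any specific tool's contract.
import json
import re
import subprocess
from collections import defaultdict

def changed_lines(base="origin/main"):
    """Map file path -> set of added/changed line numbers from `git diff`."""
    diff = subprocess.run(
        ["git", "diff", "--unified=0", base, "--", "*.py"],
        capture_output=True, text=True, check=True).stdout
    lines, current = defaultdict(set), None
    for line in diff.splitlines():
        if line.startswith("+++ b/"):
            current = line[6:]
        elif line.startswith("@@") and current:
            match = re.search(r"\+(\d+)(?:,(\d+))?", line)
            start, count = int(match.group(1)), int(match.group(2) or 1)
            lines[current].update(range(start, start + count))
    return lines

def diff_coverage(report="coverage.json", base="origin/main"):
    """Return the percentage of changed lines that were executed by tests."""
    with open(report) as handle:
        files = json.load(handle).get("files", {})
    covered = total = 0
    for path, changed in changed_lines(base).items():
        executed = set(files.get(path, {}).get("executed_lines", []))
        total += len(changed)
        covered += len(changed & executed)
    return 100.0 * covered / total if total else 100.0

if __name__ == "__main__":
    print(f"Changed-line coverage: {diff_coverage():.1f}%")
```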
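
For churn and hotspots, the sketch below ranks files by combining change frequency and author count from git history with a very crude complexity proxy (a count of branching keywords). The 12-month window and the keyword proxy are illustrative choices, not a standard; a real tool would use a proper cyclomatic complexity measure.

```python
# Sketch: rank files by churn and a crude complexity proxy to surface hotspots.
# Run from the root of a git repository; the proxy and window are assumptions.
import re
import subprocess
from collections import Counter, defaultdict
from pathlib import Path

BRANCH_KEYWORDS = re.compile(r"\b(if|elif|else|for|while|case|catch|except|and|or)\b")

def churn(since="12 months ago"):
    """Return (changes-per-file, authors-per-file) from `git log --numstat`."""
    log = subprocess.run(
        ["git", "log", "--since", since, "--numstat", "--format=--%an"],
        capture_output=True, text=True, check=True).stdout
    changes, authors = Counter(), defaultdict(set)
    author = None
    for line in log.splitlines():
        if line.startswith("--"):
            author = line[2:]
        elif "\t" in line:                       # numstat line: added, deleted, path
            _, _, path = line.split("\t", 2)
            changes[path] += 1
            authors[path].add(author)
    return changes, authors

def complexity_proxy(path):
    """Very rough complexity stand-in: count of branching keywords in the file."""
    try:
        return len(BRANCH_KEYWORDS.findall(Path(path).read_text(errors="ignore")))
    except OSError:
        return 0

if __name__ == "__main__":
    changes, authors = churn()
    scores = {p: n * complexity_proxy(p) for p, n in changes.items()}
    for path, score in sorted(scores.items(), key=lambda kv: -kv[1])[:10]:
        print(f"{score:6d}  changes={changes[path]:3d}  "
              f"authors={len(authors[path]):2d}  {path}")
```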

Trends over Time

Once you decide what metrics to watch, you must understand what is happening over time. It is important to remember that input metrics are not always good indicators of quality or risk at a particular point in time. Instead, the results should be analysed over a period of time to determine what is happening.

A good example is code complexity. One can plot a complexity trend over time. The graph below clearly shows that this code's complexity began to grow rapidly in October. It is also growing non-linearly as the lines of new code increase. This often means that the code will become harder and harder to understand over time and could become high-risk technical debt. It should be watched carefully and identified as a good candidate for refactoring.

Figure 1 A complexity graph for a file showing the relationship between lines of code and complexity.
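
You can extract a similar trend for any file directly from version control. The sketch below is a minimal illustration: it walks a file's git history and records lines of code alongside a crude branch-keyword count standing in for a real cyclomatic complexity measure. The file path is a placeholder, and the proxy is an assumption, not how Codescene or similar tools calculate it.

```python
# Sketch: print a complexity trend for one file across its git history.
# The line count and branch-keyword count are crude stand-ins for real metrics.
import re
import subprocess

BRANCH_KEYWORDS = re.compile(r"\b(if|elif|else|for|while|case|catch|except)\b")

def complexity_trend(path):
    """Yield (date, lines-of-code, complexity proxy) for each commit touching path."""
    commits = subprocess.run(
        ["git", "log", "--follow", "--format=%H %as", "--", path],
        capture_output=True, text=True, check=True).stdout.splitlines()
    for entry in reversed(commits):              # oldest first
        sha, date = entry.split()
        show = subprocess.run(["git", "show", f"{sha}:{path}"],
                              capture_output=True, text=True)
        if show.returncode != 0:                 # file renamed or absent at this commit
            continue
        source = show.stdout
        yield date, source.count("\n"), len(BRANCH_KEYWORDS.findall(source))

if __name__ == "__main__":
    # Hypothetical path used for illustration only.
    for date, loc, cc in complexity_trend("src/payment/processor.py"):
        print(f"{date}  loc={loc:5d}  complexity~{cc:4d}")
```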

Active instead of passive code analysis

To find high-risk technical debt in your code, you can run a nightly analysis of your codebase or, even better, analyse changes as they are committed. As I mentioned earlier, a hotspot is complicated, high-risk code that your developers are actively working on. [1] Once identified, hotspots can be monitored and discussed regularly for inclusion in your prioritised backlog.

But code health can also be monitored in the CI build and test phase. The Codescene analysis in figure 2 is a good example. The file already has a McCabe complexity of 68 and a high degree of code duplication, meaning that many functions are similar and can probably be expressed using shared abstractions. If a developer commits additional code to this file (or changes it), you would want to know that it is a potentially high-risk commit (a simple CI gate for this follows the figure).

Figure 2 A graphical representation of hotspots: code complexity with code change frequency and the complexity trend. The larger the circle, the higher the complexity. Darkness indicates the level of activity.
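
One lightweight way to get that signal is a CI step that compares the files touched by a commit against a list of previously identified hotspots. The sketch below is a minimal version of that idea; the hotspots.txt file, the origin/main base branch, and the fail-the-build policy are assumptions to adapt to your own pipeline, not features of any particular tool.

```python
# Sketch of a CI gate: warn (or fail the build) when a commit touches files
# that a previous analysis flagged as hotspots.
import subprocess
import sys
from pathlib import Path

def changed_files(base="origin/main"):
    """Files modified between the base branch and the commit being built."""
    out = subprocess.run(["git", "diff", "--name-only", base, "HEAD"],
                         capture_output=True, text=True, check=True).stdout
    return set(out.splitlines())

def main():
    # hotspots.txt is an assumed artefact of your own analysis: one path per line.
    hotspots = set(Path("hotspots.txt").read_text().split())
    risky = changed_files() & hotspots
    if risky:
        print("High-risk commit: the following hotspot files were modified:")
        for path in sorted(risky):
            print(f"  - {path}")
        sys.exit(1)        # or just warn, depending on your policy
    print("No hotspot files touched.")

if __name__ == "__main__":
    main()
```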

Code Health

Another input metric offered by Codescene is code health, which measures the quality of your code and the risk implicit in the code itself. Though a subjective measure, according to Adam Tornhill, the creator of Codescene, the code health score fills several vital gaps in code analysis:

  • Bridge the gap between developers and non-technical stakeholders: A visualisation of risk gives managers information that helps them decide when to take a step back, invest in technical improvements, and measure the effects.
  • Get immediate feedback on improvements: The code health score trends give you immediate and visual feedback on your investments in refactoring.
  • Share an objective picture of your code quality: The code health scores are based on baseline data from thousands of codebases. The code is scored against an industry average of similar codebases.
  • Get suggestions on where to start refactoring: The code health scores hint at specific problems in each file, suggesting which refactoring could be used to address the findings.

Figure 3 Code health trends from an open-source project

Code analysis tools have been around for a long time and should be part of any developer's or team lead's toolbox. When using code analysis tools as the source for input metrics, try to follow two basic rules:

  1. use active measures (those that can be included in the CI build, for example)
  2. use measures that are good predictors of risk (poor code that is not being actively used or changed may show up in scans, but it may not be your biggest problem).

Toxicity

Toxicity is a measure of code that has poor internal quality and is hard to maintain or extend. In many ways, this input metric is not a single measure but a collection or index of several metrics. Nevertheless, many teams find it useful, so look it over and see if it can help your product.

Toxicity in code can be indexed based on aggregated measures of common problems.[2]

Figure 4 A table showing how toxicity might be measured: the metrics that make up a toxicity score and some example base thresholds on which the multipliers are based. [3]
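
As a rough illustration of such an index, the sketch below scores Python files on two of the common problems mentioned above: file length and nesting depth. The thresholds (500 lines, nesting depth of 3) and the equal weighting are illustrative defaults of my own, not the values from the table in figure 4.

```python
# Sketch of a simple toxicity index for Python files: each measure contributes
# to the score only once it exceeds its threshold, scaled by how far it is over.
import ast
import sys

FILE_LENGTH_THRESHOLD = 500     # lines (illustrative)
NESTING_THRESHOLD = 3           # nested if/for/while/try/with blocks (illustrative)

NESTING_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With)

def max_nesting(node, depth=0):
    """Deepest chain of nested control-flow statements below `node`."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        next_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting(child, next_depth))
    return deepest

def toxicity(path):
    source = open(path, encoding="utf-8", errors="ignore").read()
    length = source.count("\n") + 1
    nesting = max_nesting(ast.parse(source))
    score = 0.0
    if length > FILE_LENGTH_THRESHOLD:
        score += length / FILE_LENGTH_THRESHOLD
    if nesting > NESTING_THRESHOLD:
        score += nesting / NESTING_THRESHOLD
    return score

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{toxicity(path):5.2f}  {path}")
```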


Input Metrics from a Value Stream Mapping

Finally, I want to briefly describe one more set of input metrics, which comes from the Lean concept of value streams. The non-value-added (waste) time can be measured continuously to give you a point-in-time view of development progress. Each gap (wasted time or hand-offs) in the value stream map can be used as a metric to gauge efficiency.

Figure 5 A typical Value Stream Map (current state) for a set of features

The gaps in the lower segment of the VSM represent wait states and often indicate extended handover times. These times impact your overall delivery times and must be reduced or eliminated.

You can also calculate and monitor summary metrics based on a Value Stream Map. These include Total Process Time, Activity Ratio, Total Lead Time and others. If you are unfamiliar with Value Stream Mapping or want to learn how to apply it to your development processes, I recommend reading Karen Martin and Mike Osterling's Value Stream Mapping: How to Visualize Work and Align Leadership for Organizational Transformation (2014).
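
As a minimal illustration of how these summary metrics relate, here is a worked example with made-up timings for a four-step delivery flow; the step names and hours are purely illustrative, not taken from the map in figure 5.

```python
# Worked example of the VSM summary metrics mentioned above (times in hours).
# Activity Ratio = Total Process Time / Total Lead Time.

steps = [
    # (name, process_time, wait_time_before_next_step)
    ("Refine story",   4,  40),
    ("Develop",       24,  16),
    ("Test",           8,  24),
    ("Release",        2,   0),
]

total_process_time = sum(p for _, p, _ in steps)                  # 38 h
total_lead_time = total_process_time + sum(w for _, _, w in steps)  # 118 h
activity_ratio = total_process_time / total_lead_time

print(f"Total Process Time: {total_process_time} h")
print(f"Total Lead Time:    {total_lead_time} h")
print(f"Activity Ratio:     {activity_ratio:.0%}")   # 38/118 is roughly 32%
```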

Continuous Monitoring

We have become accustomed to DevOps and Operations using real-time monitoring as an input metric to assess the health of a production application or platform. We should apply the same paradigm to the development process. We can do this by integrating specific metrics directly into our CI and build systems and watching them for indications of impending doom. Indeed, all the tools I cited above can and should be integrated into the build process. If a developer or test specialist gets real-time feedback on commits or tests - for example, when a change touches an already high-risk file - they can react early enough in the development cycle to minimise disruption and cost. When used effectively, these input metrics will help you identify potential risks in your product code and also help you prioritise and pay down technical debt.

A final word on metrics. Metrics cannot be the goal. Good metrics only provide hints as to where problems may exist and help guide us to see risks. More important is to evaluate the processes and activities behind the metrics and, ultimately, to engage with the people working in your team: product owners, developers, testers, release managers, DevOps, and management. No tool or metric is going to solve process or behavioural problems. What they can do, in most cases, is illuminate the problem so you can take action.

Note: As with outcome-based metrics, input metrics (or, more accurately, input measurements) are most useful when applied to teams, services or programs. They should never be used to compare individuals. [4]

[1] https://codescene.io/docs/guides/technical/hotspots.html

[2] https://erik.doernenburg.com/2008/11/how-toxic-is-your-code/

[3] Ibid.

[4] John Seddon, an occupational psychologist, researcher and professor.

