The Technical Debt Framework: A Practical Ten-Step Approach to Escaping the Black Hole of Technical Debt
Background
This year we started an initiative to identify and measure Technical Debt. To help teams manage such debt, we needed a metric, a number, a KPI, to measure it. The initiative is driven by our Enterprise Architecture team. We collected input through a survey of our tech teams, as well as through collaboration with our peers in the industry and Gartner consulting. This article describes the resulting Framework. If you would like to implement such a framework and share your findings, please let us know.
We had previous efforts focusing on agile team setup, a product-led approach, and Flow Framework implementation, and these enabled us to accelerate delivery in several areas. However, a common observation was that this also resulted in an increase in technical debt, or at least this is what several team members indicated. In a data-driven organisation, it is hard to confirm this without a KPI to track. We implemented the Flow Framework across several domains, but its debt figure only indicated how much of a team's capacity was allocated to Technical Debt in past sprints. It did not answer the question: how much debt do we have now?
There was also a lack of consistency among teams on what constitutes "technical debt". Some teams included items such as "Make sure the testing is completed", others included bugs, and some focused only on bigger items. If we want a company-wide framework to identify and measure technical debt, we need to be consistent.
There is a negative connotation to "Technical Debt", almost a source of shame! Sometimes items were logged as "Improvements" instead of Debt. Some teams logged them as bugs, because bugs are a more acceptable result of software development or system malfunction; at least they are seen as better than Technical Debt. I'm not sure where this connotation came from, as I never witnessed any negative consequences for trying to address technical debt.
There are sometimes disputes within teams, with Product Owners and engineers disagreeing on how to categorise and prioritise tech debt. Product owners own the backlog and prioritise work in sprints, focusing on creating value for the organisation. Engineers and architects also want to maximise the value of their work by future-proofing it and removing as many workarounds and impediments to future deliveries as possible. It would be great if a framework could help these teams agree on the value of removing technical debt and illustrate the impact of such effort.
Finally, most people don't appreciate manual effort. So, if there is a way to automatically track technical debt, without requiring significant manual intervention, better yet, if there is a way to proactively predict Technical Debt then that would be a massive plus.
This Technical Debt Framework should help us answer two critical questions:
How to quantify Technical Debt, and
How to help the teams prioritise and eliminate it when needed
What is Technical Debt?
There are several definitions, and they tend to come from the IT field. Here are some of them:
From Wikipedia: Technical Debt "is the implied cost of future reworking required when choosing an easy but limited solution instead of a better approach that could take more time"
Also check Atlassian's definition (https://www.atlassian.com/agile/software-development/technical-debT):
"Technical Debt is the difference between what was promised and what was actually delivered. This includes any technical shortcuts made to meet the delivery deadline"
Here is another view from Gartner (https://www.gartner.com/en/information-technology/glossary/technical-debt):
"Technical debt is accrued work that is “owed” to an IT system, and it is a normal and unavoidable side effect of software engineering"
Ward Cunningham introduced the concept here: https://c2.com/doc/oopsla92.html. This is the earliest reference I could find.
Nowadays you can find examples of entire systems labelled as technical debt, such as here: https://www.verdict.co.uk/us-airport-glitch/
We ran a survey among our tech teams who create digital solutions for all digital channels. One of the questions was:
If you were to put the definition of Technical Debt, what would that definition be?
To develop a comprehensive framework for an enterprise, we grouped the definitions above, as well as all the responses we received from our survey, into one of these three buckets:
Is Technical Debt a Tech (or IT) only issue?
We used this funny cartoon (courtesy @vincentdnl) throughout our survey, presentations and communications. Our initiative actually started when one of our Product Owners asked the exact same question: “I don’t understand why it takes so long to add [this new feature]”
A product team owns the product (the house). The team includes engineers as well as product owners, and all of them are responsible for the product they create. Technical debt can impact the quality of the product as well as lower the productivity of teams as they spend their time working around the debt. Therefore, Technical Debt is a Product Team issue, and not only a Tech issue. It is that team’s issue.
Flow Framework
The Flow Framework, originally introduced in Dr Mik Kersten’s book “Project to Product” describes five data points that track the business value delivered by every value stream, product or initiative. These data points allow teams to track their ‘flow’ and identify areas of improvement.
The flow metrics are:
Here are the flow items that are used in the Flow Distribution*:
For our technical debt framework, we of course use the "Debt" distribution metric. It is important that teams have a good understanding of the Flow Framework and that Flow Distribution in particular is implemented.
Here is an implementation from one of our agile teams where they consistently dedicate around 5% of their capacity to eliminating "Technical Debt"
The Tech Debt Quadrant 2.0: Is all Technical Debt Equal?
To answer the question "Is all Technical Debt Equal?" we leverage and enhance Martin Fowler's TechnicalDebtQuadrant to include examples that came from internal feedback as well as peer collaboration.
The X-axis now shows a "Known" vs. "Unknown" impact of the Technical Debt, while the Y-axis shows a range from Planned (or intentional) to Unplanned (unintentional).
Q1. Planned Technical Debt
This is similar to buying a house using a bank loan, where an individual intentionally goes into financial debt and knows the impact of that debt on their finances in terms of monthly repayments. This is responsible borrowing. Technical debt in the Q1 quadrant follows the same approach: a team must take on a debt to accelerate delivery. They take a considered risk and have a realistic plan to pay it off. The plan is documented in the team's backlog and can be actioned with the resources required. If there is such a thing as "Good" or "Responsible" Technical Debt, this is probably it.
Q2. Refactors
We've seen several occasions where a team would use a new tech, or work on a new business model with vague scope, and once the POC/MVP is concluded, they would go through refactoring with newfound wisdom from their POC or MVP. That team never intended to take on any technical debt, but they understand the impact and have the skill to pay it off.
Q3. A Gamble
Sometimes teams take on aggressive scope and timelines and would take shortcuts without sufficient analysis or plans to properly fix the issue, taking a "We'll sort this later" mentality. They have done so intentionally, but the impact might not be fully understood.
Q4. Need Assistance
Technical Debt in this quadrant was probably built up over time or inherited as a result of an organisational change. This is sometimes referred to as "legacy". The team lacks the skill and/or knowledge to pay back the debt, and therefore the impact of the debt is unknown and could potentially have severe consequences depending on the function of that system or component.
There is no point beating up the team on technical debt they own in this quadrant. The team should leverage their leadership to seek the help required to de-risk or clear such debt. As a Tech leader, you should be aware of these issues and not label all tech debt as negative.
Using this updated Technical Debt Quadrant, every team can plot their Tech Debt items on the relevant quadrant and agree on the risk and likely consequences if the debt is not cleared, or, for example, on whether clearing certain tech debt items would accelerate the delivery of a required functionality.
The Technical Debt Framework
As mentioned in the introduction, the objectives of the Technical Debt Framework are to quantify Technical Debt and to help teams prioritise and eliminate it when needed.
Implementing the Tech Debt Framework in Ten Steps:
Step 1: FlowFramework is a Prerequisite
Implement FlowFramework, especially Flow Distribution. This is a prerequisite. Also:
Important: Don't use Flow Framework or Technical Debt Framework to compare teams. The comparison will be inaccurate, and will discourage teams from leveraging the data to improve.
Once you have implemented the FlowFramework you should have something similar to this graph:
Step 2: Extend Flow to look at the present/future
You might have noticed that the Flow Distribution graph looks at the past. Indeed, it shows what that team worked on in previous sprints. What we need to do is look at what is in the team's current Product Backlog. Notice I used "Product Backlog" instead of "Sprint Backlog"; this is on purpose. The challenge with Technical Debt items is that they can sit in the product backlog and never get prioritised for work in sprints. Therefore we need to extend flow to include a view of the Flow Distribution of the team's product backlog. This could look something like this:
Now you have some additional insights to derive from every team's backlog. Here are some examples:
· Tech Debt Ratio: The ratio of technical debt to user stories or to the total backlog
· In some cases we saw that a significant number of bugs can be attributed to a single tech debt item. It would make sense to tackle that rather than the individual bugs
· If the number (and total size) of user stories is very small in comparison to the Tech Debt items, then this could mean either that the Product Backlog is not well maintained, and/or that the team has taken on significant tech debt and now may be the time to consider clearing some of it
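As a rough illustration of the Tech Debt Ratio, the sketch below computes the Flow Distribution of a product backlog by effort and reads off the share taken by Debt items. The backlog records, the field names (`type`, `effort`) and the function names are hypothetical, chosen only for this example.

```python
from collections import Counter

# Hypothetical backlog: each item has a flow item type and an effort estimate.
# Field names and values are illustrative assumptions, not a real export.
backlog = [
    {"key": "PROD-1", "type": "Feature", "effort": 5},
    {"key": "PROD-2", "type": "Debt", "effort": 8},
    {"key": "PROD-3", "type": "Defect", "effort": 2},
    {"key": "PROD-4", "type": "Debt", "effort": 3},
    {"key": "PROD-5", "type": "Feature", "effort": 2},
]


def flow_distribution(items):
    """Return the share of total backlog effort per flow item type."""
    totals = Counter()
    for item in items:
        totals[item["type"]] += item["effort"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {kind: effort / grand_total for kind, effort in totals.items()}


def tech_debt_ratio(items, debt_type="Debt"):
    """Share of the total backlog effort that is technical debt."""
    return flow_distribution(items).get(debt_type, 0.0)


print(f"Tech Debt Ratio: {tech_debt_ratio(backlog):.0%}")
```

Here the ratio is taken against the total backlog; dividing Debt effort by Feature effort instead would give the debt-to-user-stories variant.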
Step 3: Technical Debt Size Estimation
Clearing Technical Debt could require anything from one person and a few minutes to several team members for multiple sprints. Therefore, an effort estimation per tech debt item is required. This could be in story points, days of effort, T-shirt sizes, etc. It is also important that the unit is consistent among teams, so all teams should use the same definition of T-shirt sizing or story points.
Some Flow Framework implementations depend on counting the number of flow items instead of using effort estimation. Estimations will be required for calculating the Tech Debt Score.
If you use T-Shirt sizing for estimating effort with Tech Debt items, here is a simple guide:
· Size S: One person can clear the debt in one sprint
· Size M: More than one person can clear the debt in one sprint
· Size L: One person would require more than one sprint to clear the tech debt
· Size XL: More than one person would require more than one sprint
· Size XXL: More than 25% of the team required to work multiple sprints to clear the debt
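The sizing guide above can be written down as a small function so every team applies it the same way. This is only a sketch: the function name and parameters are hypothetical, and it assumes that XXL is checked first and applies only when more than one person is involved, so that a single person on a small team still maps to L.

```python
def tshirt_size(people, sprints, team_size):
    """Map an effort estimate to a T-shirt size following the guide above.

    people:    number of people needed to clear the debt
    sprints:   number of sprints needed
    team_size: total team members (used only for the XXL threshold)
    """
    if people < 1 or sprints < 1 or team_size < 1:
        raise ValueError("people, sprints and team_size must be at least 1")

    multi_sprint = sprints > 1
    # Assumption: XXL takes precedence over XL when more than 25% of the
    # team is tied up for multiple sprints.
    if multi_sprint and people > 1 and people / team_size > 0.25:
        return "XXL"
    if multi_sprint:
        return "XL" if people > 1 else "L"
    return "M" if people > 1 else "S"


for estimate in [(1, 1, 8), (2, 1, 8), (1, 3, 8), (2, 2, 8), (3, 2, 8)]:
    print(estimate, tshirt_size(*estimate))
```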
We will see later, in the step on automation and predictive tech debt, how an initial effort could be automatically calculated using a default value. The important point here is that the effort cannot be zero for a tech debt item in a product backlog.
Step 4: Technical Debt Evaluation - using the Technical Debt Quadrant 2.0
We introduced the Technical Debt Quadrant above; now it is time to implement it. Evaluate each debt item and place it in the appropriate quadrant. It doesn't matter how far along the X and Y axes it sits. Simply choose based on the following criteria:
· Planned (Q1): Intentional debt, and there is a realistic, documented, estimated plan in the product backlog.
· Refactor (Q2): Unintentional debt that lacks a plan to pay it off; however, the team understands the impact and the risks and can derive a plan when given the opportunity.
· Gamble (Q3): Intentional debt with unknown impact and no plan to pay it off.
· Need Assistance (Q4), aka need help: Unintentional and unplanned, with no skill or resources to clear the debt.
To calculate the Technical Debt Score (TDS) we will assign a multiplier per quadrant as follows:
· Q1: Multiplier of 1.1. The 1.1 helps amplify the age if one of those items ages and is not cleared.
· Q2: Multiplier of 2
· Q3: Multiplier of 3
· Q4: Multiplier of 4, the biggest, as it is potentially the riskiest.
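One way to keep the quadrant and its multiplier consistent on every backlog item is to derive both from two yes/no answers. The sketch below is a simplification under stated assumptions: it uses only the two axes (intentional vs unintentional, known vs unknown impact) and does not model whether a documented plan exists. The names `Quadrant`, `MULTIPLIER` and `classify` are hypothetical.

```python
from enum import Enum


class Quadrant(Enum):
    PLANNED = "Q1"
    REFACTOR = "Q2"
    GAMBLE = "Q3"
    NEED_ASSISTANCE = "Q4"


# Multipliers as listed above.
MULTIPLIER = {
    Quadrant.PLANNED: 1.1,
    Quadrant.REFACTOR: 2,
    Quadrant.GAMBLE: 3,
    Quadrant.NEED_ASSISTANCE: 4,
}


def classify(intentional: bool, impact_known: bool) -> Quadrant:
    """Place a debt item in a quadrant from the two axes of the quadrant."""
    if intentional:
        return Quadrant.PLANNED if impact_known else Quadrant.GAMBLE
    return Quadrant.REFACTOR if impact_known else Quadrant.NEED_ASSISTANCE


quadrant = classify(intentional=False, impact_known=False)
print(quadrant.value, MULTIPLIER[quadrant])
```

A team that wants to enforce the "documented plan" criterion for Q1 could add a third flag and send intentional items without a plan to Q3.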
Step 5: Tech Debt Age
The longer the debt builds up, the more costly it becomes to rectify, and accruing Technical Debt causes existing problems to get worse over time. This is caused by several factors. For example, once tech debt is incurred in a solution, any later modifications or enhancements to this solution must account for, and work around, the debt, leading to a bigger debt. The longer the debt remains, and the more modifications are required on that system or code, the bigger the debt becomes.
At some point the debt is so big and so misunderstood that it could fall from a Q1 Planned to a Q4 Need Assistance, maybe passing through Q3 Gamble on the way. In the VUCA world of Tech, the people who had a reasonable plan to clear such debt will probably have moved on to their next adventure, or the tech itself will have moved on so that plans to clear the debt are no longer viable, causing the debt to land in Q4. Therefore, it is important to include the Tech Debt Age when quantifying the Tech Debt.
The Tech Debt Age could be measured as the number of days since the creation of the backlog issue marked as "Technical Debt".
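Measuring the age in days is straightforward once the creation date of the backlog issue is available. A minimal sketch, assuming the creation date has already been read from the backlog tool as a `date`; the function name is hypothetical.

```python
from datetime import date


def debt_age_days(created, today=None):
    """Number of days since the debt item was created in the backlog."""
    today = today or date.today()
    if created > today:
        raise ValueError("creation date is in the future")
    return (today - created).days


# Example: an item created on 1 January 2024, evaluated on 1 March 2024.
print(debt_age_days(date(2024, 1, 1), today=date(2024, 3, 1)))
```

Passing `today` explicitly keeps reports reproducible when scores are recalculated for a past date.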
Step 6: Calculate the Technical Debt Score (TDS)
Use this formula to calculate the Tech Debt Score (TDS) per Tech Debt item:
You can then aggregate the TDS values per product as Sum(TDS in a product backlog).
If your product is part of a larger organisational unit, e.g. an Area or Domain, then this could be aggregated again, and so on, until you have a TDS for the entire organisation.
You can also look to aggregate it per quadrant, which is probably the smart thing to do, as this would allow you to tackle the riskiest debt better, as well as facilitate collaboration to eliminate Q4 debt for example.
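The aggregation described above can be sketched as a sum grouped by product or by quadrant. The per-item score used here is an assumption made purely for illustration: effort multiplied by the quadrant multiplier and by the age in days. It is not a confirmed formula, so the scoring function is passed in as a parameter and can be swapped for your own. All item fields and names are hypothetical.

```python
from collections import defaultdict

QUADRANT_MULTIPLIER = {"Q1": 1.1, "Q2": 2, "Q3": 3, "Q4": 4}


def assumed_tds(item):
    """ASSUMPTION: score = effort * quadrant multiplier * age in days."""
    return item["effort"] * QUADRANT_MULTIPLIER[item["quadrant"]] * item["age_days"]


def aggregate(items, key, score=assumed_tds):
    """Sum the per-item scores grouped by the given field."""
    totals = defaultdict(float)
    for item in items:
        totals[item[key]] += score(item)
    return dict(totals)


# Hypothetical debt items from two product backlogs.
items = [
    {"product": "checkout", "quadrant": "Q1", "effort": 2, "age_days": 10},
    {"product": "checkout", "quadrant": "Q4", "effort": 5, "age_days": 100},
    {"product": "search", "quadrant": "Q2", "effort": 3, "age_days": 20},
]

per_product = aggregate(items, "product")
per_quadrant = aggregate(items, "quadrant")
organisation_total = sum(per_product.values())

print(per_product)
print(per_quadrant)
print(organisation_total)
```

Grouping by quadrant makes a large Q4 total visible even when it is spread across many products.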
If you are a tech leader, you can use TDS and the Technical Debt Quadrant to help and guide your teams to be sustainable and successful. The framework will help you quantify and classify the risk your team(s) have.
Step 7: Incorporate Tech Debt In your Team Ceremonies
Once Tech Debt Quadrant is used and the Technical Debt Score has been identified per item you should end up with a view, per product, that looks something like this:
The number of bubbles on this graph represents the number of Debt items in the product backlog. The quadrant placement comes from Step 4 and could be an attribute on the backlog item. The size of each bubble is its Tech Debt Score (TDS), so the bigger the TDS, the more the issue stands out. That's the idea at least!
Utilise this view in your team ceremonies: From planning sessions, review sessions, capacity planning etc.
Step 8: Estimating business risk - Agreeing what to action.
Gartner PAID offers a good approach for agreeing what to action vs what to ignore. Using PAID (Plan, Address, Ignore and Delay), a team would discuss the risk vs the impact of each scored debt item from the Technical Debt Quadrant, and agree to address the ones with the highest risk of occurring and the highest impact.
Make sure this is also part of your team's ceremonies, as it provides transparency to the entire team and avoids the frustration that tech team members feel when their most annoying technical debt is not cleared.
Also try to measure the impact in terms of:
· Increased service interruption risk
· Reduced service quality
· Increased cost
· Increased time-to-market
Step 9: Automation - Predictive Metrics & backlog injection
Manually creating Tech Debt issues in the product (or project) backlog is probably the easiest way to start using the Technical Debt Framework. The quality of these issues will greatly depend on the diligence of the team. If you would like to save time, add consistency and end up with higher data quality, then automation will be key. Here are a few sources that can predict technical debt items to be added to the backlog:
· Code analysis tools, e.g. SonarQube
· Secure code analysis tools, e.g. Checkmarx
· Your Enterprise Application Management (EAM): Identifying end-of-life components and systems
· ServiceNow (or Jira) for managing operations
· Container scanning tools
Linking such tools to the team's product backlog will automatically create a Technical Debt issue. For example, if a component is labelled as End of Life in LeanIX, then a tech debt item is automatically created in the Team(s) backlog to update their systems accordingly.
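The end-of-life example can be sketched as a small function that turns component records into debt backlog items. Everything here is hypothetical: the record fields, the function name and the default effort are assumptions for illustration, and no real LeanIX, Jira or ServiceNow API is called. The default effort is deliberately non-zero, in line with the rule that a tech debt item in the backlog cannot have zero effort.

```python
from datetime import date


def debt_items_for_end_of_life(components, existing_keys, today, default_effort=1):
    """Return new debt backlog items for components that are past end of life.

    components:    dicts with 'name', 'team' and 'end_of_life' (date or None)
    existing_keys: set of (team, component name) pairs that already have an
                   item, so repeated runs do not create duplicates
    """
    seen = set(existing_keys)
    new_items = []
    for comp in components:
        eol = comp.get("end_of_life")
        if eol is None or eol > today:
            continue  # still supported, nothing to inject
        key = (comp["team"], comp["name"])
        if key in seen:
            continue  # an item already exists for this team and component
        seen.add(key)
        new_items.append({
            "team": comp["team"],
            "title": f"Replace end-of-life component: {comp['name']}",
            "type": "Debt",
            "effort": default_effort,  # initial default, to be refined by the team
            "created": today,
        })
    return new_items


# Hypothetical component records, as they might be exported from an EAM tool.
components = [
    {"name": "legacy-queue", "team": "checkout", "end_of_life": date(2023, 6, 30)},
    {"name": "auth-lib", "team": "checkout", "end_of_life": date(2030, 1, 1)},
    {"name": "report-db", "team": "search", "end_of_life": None},
]

created = debt_items_for_end_of_life(components, set(), today=date(2024, 1, 15))
print([item["title"] for item in created])
```

The returned items would then be pushed to the backlog tool by whatever integration the team already uses.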
Step 10: Test, Share, and Learn
Finally, try this framework with your team(s), and share your learnings. It is ok to talk about Technical Debt. It is part of Tech! The important thing is to learn how to spot and manage it.
Which quadrant(s) does your Technical Debt tend to fall in? What trends do you see in the data?
What insights/action can you derive out of that?
If you have read this far, well done! Please let us know if this is something you would try with your team(s), and share your learnings.