Artfully Balancing Technical Debt
Not all debt is bad. Sometimes it's exactly what you want.

Chief executives care about satisfying a delicate balance of interests between various stakeholder groups, including customers, investors, and employees. Focusing on any one of these groups to the detriment of another can have serious negative consequences. Engineering leaders seek to support that balance using the concept of technical debt.

Technical debt is a collection of limitations in your code or system design that carry risk. The debt and related risks may be known or unknown, and may be entered into either deliberately or inadvertently. Debt is not necessarily bad; used well, it’s a useful strategic tool. In fact, it’s used as a financial instrument by an overwhelming majority of successful organizations. Financial debt is used by 98% of Fortune 500 companies to allow investments that fuel growth faster than would be possible if only cash were used to fund such pursuits. Only 10 top companies have chosen to operate debt free, and those companies are not at the top of the Fortune 500 list, even though they may generate lots of cash. This makes sense considering that interest rates are relatively low. The sensibility of carrying technical debt likewise depends on its interest rate.

Natural Tradeoffs

Before we explore the concept of technical debt, let’s take a moment to recognize a few important tradeoffs that affect the speed of software development and its quality.

You may recognize the “iron triangle, pick two things” metaphor, sometimes drawn as a three-way Venn diagram. Although the metaphor is imperfect, it helps us to recognize that as we manage quality, we need to find a reasonable balance between cost, time, and scope.

It takes time to develop high quality software with full test coverage and thoughtful design. You can control scope and complexity in order to reduce time spent in development. Through practicing modern style software development, the idea emerged that focusing primarily on time reduction affords you the ability to deliver business value sooner, which leads to profits that you can later apply to resource your pursuits. Following this theme, time matters the most, provided we plan to quickly and routinely revisit the levers that drive up quality over time.

Suppose we continue to believe that speed (reduced time) is the most important aspect of our approach, particularly in early phases of building a product. The concept of technical debt emerges as a way to deliberately choose a quick solution to a problem in order to verify business value sooner, with the intent to revisit it later with a better solution that may take longer to build, but is informed by the experience of trying something simple first. It’s like taking out a loan that we intend to pay off later when we have a better understanding of how best to solve a problem based on our observations, and when we’ve proven the business value of the solution. In other words, it’s a way of reducing scope for a limited time, so that we can revisit that scope when we are better equipped with experience and resources.

This naive view of technical debt is compelling, and the assumption of technical debt is justified by our need for speed. It turns out that there are actually more types of technical debt than this clearly reasonable one, and failing to manage your entire technical debt portfolio thoughtfully could lead to tragic results.

You can perceive software like a depreciating financial asset, meaning that it has a finite useful life, and requires ongoing investment in the form of refactoring to maintain its value at a consistent level. Others view software like a financial liability, in that all software you build comes with a maintenance burden that must be paid over time, particularly as business needs continually evolve, and the related software grows more complex. I appreciate both of these viewpoints, and like to think of this “decaying software” idea as inadvertent technical debt, described below.

Deliberate Technical Debt

Brand new systems are perceived to have low levels of technical debt, because they are designed and built from well understood requirements to suit a particular purpose. When you make a conscious design decision not to optimize something for an initial release in order to complete that release sooner, you intentionally assume a technical debt. Why design for massive concurrency if we don’t know for sure whether that will matter? If we see weak adoption rates for our software, perhaps concurrency levels will remain low in our service and there will be no need for a robust high concurrency design. Conscious decisions like this usually make sense, particularly at the time the decision is made. Accepting reduced efficiency, ease of use, or user experience in exchange for reduced time-to-market is a typical tradeoff. Engineering teams should be given the freedom to make these tradeoffs in order to strike an acceptable balance between agility and quality. From a financial perspective, think of deliberate technical debt as a low interest loan.

Inadvertent Technical Debt

Over time, the conditions that justified a deliberate technical debt decision may change, and when they do, that debt accumulates risk. In other words, interest is added to the debt. The process of developing software to solve problems also gives us a better informed perspective about potentially better ways to solve them. This is an example of an inadvertent discovery: we learn that, if refactored, our software could be more efficient, more maintainable, or better along another dimension of quality.

As time passes, requirements shift. Your software systems may evolve over time as you update them, and design limitations that were not important early in a system’s life become critically important at a more demanding level of scale or activity, especially when complexity of the software materially increases. 

Software defects that have been discovered and worked around, or remain undiscovered, are also technical debt. Where you have complexity, you probably have defects in equal proportions. As complexity increases, you incur additional interest on your technical debts. I refer to the collection of these types of technical debt as Inadvertent Technical Debt. Think of Inadvertent Technical Debt like high interest revolving debt from a financial perspective.

Technical Debt = Deliberate Technical Debt + Inadvertent Technical Debt

The more your deliberate tech debt ages, the more it gradually converts to inadvertent tech debt. Over time, the associated risk of your inadvertent tech debt grows. Think of this increase in terms of interest charged on a loan balance. I refer to this below as your technical debt “interest rate”. The rate of increase will track the growth in your scale, activity, and complexity over time. If you let your inadvertent tech debt grow too much, it may lead to unhappy developers, and the problem is then exacerbated by employee attrition. If your engineers leave, valuable context for how to effectively pay down your aging debt items goes with them.

Technical Debt Bankruptcy

As with financial debt, there is a point at which technical debt accumulates to the extent that an engineering team is completely consumed with reactive bug fixing, system repair, and refactoring activities. At this point the team has no time available for feature development, and won’t have that flexibility for the foreseeable future. Whether a tech debt bankruptcy is declared overtly or not, if the condition persists, and the organization needs new features, the solution is typically to replace the decayed system. In many cases a different development team is tasked with replacing the system while the existing “maintenance” team is tasked with keeping the failing system alive. This is extremely demoralizing for the maintenance team. It’s not unusual for members of such teams to fear for their job security once the system is replaced. It’s reasonable to expect them to consider new employment options in response to that fear.

In a previous article, I outlined an approach that can be used to modernize an important legacy system without replacing it wholesale. Using that approach will allow you to apply the tech debt management advice offered here so that tech debt bankruptcy can be avoided. It may be better to grow the team temporarily, use the Strangler pattern to gradually replace the legacy system, and allow the original team to own the improved result.

Effectively Managing Technical Debt

When you regularly fix bugs from your Inadvertent Technical Debt pool, and solve problems by adjusting the designs in your Deliberate Technical Debt category, you are “paying down tech debt”. This is where things get interesting. How much technical debt should you carry? Is paying down too much debt a bad thing?

Software engineers care about solving interesting technical problems, and about not having systems break unexpectedly. Operations teams hate technical debt because it causes unpleasant and often unpredictable work for them in ways they may have little or no control over. Waking up in the middle of the night to deal with a failure from a tech debt item is very frustrating, no matter what your role is. If it happens too much, employees will seek employment elsewhere. On the other hand, if we ask engineers to stop working on features and focus primarily on paying down tech debt, then our business agility slows, and we might struggle to satisfy customers’ expectations, which in turn may disappoint investors.

Financial debt may be controlled through both internal and external constraints. For example, a borrower may only borrow what a lender is willing to loan given what they know about the borrower and the associated risk. A formula for quantifying that risk may consider the credit rating of the borrower, debt to income ratio, assets, liabilities, bankruptcy history, and other relevant risk factors. Internally, a borrower may have a more conservative perspective on borrowing than their lenders may, so they may decide to borrow less than they are offered. By comparison, technical debt is not subject to an external control by a lender. You’re effectively taking out loans against your own assets using a currency of business agility.

Engineers usually have a sense about whether their tech debt balance is too high. Because there is no external lender involved, it’s easy for tech debt to accumulate beyond your carrying capacity, and may lead to tech debt bankruptcy if left unchecked. An imbalance may become apparent in a number of ways, including:

  • Reduced system reliability metrics
  • Reduced feature throughput (decaying burndown)
  • Increasing technical staff attrition rate
  • Decreased customer satisfaction scores
  • Decreased employee satisfaction scores
  • Increased unscheduled bug report rate

Significant changes in one or more of the above metrics may quickly evolve into a crisis. If the crisis were a financial one, you would consider a range of remedies, from restructuring the debt to a debt consolidation scheme or a debt reduction scheme. You might consider bankruptcy. Are you willing to sell off your software assets? Are there external buyers who may perceive a net value higher than you do? Would it make business sense to dispose of the asset? If not, you’ll need to use a policy to manage the debt.

A tech debt budget is a policy you invoke to reduce tech debt at the expense of new feature agility. For organizations employing Google’s SRE principles, the budget control is expressed as a feature freeze. The freeze remains in place until a reliability metric returns to normal again. At Google, we use a concept known as an error budget. This means we have a prescribed amount of systemic failure that we are willing to accept for a given system, and if that budget is exceeded, we enter a feature freeze. During a freeze, we focus our attention exclusively on improving reliability. When our reliability metrics return to healthy levels again, we end our freeze, and continue working on feature development again.
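
The freeze rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google’s actual tooling: the `ErrorBudget` class, its field names, and the 99.9% target are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical error budget for one service over one measurement window."""

    slo_target: float     # assumed target, e.g. 0.999 = 99.9% of requests succeed
    total_requests: int   # requests served during the window
    failed_requests: int  # requests that failed during the window

    def allowed_failures(self) -> float:
        # The budget is the share of requests the target lets us fail.
        return (1.0 - self.slo_target) * self.total_requests

    def feature_freeze(self) -> bool:
        # Freeze feature work once observed failures exceed the budget.
        return self.failed_requests > self.allowed_failures()


healthy = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
overspent = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_500)
print(healthy.feature_freeze(), overspent.feature_freeze())
```

With a 99.9% target and one million requests, the budget is roughly 1,000 failures, so the first service keeps shipping features and the second enters a freeze until its metrics recover.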

Managing tech debt using SRE principles has proven to be an effective approach, but it admittedly requires an alignment among the organization’s top leadership so there is a shared willingness to enter a feature freeze. That’s noteworthy because it may mean you’ll miss a development delivery deadline and customer expectations may need to be reset. This may carry a risk of losing customers with critical dependencies on new features and capabilities. They may choose to source solutions from competitors instead. Confidence in the principle depends on the belief that customer satisfaction with the core product is more important than any new feature or capability you might offer.

So what if your organization can’t arrive at the level of belief that reliability is more important than any other feature? How can you manage technical debt in that case? What’s the optimal balance of effort? Here is a compatible tech debt management approach to consider that you can combine with a future SRE discipline should you decide to adopt one someday:

Catalog Your Technical Debt

Nobody in the business of loaning money does so without rigorous record keeping. Managing your technical debt should be no different. Usually software engineers and their leaders perceive different levels of technical debt because of differences in perspective. Generally speaking, engineers sense much more debt than leaders do. They may not express all of the tech debt they perceive, possibly from a sense of guilt for creating that debt, or a reluctance to name it. Perhaps calling it out might insult one’s technical superiors. The best way to address this mismatch is to quantify the debt with record keeping.

Each software development team should own their own technical debt catalog. I suggest using your bug tracker system for this. If the tech debt item is not already detailed as a bug, add it. Each time you generate new deliberate technical debt, file new bugs to detail the intentional tradeoff, with an outline of how to eliminate it. Initially, estimate the level-of-effort for addressing each cataloged debt item, perhaps in terms of a T-shirt size (XS, S, M, L, XL). 

Once all of your known technical debt is recorded in your various team backlogs, prioritize them using the same processes you use to prioritize feature development. Be sure to get input from your product managers, and even your customers to the extent practical, to help you sort them by priority level.

Consider tracking a few characteristics of each tech debt item, in addition to level-of-effort. Include a numeric priority score (low to high), and a numeric impact or risk score (low to high). Then prioritize these like you prioritize new items on your feature roadmap. I plot them on a scatter chart such that level-of-effort is expressed on the X axis and impact on the Y axis. Items in the top left of that chart (high impact, low effort) should probably be addressed first, so assign those the highest priority levels.
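
One way to turn the effort and impact scores into an ordering is sketched below. It is only an illustration: the `DebtItem` fields, the mapping from T-shirt size to effort points, and the impact-to-effort ratio are assumptions, and the sample backlog is invented. Any ranking that favors high impact and low effort would serve the same purpose.

```python
from dataclasses import dataclass

# Assumed mapping from T-shirt size to relative effort points.
EFFORT_POINTS = {"XS": 1, "S": 2, "M": 3, "L": 5, "XL": 8}


@dataclass
class DebtItem:
    title: str
    effort: str  # T-shirt size: XS, S, M, L, XL
    impact: int  # impact or risk score, 1 (low) to 5 (high)

    def payoff(self) -> float:
        # Higher impact and lower effort both raise the score.
        return self.impact / EFFORT_POINTS[self.effort]


def prioritize(items):
    """Return items ordered from most to least attractive to pay down."""
    return sorted(items, key=lambda item: item.payoff(), reverse=True)


# Invented sample backlog, for illustration only.
backlog = [
    DebtItem("Single-threaded import job", effort="L", impact=4),
    DebtItem("No integration tests for billing", effort="S", impact=5),
    DebtItem("Hard-coded retry count", effort="XS", impact=2),
]
for item in prioritize(backlog):
    print(f"{item.payoff():.2f}  {item.title}")
```

The ratio is a starting point for discussion with product managers rather than a replacement for it; a team may reasonably override the computed order.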

I’m surprised by how many development teams don’t actually rationalize tech debt repayment against feature development by employing the input of stakeholders. Product managers may not have adequate visibility into the debt backlog (if one exists), and may set unrealistic expectations for feature development because they lack this perspective. It will help to include your tech debt planning along with your feature development planning work.

Control Technical Debt with TDD

Test Driven Development (TDD) discipline includes a built-in method for paying down technical debt. The “Red, Green, Refactor” approach guides us to first develop a test for a given software capability, which causes that test to go “red” because that new capability does not exist yet. Next, we implement code to introduce the capability, which advances us to “green” by passing the new test. Next is a refactoring phase that improves upon the initial implementation. A great addition to your debt management discipline is to add the refactoring task to your engineering task backlog as you complete the “green” phase. This approach adds a fixed debt payment for every new capability.
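
As a small illustration of the three phases, here is a sketch in Python. The `normalize_tags` function and its behavior are hypothetical and were picked only because they are easy to test; plain assertions stand in for a test framework.

```python
def check_normalize_tags(impl):
    """Red: written first, this check fails until an implementation exists."""
    assert impl([" Bug", "bug", "Tech-Debt ", ""]) == ["bug", "tech-debt"]
    assert impl([]) == []


def normalize_tags_green(tags):
    """Green: the first implementation, written only to make the check pass."""
    result = []
    for tag in tags:
        cleaned = tag.strip().lower()
        if cleaned != "" and cleaned not in result:
            result.append(cleaned)
    result.sort()
    return result


def normalize_tags(tags):
    """Refactor: the same behavior, expressed more directly."""
    return sorted({tag.strip().lower() for tag in tags if tag.strip()})


# Both versions satisfy the same check, so the refactor is behavior-preserving.
check_normalize_tags(normalize_tags_green)
check_normalize_tags(normalize_tags)
```

Filing the refactoring task when the green version lands means the second function becomes a tracked backlog item instead of something that depends on someone remembering it.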

The conceptual “interest rate” on that new tech debt will vary based on how much the new code is actually used. If it turns out to be idle code, you may consider it for removal at the time you revisit the refactoring task rather than proceeding with refactoring. Through your priority assignment scheme, your refactoring work will focus on the parts of your code that get the most use.

Pick a Debt Payment Percentage

Ask each development team to agree to a reasonable percentage of each development cycle that will be allocated exclusively to paying down technical debt. Consider a policy that prohibits this value from being under 10% for more than two planning cycles. For example, if you decide to temporarily suspend tech debt reduction for two 2-week sprints, you agree to resume an increased debt reduction rate in subsequent sprints to catch back up.
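
A policy like this is simple enough to check automatically. The sketch below assumes that “more than two planning cycles” means more than two consecutive cycles; the function name and its default values are hypothetical.

```python
def breaks_payment_floor(percentages, floor=10, max_cycles_below=2):
    """Return True if the debt payment percentage stays under `floor`
    for more than `max_cycles_below` consecutive planning cycles."""
    run = 0  # length of the current streak of cycles below the floor
    for pct in percentages:
        run = run + 1 if pct < floor else 0
        if run > max_cycles_below:
            return True
    return False


# Two suspended sprints followed by catch-up sprints stay within the policy.
print(breaks_payment_floor([20, 0, 0, 30, 30]))
# A third consecutive sprint under the floor breaks it.
print(breaks_payment_floor([20, 0, 0, 5, 30]))
```

The first call prints `False` and the second prints `True`.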

If you employ an SRE discipline, and you invoke a feature freeze, you’re effectively just temporarily setting the debt payment percentage to 100% with a conditional end time determined by your reliability level returning to the limit your error budget allows.

Consider a 20% starting value for your debt percentage target. Many development teams have 5 or more members, and 20% of a five-person team’s capacity is one person. This means that you can have at least one of your team members focused primarily on the debt reduction effort, while the rest of the team focuses on feature advancement. Be sure to rotate the responsibility fairly so all members of the team share the debt reduction burden.

If your debt items are complex in nature, and require extensive collaboration to address, you may decide to address those by allocating complete development cycles to debt reduction. For example, for every two regular cycles, maybe you have a third cycle that focuses 100% on debt reduction so your whole team can work together on your complex tasks.

Be careful not to set this percentage too low. Remember, as I mentioned earlier, technical debt compounds as it ages, like interest. Your debt payment percentage must equal or exceed your debt “interest rate” in order to prevent a downward spiral. Furthermore, the more your debt-tagged code is used, the more rapidly its debt “interest rate” grows.
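
The spiral can be illustrated with a toy projection. Everything here is an assumption made for the sake of the example: debt is measured in effort points, it grows by a fixed fraction each cycle, and the team pays down `capacity * payment_pct` points per cycle. Real debt does not grow this neatly, so treat the output as an intuition aid, not a forecast.

```python
def project_debt(balance, interest_rate, capacity, payment_pct, cycles):
    """Project a debt backlog (in effort points) over several cycles.

    Each cycle the backlog grows by `interest_rate`, then shrinks by the
    share of team capacity spent on debt. All inputs are hypothetical.
    """
    history = [balance]
    for _ in range(cycles):
        balance = balance * (1 + interest_rate)               # debt ages and grows
        balance = max(0.0, balance - capacity * payment_pct)  # scheduled payment
        history.append(balance)
    return history


# 100 points of debt growing 5% per cycle, on a team with 40 points of capacity.
paying_enough = project_debt(100, 0.05, capacity=40, payment_pct=0.20, cycles=6)
paying_too_little = project_debt(100, 0.05, capacity=40, payment_pct=0.10, cycles=6)
print(round(paying_enough[-1], 1), round(paying_too_little[-1], 1))
```

At 20% the payment (8 points) outpaces the growth (about 5 points) and the backlog shrinks; at 10% the payment (4 points) falls short and the backlog climbs every cycle.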

Each individual team should appreciate having both a license and a mandate to reduce long term debt over time. This allows your teams to strike a reasonable balance between long term quality and feature throughput that they can feel proud of.

Suppose you decide to employ both my advice and an SRE discipline together. A carefully managed debt payment percentage will lead to a reduction of feature freeze events. They will become less frequent, and less severe as you burn down your prioritized tech debt backlog. This is because you proactively manage your technical debt. This could make it really easy to justify using SRE principles widely across more or all of your organization.

Continuously Name Unknown Technical Debt

As your unknown debt is discovered, catalog it in your tech debt backlog as quickly as possible. Strive to set a cultural norm that rewards accurately naming and recording previously unknown debt items. The reason for this is so you can control your technical debt’s “interest rate” as something you’re aware of, and can respond to proactively, rather than leaving it as an unknown. For example, suppose you’ve found that something initially designed as a single thread in a single process is no longer able to keep pace with your system’s growing activity level, and you know you’ll need some method to accelerate it, or add a concurrency solution. This is previously unknown debt that has now become known, and needs to be cataloged, and considered for refactoring or redesign. After all, nobody enjoys being routinely surprised by unknowns as they become known through service quality degradations or outages.

Naming technical debt may involve routinely reviewing and profiling your code bases for weaknesses you have learned from experience are problematic. For example, you might have sections of your code that are not covered by unit tests, or capabilities that are not covered by integration tests. The absence of effective automated tests should certainly be considered a technical debt backlog item.

Evaluate Your Progress

We’ll want to observe a set of metrics that objectively validate our debt management approach. Our set may include:

  • System reliability metrics (uptime)
  • Feature throughput (burndown)
  • Technical staff attrition rate
  • Customer satisfaction scores
  • Employee satisfaction scores
  • Unscheduled bug report rate

If our downward trending figures reach an inflection point and begin to improve, share that with our teams as evidence that our debt management efforts in concert with our new feature development are working.

Keeping a keen eye on our unscheduled bug report rate will give us an indication of whether our technical debt “interest rate” is under control, or if we’re slipping. If we notice an inflection in the growth of our unscheduled bug report rates, we may be approaching a tech debt bankruptcy, and it may be time to revisit our debt payment percentages with the affected engineering teams. It’s normal to expect unscheduled bug rates to surge upon major feature releases. New code will bring new defects. We’ll be looking for bugs showing up in code that we have not recently released as the signal that something may be wrong from a tech debt perspective, and maybe refactoring efforts should be planned in response.
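
Separating expected post-release churn from the signal described above can be sketched as a simple filter. The data shapes, the component names, and the 30-day “settling” window are assumptions for illustration; your bug tracker and release records will look different.

```python
from datetime import date, timedelta


def bugs_in_settled_code(bugs, last_release, settle_days=30):
    """Keep only bugs reported against components whose latest release was
    more than `settle_days` before the report.

    bugs: list of (component, reported_on) tuples
    last_release: dict mapping component name to its latest release date
    """
    settle = timedelta(days=settle_days)
    return [
        (component, reported_on)
        for component, reported_on in bugs
        if reported_on - last_release[component] > settle
    ]


# Invented sample data, for illustration only.
last_release = {"checkout": date(2024, 5, 20), "search": date(2023, 11, 1)}
bugs = [
    ("checkout", date(2024, 6, 1)),  # 12 days after a release: expected churn
    ("search", date(2024, 6, 3)),    # months after the last release: a debt signal
]
print(bugs_in_settled_code(bugs, last_release))
```

Tracking the count of these filtered bugs per cycle gives a trend line that is not distorted by the surge that follows each major release.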

We’ll also want to check in with our development teams and their leaders to subjectively judge the morale levels on our teams, and how our tech debt management approach is affecting them. Don’t be surprised to hear a bunch of very relieved engineers who are finally given license and mandate to fix something that’s been worrying them for far too long.

Over time, some teams may eliminate their tech debt backlog completely. Wait, don’t celebrate quite so quickly. This is actually a source of concern. It’s healthy to maintain a manageable level of technical debt, but if you completely pay all of it off, then you’re probably not being ambitious enough with the pace of delivery of new features. You want to pay off the high interest debt, and disregard debt that has little or no impact. If the technical debt levels are approaching zero, take a moment to judge whether the remaining balances are deliberate or inadvertent tech debt, as described above. Ideally, try to minimize inadvertent debt levels, and carry a sensible level of deliberate tech debt.

As with financial debt, short-term, low-interest debt can be a powerful growth accelerant. However, high-interest, long-term revolving debt should be minimized. If you apply the same fundamental philosophy to deliberate and inadvertent technical debt respectively, and keep your inadvertent debt under tight control, expect great results.

Ronald Bradford

Principal Data Architect and Data Strategist - Driving change with actionable insights for customers with data-driven decisions | Author | Speaker | AWS Certified

6 months ago

Excellent detail on quantifying and using different mechanisms to be intentional in addressing technical debt. This can directly affect the development, infrastructure management, and maintenance velocity. Another direct factor is a derivative of Uncle Bob's 5-year expression. A lot of technical debt remains unresolved and uncataloged because resources lack sufficient knowledge of the system being maintained regardless of their respective years of experience. Combined with an ever-increasing array of abstraction products, a simple minor dependency update can cause a significant outage, and addressing a technical debt situation under pressure is an added complexity.

Dennis DeMeyere

Spatial AI CPO/CTO. Zero to 20M MAU. Ex Google, Disney, Microsoft.

7 months ago

This is spot on, and from our vantage point- reflects reality. Where Adrian, who is always brilliant, is especially insightful is around the use of technical debt as a strategic tool. As with personal debt, not all debt is bad. Egregious and predatory/destructive debt is bad. Debt that costs less than the growth traded for that debt - can be quite good, especially if that growth has a compounding or longtail impact. If you can begin compounding growth faster, and the technical debt can be paid off shortly after compounding begins - that's good technical debt.
