Tech Mist - Understanding technical debt (for the non-technical)
Introduction
Non-technical project managers, stakeholders or team members, this one's for you. Developers who have struggled with the battle of prioritising tech debt, hopefully you can find something helpful in here too. I'm going to attempt to reframe the problem of mounting technical debt (tech debt) as a failure of communication rather than a burdensome symptom of software development. Of course, the tech debt itself is firmly within the scope of software development, but with tech debt the dose makes the poison; a small amount is not a problem. The solution needs to be where the problem is: its unbridled accumulation. Why is this a communication problem? Because those who understand the nature of tech debt are often not in control of budget or time allocation, and those in charge of budget and time allocation often do not understand tech debt (and thus are unable to justify allocating resources to it).
The root of the problem
Technical debt is the accumulation of messy code that builds up over time in a software project until an entire section (or sections) needs to be either rewritten or intensively untangled. This results in a large expenditure of effort for little or no business value. While tech debt can be challenging for software developers to untangle, it also poses a communication challenge: communicating the nature of (and importance of addressing) tech debt to non-technical people. Such non-technical people are often stakeholders or managers who control budget or timelines. Convincing them to allocate time or resources to tasks that yield no immediate business value (except a vague promise of "things will move faster") can be incredibly difficult. This – combined with the fact that neglected tech debt often compounds the complexity of the problem (and therefore increases the time and budget allocations required to address it) – is what informs my view that the source of most tech debt is poor communication rather than poor software development or poor management. Management staff blaming technical staff for accumulating tech debt in the first place makes as little sense as technical staff blaming management for not allocating the time needed to control it.
Types of tech debt
Why would competent developers write code so bad that it needs to be rewritten from scratch?
If only we knew then what we know now (inexperience)
Not to be mistaken for the utilization of inexperienced developers!
Caused by: developers working in the periphery of their sphere of knowledge; a space where mistakes are likely because better options are not currently known.
Most developers are working on the very edge of the scope of their knowledge. This is not because of an inherent love of learning or internal drive for growth (although most developers do enjoy that aspect of it) but rather because there is usually little reason to write code or create functionality that already exists. Apart from basic or fundamental functions, code should be written to extend the world's stock of software capability, not duplicate existing functionality. For this reason, developers should usually be working on unfamiliar problems using familiar tools. As common experience would lead one to expect, this means that the very act of writing code often helps developers identify a better way to solve the problem than their initial, workmanlike approach. Whether to implement this new-found better way is often contentious since a working solution is immediately available. Making the case to spend more time on a task without adding immediate business value is the core of the challenge in managing tech debt.
How to mitigate:
Writing "software spikes" is a common way of allowing developers to test out the feasibility of a solution and to gather some of the learnings that come with writing a first draft. A spike is basically a time-boxed, quick-and-dirty prototype that only needs to work as a proof of concept or expose a fatal flaw in the planning or design of an idea. Usually, as a matter of policy, spikes are not allowed to be merged into the main codebase, which removes the temptation to spend time on code quality rather than pure functionality. If the spike is successful, a better crafted second draft can be implemented. If the spike fails, the idea is reexamined from the ground up.
Experienced developers generally have a good sense of the quality of their work. Of course developers want to create pristine, high quality code at all times, but compromises are required where time or budgetary pressure is a factor (which is pretty much always). Allowing developers to grade their own work and setting a threshold for additional clean-up time afforded to the developer will not only help cultivate a beneficial culture of candor within the team, it will also prevent the wastage of lessons learned from first drafts.
It just needed to get over the line (expediency)
Caused by: a rush to market, a sudden pivot, or any other reason that might cause pressures from low resources and short timelines to compound one another.
This is the most common type of tech debt in projects that have been rushed to an MVP (minimum viable product). If it's raining and you need shelter, sitting under a tarp will do until the rain subsides. If you really need to get some sleep and the rain hasn't stopped, you might need a tent. If you then become hungry, you're going to go find food. When you get cold you might gather firewood and build a fire. If you're lucky you might find ways to improve your methods when your needs are fulfilled – but you're not likely to be able to focus on building a sturdy long-term dwelling when there is an imminent risk of going hungry or catching a cold. The very nature of juggling competing interests under time pressure often imposes limits on the extent to which your methods for fulfilling those needs can be improved. For this reason, it is not uncommon for very old, expedient solutions from an early chapter in a software project's development to lurk in the background years later. Given that they're old, it's highly likely that they're also foundational. If, when you finally get a bit of time to improve your shelter, you start piling bricks up around your tent and then use the tent poles to support the tin roof, you've entrenched an old, fragile, expedient solution into your improved one. No doubt the reason you did this was, as before, unrelenting time pressure. The more layers there are built on top of creaking, decrepit (yet expedient) solutions, the harder they are to extricate and improve on. You'll never be able to add tiles to that roof unless you're prepared to spend a few days risking rain after you remove the old system.
How to mitigate:
More time or more resources. If these are unavailable, you can't mitigate tech debt; all you can do is try to control the damage. This isn't necessarily a bad thing. Young businesses often run on monetary debt for the same reason that young systems are running on tech debt. In both cases, however, you need to pay it back with interest.
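To make "with interest" concrete, here is a toy model in Python. It is a sketch only: the function name `cleanup_cost`, the 5-day starting cost and the 8% growth per sprint are invented for illustration, not measurements. The point is the shape of the curve, not the numbers.

```python
def cleanup_cost(initial_days: float, growth_per_sprint: float, sprints: int) -> float:
    """Days needed to clean up a shortcut that has been left alone for `sprints` sprints.

    Assumption (hypothetical): the cost grows by a fixed fraction every sprint,
    because new code keeps being built on top of the shortcut.
    """
    return initial_days * (1 + growth_per_sprint) ** sprints


if __name__ == "__main__":
    # Invented figures: a 5-day clean-up whose cost grows 8% per sprint.
    for sprints in (0, 6, 12, 24):
        days = cleanup_cost(initial_days=5, growth_per_sprint=0.08, sprints=sprints)
        print(f"left for {sprints:2d} sprints: about {days:.1f} days to fix")
```

Under these made-up figures the clean-up more than doubles in cost within twelve sprints. A real project will not follow a neat formula, but the direction is the same: the later the repayment, the larger it is.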
It was the right solution at the time (outdated)
Caused by: an inability to accurately predict the future.
When building a house, the first thing you do is lay a foundation. A concrete slab is fine for a basic one- or two-story house. As your project (or house) grows, that perfectly laid foundation becomes a source of trouble; it's no longer enough to support your soon-to-be five-story building but it's required to hold up the shaking four stories that you've already put on it. This predicament does not come from bad craftsmanship or even poor planning - there could be a plethora of reasons why it would have been unacceptably high-risk to build a deep, five-story supporting foundation when you were starting out. It's just that yesterday's perfect solution is today's obstacle, which means that time and effort need to go into replacing that slab with something more appropriate. It's also important to remember that planning for the future is very difficult when a product is in its infancy. Even if we were given extra time and resources to build a more robust slab for our house, how are we to know that the best way to accommodate the future expansion is with more levels? Maybe we would have used that extra concrete to lay a sprawling slab to accommodate a large warehouse-style arrangement, or maybe a deeper hole would have been dug and the concrete used to make a large underground bunker. Predicting the future is much easier when multiple points of reference can be used to map a trajectory; our single-story house becoming a five-story house is a pretty clear indicator that we should either lay a foundation that can accommodate ten stories or make it so that modifying the foundation in the future will be an easier task.
How to mitigate:
Have a plan. Knowing what is planned for the future is incredibly useful for predicting how current tasks might be extended. If you're ordering windows for your house and you need to choose a provider, knowledge of future plans (e.g., building a greenhouse in the backyard, wanting a thick glass fence around a future pool or a need for double-glazed frosted glass on a future upper level) will most likely inform which glazier you use. Similarly, knowing how a feature is likely to be extended or which other features it will need to integrate with will inform architectural decisions and help ensure that the most extensible solution is baked in from the start, prolonging the usefulness of the first draft.
It was fine before we added all the other stuff (code rot)
Caused by: stretching the purpose of something too far beyond its original remit.
Making too many improvements to something can change it into something it was never imagined to be. Active codebases need to be continually modified to add new features or maintain existing ones – which means perfect microcosms of functionality necessarily become corrupted by imperfect (albeit working) additions that the original creator hadn't intended. Imagine you're fitting a front door to your house; a standard, solid wood door with a basic door knob is perfect for the job. Next task: we need to be able to see through the door. No worries - drill a hole in the middle of the door at about eye-height with a spade-bit and install a peep-hole. Next task: we need to amplify the knocking of guests. Fine - screw on a door-knocker. Next: we need filtered light to come in during the day. Well ok, we can install a pane of frosted glass in the upper half of the door. We can move the knocker down so it's a bit lower but the peep-hole will now have to go through the frosted pane. A bit weird, not perfect, but everything still works. Uh oh, we just found out that sometimes we need it to stay dark. Ok - attach a blind to the top of the door on the inside. Draw it down and cut a hole in it where the peep-hole is to make sure that still works. Now the BAs (business analysts) are saying the peep-hole needs to be telescopic. Well ok - attach a brass protuberance to the peep-hole. The blind now catches on it but we can solve that by cutting a slit from the bottom of the blind to the hole we cut for the peep-hole in the last task. Oh, they want a Venetian-style blind? That won't work - the slats will hang by the sides of the door when the slit is cut for the telescopic peep-hole.
That scenario might sound insane – but this is also how code rot issues often sound. At this point, we can satisfy all requirements with a frosted plate-glass door sporting a Venetian blind, a security camera with a zoom lens and a doorbell. While most people would agree that the later solutions were clear indications that there was something fundamentally wrong, pointing at the exact step that should have prompted the rebuild of the whole door is a tricky thing, especially when high urgency is added to the mix of considerations. This is also true of the vast majority of software development projects. Usually the warning signs are ignored until a task becomes impossible or something breaks.
As multiple correct solutions pile up on top of one another, they need to become less and less perfect so as to not fatally interfere with the features they closely coexist with. Think of it as a troop of clowns piling into a car - the first one hops in with ease; the last one twists, contorts and forces himself in while everyone else is crushed and unable to move.
How to mitigate:
Code rot is a fact of life. Even perfectly designed code is not immune. If something is perfect for its task, any change to it or to its purpose will, perversely, make it worse. Since part of the purpose of most code is to be extended, code is primed to become increasingly imperfect as it is updated and improved. Good architecture and eternal vigilance are the only real bulwarks against code rot and even though they won't prevent it happening, they will ensure that code rot is smaller and easier to identify and extricate.
How it compounds
Compounding tech debt is another common problem. Even the tidiest and best engineered projects usually have a section that nobody wants to touch. It's often an old section that's too monolithic to replace but too limited to do everything it needs to do. Sometimes, in extreme cases, there will be other classes or sections of code that translate other tasks into something it can handle. Its quirks or bugs are now expected inputs or outputs, so fixing them might actually introduce risk. There is a sense in which this isn't such a bad thing. If the project and the code itself are running well enough that this trouble section isn't getting in the way, maybe it's just a small mark on an otherwise very tidy project. It's likely that a new feature can be conceived of, estimated, developed and delivered without the trouble area being touched. This is probably also why it's constantly deprioritised - it's not in the way, and better things can be done with the time it would take to tidy it up.
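For readers who want to see what "code that translates other tasks into something it can handle" can look like, here is a minimal Python sketch. Both functions and their names are hypothetical, invented for illustration: `legacy_format_price` stands in for the old section nobody wants to touch, and `format_price` is the translation layer that grew up around it.

```python
def legacy_format_price(cents):
    """Stand-in for the old, untouchable section (hypothetical).

    Quirks: it only accepts whole numbers of cents, and it returns the
    amount without a currency symbol.
    """
    if not isinstance(cents, int):
        raise TypeError("legacy code only understands whole cents")
    return f"{cents // 100}.{cents % 100:02d}"


def format_price(amount_dollars: float, symbol: str = "$") -> str:
    """Translation layer: reshapes a newer request to fit the legacy code."""
    cents = round(amount_dollars * 100)  # translate the input into what the old code expects
    return symbol + legacy_format_price(cents)  # patch up its output afterwards


if __name__ == "__main__":
    print(format_price(12.5))
```

Nothing here is broken, which is exactly why it tends to stay. But everything that calls `format_price` now depends on the old function behaving exactly as it does today, quirks included.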
The problem is when multiple layers or many common sections are brittle, buggy, inconsistent or incomplete. This kind of situation calls for addressing the problem urgently and imposing some standards. If the workaround you're devising to avoid one portion of code itself has to avoid other portions of code, you have a systemic problem that needs an urgent tech-debt focus. Code doesn't generally rot to this degree, so this kind of situation is almost always a result of prior expediency.
Conclusion...
...for developers
Tech debt is part of life, but when everything is running as it should, managing it as you work should be considered part of your role. If externalities like time pressure or lack of a clear plan are causing you or your team to accumulate tech debt, make sure you communicate this to your project manager (or equivalent) so that they can factor the additional risk into their assessments. Also ensure that your project manager knows why the tech debt has accumulated and what the path forward is to fixing it.
...for project managers (or other non-technical roles)
Tech debt is usually accumulated knowingly but needs to be paid off with interest (compound interest at that). A tech-debt-laden system is generally also slow to work in, adding extra cost to ignoring it. If your team is complaining about tech debt, don't expect it to solve itself over time without recalibrating some of the pressures on your team. Accumulating tech debt is sometimes part of the cost of doing business, but make sure you do it intentionally and have some plan for when it will be repaid. The longer you leave it, the more entrenched it will become. Blaming the developers will neither fix it nor prevent its further accumulation. Encourage your developers to record the accumulation of tech debt so that you can make informed decisions about when to address it.