Managing Technical Debt with Continuous Architecture

Many people love to eat burgers, cakes, and sweets. They taste good, and they’re okay in moderation. However, the consequence of eating too much of this stuff is that you gain weight and then need to diet, go to the gym, and exercise.

No one likes to diet. Some choose to eat better to remedy the situation. Others choose to not change the situation at all and eventually suffer serious health consequences.

OK… see where I am going with this yet? If not, you will…

Many organizations like agile software development because we are able to deploy features faster, or at a higher “velocity.” It makes managers happy as well as customers…, provided delivered features work. And well, it’s just a damn good idea…, at least in principle. Just like the anecdote above, we want all the good stuff, but we don’t want to think too much about the consequences. Like Grandma used to say, “Too much of a good thing, isn’t good.”

Blind Agility

“Blind agility” is a situation where an organization churns through sprint after sprint chasing use cases, writing and massaging code, without any understanding of the leviathan they are creating. The “blindness” comes from the fact that there is no overarching, commonly understood system architecture that is consistent with the code base to guide the development and modification of software. Further, there is no attempt to recover the design, understand the system structure, or repair design shortcomings in a disciplined way. The evolution of systemic properties and structures is left more or less to chance. The design is ignored until it is a problem, and by then it’s costly or too late to do anything about it.

The underlying consequence of blind agility is that it leads to the rapid accumulation of technical debt. Blind agility is okay early in a project provided there is a tolerance for delivering less-than-perfect software. When systems are newly hatched, everyone can get their head around them, write code, deliver features, and velocity is high. However, with each sprint as code is added, modified, and repaired, the as-built architecture transmogrifies until it no longer matches the initial architectural design (if one was deliberately created). Eventually, no single person in the organization understands the systemic structure. This is magnified as engineers move on and leave the organization.

If these cycles are allowed to continue without ever baselining the as-built architecture, the system becomes brittle. Brittleness manifests in many ways. Maybe the system gets slower. Security holes emerge that no one anticipated. It becomes harder to modify and fix the system at each sprint. Relatively simple changes break the system. Velocity shrinks because more of the backlog consists of defect repair items rather than new features. These are signs that technical debt accumulation is overtaking velocity.

Once you reach this point, technical debt can become asymptotic, and systemic qualities such as performance, reliability, modifiability, and scalability suffer. These systemic properties are hard to repair because in most cases they can’t be fixed by localized optimizations – a fancy way of saying “individual programming efforts on code modules.” In fact, that mindset caused the problem in the first place. Qualities such as these depend upon systemic structures designed to promote them. Repairing broken quality attributes requires systemic thinking and refactoring…, a fancy way of saying “redesigning the system.”

Interest on the Debt

Organizations that blindly sprint away will find at some point that it is impossible to pay down technical debt. This is because technical debt comes with interest. Blind agility is a lot like having a high-interest credit card and partying like it’s 1999.

I am not anti-agile. I am a proponent of responsible agility and strongly believe that technical debt is a reality that must be managed – regardless of whether you use agile or not. Just as in life some monetary debt is good because it financially advances an individual or organization in the long term, so it is with technical debt. For example, we buy a house, we live in it, we get to write off the interest payments from our taxes, the home increases in value, and of course we pay our debt off and have an asset. It’s good debt, because in the long term it pays us back many times over. This is a calculated risk where we understand the consequences of assuming the debt.

Assume we spend a little upfront time to create a notional architecture and a product/system roadmap; then we can target some number of sprints to deliver initial operating capabilities. We know we will incur technical debt, but we will learn an immense amount about what customers need, what is feasible, and how we might best structure the system in these early sprints. The lightweight upfront design reduces the risk of creating an unwieldy systemic Kraken that we have to tame or dispatch earlier than we would like. Win-win! While a little upfront design goes a long way, we still need periodic architecture reviews to manage system evolution and devolution. We need to track technical debt and refactor to synchronize the as-built and documented architectures and align them with a product/system roadmap.

The Consequences

Most organizations I have worked with never realized how much technical debt they had until it was too late. Some of these organizations got to a point where they spent 50%-90% of their engineering resources in every sprint just fixing stuff. They were barely paying down the interest on their technical debt. Why, you ask? Because they still spent the remaining 10%-50% of their resources each sprint adding new features, and some percentage of those new features added to the technical debt because of persistent structural problems in the system.

Organizations get to this point because: 1) they don’t know what their system’s architecture is, then 2) they can’t understand where the technical debt is without an architecture, and so 3) there is no hope of managing or paying down the technical debt. It’s overwhelming and the consequences can be pretty ugly…

Crawling Velocities: Sprint and project velocities move at a snail’s pace. First, because it’s increasingly hard to modify the system. Secondly, teams are spending much more time on repair tasks than new feature tasks. What this means to customers is that they have to wait longer for their new features. How anti-agile is that?

Compounded Interest: A convoluted system with a known architecture is one thing. A convoluted system with an unknown architecture is an entirely different problem. The more convoluted and unknown a system’s structure becomes, the harder it is to understand the consequences of localized changes. It becomes more difficult to modify or fix the system without injecting faults. This is like compounded interest on technical debt where more time has to be spent to do less work. It’s like running the air conditioner with the windows open.
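To make the compounding concrete, here is a minimal sketch of a toy model, not a measurement of any real project. It assumes a fixed sprint capacity, an “interest” charge proportional to outstanding debt, and new debt created by every feature point delivered. The function name, the parameters, and every number are illustrative assumptions.

```python
def simulate_sprints(capacity, interest_rate, debt_per_feature_point,
                     sprints, initial_debt=0.0):
    """Toy model of blind agility: debt is never paid down.

    Returns a list of (sprint, feature_points, debt) tuples.
    All parameters are illustrative assumptions, not measured values.
    """
    debt = initial_debt
    history = []
    for sprint in range(1, sprints + 1):
        # Interest: effort lost to working around existing debt,
        # capped at the total capacity of the sprint.
        interest = min(capacity, debt * interest_rate)
        feature_points = capacity - interest
        # Every delivered feature point adds a little more debt.
        debt += feature_points * debt_per_feature_point
        history.append((sprint, round(feature_points, 1), round(debt, 1)))
    return history


if __name__ == "__main__":
    rows = simulate_sprints(capacity=100, interest_rate=0.10,
                            debt_per_feature_point=0.5, sprints=12)
    for sprint, features, debt in rows:
        print(f"sprint {sprint:2d}: feature points {features:5.1f}, debt {debt:6.1f}")
```

With these made-up numbers, feature output falls from 100 points in the first sprint to 95 in the second and to roughly 57 by sprint 12, even though the team is working just as hard. More time is spent to do less work.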

Backlogged Defects: Defects get stuck in backlogs for longer periods of time, first because there are more of them and the compound interest on the technical debt keeps generating more. It also takes longer to find and fix defects. Defects in systemic properties (e.g., security, performance…) may never get fixed.

Back to the Drawing Board: In extreme cases it may mean designing a new system and undergoing a complete transition. All systems eventually “wear out” and will need to be replaced, but we should be transitioning to new systems when WE are ready, on our terms, and when it makes business sense to do so, not because we are being forced to do so by raging technical debt. If we don’t manage technical debt, systems wear out much earlier than desired – perhaps even before we have amortized our investment in them.

Death Marches: Having to fix and enhance a system whose structure is like an Escher staircase can be demoralizing to the most optimistic among us. The daily grind of frustratingly long lists of defects and ever shorter deadlines can turn work into death marches and burn out engineering staff faster than anything else. In a burnt-out, demoralized workforce, the best engineers leave first. The engineers who leave take their knowledge of the architecture with them, compounding the problem of not knowing what the system’s architecture is or being able to easily recover it.

Continuous Architecture

Many organizations say they don’t have time to architect. But they make plenty of time to code, recode, recode, recode…, and did I mention recode? On the flip side, for too many years the architecture community has been guilty of omphaloskepsis: architects who sit in their offices and think deep abstract thoughts with little foundation in real system structures – or even in relevant technologies. I have reviewed design documents that had thousands of pages, most of which were never read. Some organizations do upfront architecture design, but that is it. They never look at the design again, even though they may be changing the system daily. It becomes shelf-ware that never sees the light of day. This is only a little better than hacking and slashing code until a system pops out of the dirt.

Big-bang architecture doesn’t work any better than big-bang software processes work. Continuous architecture does work. Continuous architecture means…

  • starting with a lightweight design
  • growing the design over time in a disciplined way
  • continuously reviewing the as-built and documented architecture in order to ensure conformance
  • actively tracking and managing technical debt
  • refactoring the as-built design and/or updating the documented architecture as necessary
  • having a single transparent design that engineers use to guide construction

Continuous architecture blends architecture activities into daily development activities so the cost of architecture design is amortized over the entire lifecycle.

Good Design Habits

In principle, obesity is an easy problem to fix. Eat less, eat the right things, and exercise more. But it really isn’t easy as many of us know. That’s because it takes uncommon discipline, prioritization, and commitment to do these three things. The problem of managing technical debt is not hard in principle. However, it requires uncommon discipline, prioritization, and a commitment to adopt principles of design thinking and good design habits. Managers are key. They set the priorities. Good architects are essential. They maintain the designs. Here are some good design habits that are simple to adopt:

Design Transparency: Make the system’s architectural design visible to all. I prefer a lightweight, online architecture document that everyone has access to, not just the priestly class. The architecture should be transparent to all engineering staff, and it’s best if it is kept in a wiki-style format where all can comment and ask questions. Keeping a FAQ is also helpful for newbies as they learn their place in the system architecture. Keep the documentation simple. Make sure it is at the architectural level of abstraction and not a collection of 100,000 class diagrams. Everyone should know what the business goals are and which systemic qualities the system needs. Include design views from each perspective: static, dynamic structure and behavior, physical, and allocation. Less is more. Avoid wordy documents and focus on consistent views with legends and brief descriptions to accompany the views. No one will read or maintain big wordy documents.

Continuous Architecture Reviews: At the heart of continuous architecture is amortization of architectural activities across the lifecycle, NOT one big architecture party at the beginning of a project. After the party, you clean up, and it’s forgotten. Amortized activities are done continuously and provide continuous value. A key continuous activity is the architectural review. At strategic points, such as sprint retrospectives, allocate time to review the as-built architecture against the as-documented architecture. Note deviations and keep them in a technical debt backlog. The going-in assumption is not that the documented architecture is the truth model; the goal is to align the as-built and documented architectures. Sometimes the code deviates from the architecture design and it’s okay – update the design. In other cases, the code deviates from the architecture and it’s not okay – fix the code. Finally, there are situations where neither the code nor the design is right – fix them both.
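One way to take some of the manual labor out of a review is to compare the dependencies the documented architecture allows with the dependencies actually found in the code. The sketch below assumes both are available as simple maps from a component name to the set of components it depends on. How the as-built map gets extracted from the code base is outside this sketch, and the component names are hypothetical.

```python
def review_architecture(documented, as_built):
    """Compare documented dependencies with those found in the code.

    Both arguments map a component name to the set of components it
    depends on. Returns two sorted lists of (source, target) pairs:
    dependencies found only in the code, and dependencies found only
    in the documented design.
    """
    undocumented = set()
    unused = set()
    for component in set(documented) | set(as_built):
        allowed = documented.get(component, set())
        actual = as_built.get(component, set())
        undocumented |= {(component, target) for target in actual - allowed}
        unused |= {(component, target) for target in allowed - actual}
    return sorted(undocumented), sorted(unused)


if __name__ == "__main__":
    # Hypothetical three-layer system; the UI has started bypassing services.
    documented = {"ui": {"services"}, "services": {"storage"}, "storage": set()}
    as_built = {"ui": {"services", "storage"}, "services": {"storage"}, "storage": set()}
    undocumented, unused = review_architecture(documented, as_built)
    print("in code but not in design:", undocumented)
    print("in design but not in code:", unused)
```

The check only finds deviations. Deciding whether to fix the code, update the design, or both is still a judgment call for the review, and each deviation belongs in the technical debt backlog until that call is made.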

Manage Evolution: Evolution of the system should be proactively managed to the greatest extent possible so that technical debt can be planned and managed. This is an essential part of work planning but it is impossible to do if you don’t understand the structure of your system. Because technical decisions are also business decisions, the evolution of the architecture should be aligned with your product or system road map.

Refactoring Sprints: Build refactoring sprints into your process. How often and how many refactoring sprints you need will depend upon your situation. Assuming that technical debt is being tracked, when it reaches some critical point, set aside sprints to update the architecture and refactor the system. This doesn’t mean completely redesigning the entire system. It may, but typically it is impractical and risky to completely redesign an entire system, test it, and then deploy it. More often it means remodeling portions of the architecture to pay down technical debt. It should be an ongoing effort for the life of the system. The goal is to repair issues with systemic properties, make the system easier to update and maintain, and prepare the system for the addition of future capabilities (not implement them). Make refactoring sprints as much a priority as paying your mortgage. It can be tempting to skip or delay them, and organizations often do. I have worked with organizations that indicated they “planned” to refactor, but never got around to it. It’s a discipline – just do it.
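What counts as “some critical point” is a local decision, but it helps to write the rule down. The following is a minimal sketch, assuming each tracked debt item carries two rough estimates: the effort to pay it off and the effort it costs the team every sprint while it stays unfixed. The class, field, and function names and the greedy selection rule are illustrative assumptions, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    description: str
    payoff_days: float           # estimated effort to fix the item (must be > 0)
    drag_days_per_sprint: float  # estimated effort lost each sprint while unfixed


def refactoring_sprint_due(backlog, max_drag_days):
    """True once the total per-sprint drag reaches the agreed threshold."""
    return sum(item.drag_days_per_sprint for item in backlog) >= max_drag_days


def plan_refactoring_sprint(backlog, capacity_days):
    """Greedily pick the items that remove the most drag per day of effort."""
    plan = []
    remaining = capacity_days
    ranked = sorted(backlog,
                    key=lambda item: item.drag_days_per_sprint / item.payoff_days,
                    reverse=True)
    for item in ranked:
        if item.payoff_days <= remaining:
            plan.append(item)
            remaining -= item.payoff_days
    return plan
```

For example, with three hypothetical items costing 2, 3, and 1 days of drag per sprint, a threshold of 5 days would trigger a refactoring sprint, and the plan would start with whichever item buys back the most drag for the least effort.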

Explicit Technical Debt Management Policy: We should make reasoned decisions about when and what technical debt we are willing to assume, the risk of assuming it, when we will pay it back, and at what cost. This must be baked into our software development processes and organizational culture. It has to be built into project scheduling and costing. It requires leadership and investment. It’s not easy and it’s not free. Refactoring to pay down technical debt can be a daunting task for architects. Managers hate sprints where no new features are delivered. Get over it! It has to be done at some point or technical debt will creep up on you, and later it will be many orders of magnitude harder, costlier, and have a much greater negative impact on the organization. Pay now or pay later. Keep drinking, smoking, and skipping the gym…, see you in the emergency room.
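A policy like this is easier to enforce when every deliberate decision to take on debt leaves a record that answers those four questions. Here is a minimal sketch of such a record and one check a team might run at sprint planning. The field names and the use of sprint numbers as due dates are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtDecision:
    what: str                 # the debt we agreed to assume
    risk: str                 # what could go wrong if it is left unpaid
    payback_by_sprint: int    # when we committed to pay it back
    payback_cost_days: float  # estimated cost of paying it back


def overdue(register, current_sprint):
    """Return the decisions whose payback commitment has already passed."""
    return [d for d in register if d.payback_by_sprint < current_sprint]


if __name__ == "__main__":
    # Hypothetical entries for illustration only.
    register = [
        DebtDecision("hard-coded report layouts", "slow to add new reports", 6, 8.0),
        DebtDecision("no caching layer", "performance degrades with load", 12, 15.0),
    ]
    for decision in overdue(register, current_sprint=9):
        print(f"OVERDUE: {decision.what} ({decision.payback_cost_days} days)")
```

Anything that shows up as overdue is a decision that was made and then ignored, which is exactly the habit an explicit policy is meant to break.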

Dude, We Are There…

You might be saying, “Damn…, it’s too late…, we are there already.” Take deep breaths…, there are different levels of severity to “being there.” A few years ago, I consulted for an organization that had 400 engineers working on a massive medical records system. Their system was running on a pre-.NET COM/DCOM architecture, and they could not find the cycles to migrate to a more modern architecture. The volume of code and the pace of operations made a real transition effort all but unthinkable. Upwards of 90% of the engineering effort was spent repairing the system just to keep it running. In another recent case, an organization had nearly 20 million lines of COBOL and 40 million lines of Java. They had a goal to merge the systems and move to a common platform service, but again the volume of code, the hard limit on engineering hours, and the pace of operations made the task overwhelming.

What did these systems have in common? In both cases, the systems had evolved over many years and there was no documented architectural design, so no one was able to reason about what they had and what it would take to address the issues they faced. Both systems had immense technical debt – one of them had a single class with thousands of lines of code. They had reached a point where they could not pay down the technical debt. The triage plan was to begin with a massive architectural reconstruction activity that involved hiring outside consultants to pore through code and recover the architectural design. Ugh! This was a lot like digging ditches. Only then could meaningful steps be planned to redesign these systems. Unfortunately, once the efforts were complete and the systems were in better shape, both organizations returned to their old habits. They will likely find themselves in a similar situation in the future.

I get it - we must maintain our operational tempo. But we also must make time to take care of the long-term needs of our systems as well as grinding out code. We have to do both. It is the job of management and the senior technical staff (architects) to proactively manage technical debt…, and continuous design is the easiest way to do it. Design is a verb and a noun. It is action and it is substance. It is not a one-time gunshot that starts a coding marathon. Nope. Design is an everyday thing. It is a way of thinking. It is a culture. It should be woven into all the activities that go into building systems. Design is a means to an end. Excellent design. Excellent code. Excellent systems.

Nina P.

Mom | Mission Driven Engineering Leader | Coach

5 years ago

Anthony J Lattanze Tony, Excellent article. So well written, thorough, insightful, realistic and proposes ideas and solutions around managing and incorporating into the teams and organizations. Love it. Love the use of the word ‘leviathan’. Sometimes it does feel like that when all hell breaks loose and shit hits the fan. I could hear you reciting this to the world in the CMU lecture halls as I was reading it. Miss Tony lectures!

Bill Francoeur

Distinguished Engineer at Blue Cross and Blue Shield of Illinois, Montana, New Mexico, Oklahoma & Texas

5 years ago

Great use of “transmogrifies”. Over three orgs in widely different apps I’ve seen this story repeat over and over. The perspective here is spot on, simple concept but very hard to adhere to. Great read, very approachable dissection of a key concept.

Vijay Sai Vadlamudi

Adjunct Faculty Member, Institute for Software Research, School of Computer Science at Carnegie Mellon University

5 years ago

Drove it home Tony, you’ve dumbed it down, can’t be clearer! Thank you!!

James Devuyst

Operations Leader at Chamber Media | MSIT

5 years ago

This is great. Also, I learned an excellent new word: omphaloskepsis. I'm gonna have to start using it.

Great article! This is so true.
