A Technical Debt Fairy Tale
Once upon a time, there was a lead developer called Annabel. She worked for vakation.com, a travel booking site. She was the tech lead of the devops team that maintained the web-app front-end. Springtime was upon the land, for ’twas April, a busy time, when people were starting to book their summer holidays. But sales were disappointing. Conversion of visits into actual bookings was getting worse and worse.
One morning Ramon, the team’s business owner, requested an urgent gathering: they had found one of the root causes of the problem that was plaguing them. “Our site does not show visitors whether the accommodations they are booking have facilities to cast streaming media from their phones to the TV in the room,” he said, a frown upon his face. “It turns out that many of our competitors now do show that information, and clients end up booking their travel on other sites. So we need to add the media casting info to the vakation.com website with great speed and utmost alacrity!”
Annabel turned to Edwin, who headed up the hotel reservation back-end system, and asked him whether the media casting info was available there. Edwin smiled, for he had good news: the latest hotel booking communication standard included the media casting information! It was already present in their accommodations database. Vakation.com used an Enterprise Service Bus (ESB) to connect the various systems in their landscape, so Annabel asked Edwin to expose the desired info on the ESB for the web-app front-end to display. Seeing the look of joyful expectation on Annabel’s face, Edwin was happy to oblige. But things were not as merry as they seemed – for at that time, Edwin’s team had quite a full backlog: it looked like they wouldn’t have time for the change until June.
Now Annabel started to get worried: fearing Ramon’s wrath, she didn’t want to go back to him and tell him that she wouldn’t be able to fix the website until June. But she soon cheered up, for by a stroke of fortune, the web-app front-end was on the same physical database as the hotel reservation back-end system, and Annabel’s team of diligent developers knew the table structure. So Annabel decided to temporarily ignore the company’s ESB policy and obtain the media casting information directly from the accommodations database, taking on some technical debt. Her mind was firmly set on refactoring the temporary fix as soon as the info was exposed on the ESB, hopefully in June.
Two months went by, after which Edwin perused his backlog, and lo and behold: he spotted the story card with Annabel’s request. With some trepidation in his heart, he visited Annabel in her lair, and asked her: “Fair Annabel, is this story still needed? Some other, more urgent stories have popped up. Would it be terrible if we pushed back the media casting story a few more sprints? After all, things are working now, and nobody is complaining.” Failing to reach agreement, Edwin and Annabel decided to ask Ramon, the product’s business owner. But to their great surprise, Ramon had trouble remembering that the fix was temporary, and was not at all interested in prioritizing the refactoring.
Days, weeks and months passed by. Eventually, it took until November before the info was finally accessible through the ESB. In the meantime, it turned out that some new members of Annabel’s team had been copying the method of accessing data in the hotel reservation system directly. Having run into Annabel’s temporary fix in the code, they felt comfortable using the same method – this time without even asking Edwin’s team. The cursed technical debt had multiplied itself! This had already led to the website breaking down after Edwin’s team had made some changes to their table structure, unaware that it was accessed directly by other teams. These outages had caused Ramon to wax angry and scold both teams: “Why did you allow things to become so bad? You should have managed that better!”
As we close the magical book on this horror story, there are some questions we may ask ourselves:
I look forward to reading your answers!
Disclaimer: all characters and organizations in this story are fictitious, and any resemblance to real persons or companies is purely accidental.
Engineering Leader | Building Strong, Empowered Teams | Equity Advocate | Tackling Backend Complexities & Driving Technical Success
1 year ago: I think Gartner's definition of technological debt would have fit better here, since that is the debt and cost associated with continuing to do business in a software-factory/software-dependent environment. The issue is never "just the devs" (as the original definition implied) but every stakeholder whose decisions affect the software. TechDebt is _always_ a business risk, not a developers' problem on its own. Excellent article!
Even though I am from the embedded world, it is interesting to read your story. If her team had understood systems thinking, and it seems they had the time, then they could have created the necessary APIs and published them on the ESB themselves. To me it looks like a poor architectural decision to go for a centrally managed ESB in a service-oriented landscape. We are having similar problems in the embedded automotive industry, where we rely heavily on CAN and LIN communication links, and as of today they require a centrally managed "signal database" where you prewire signals (payload) into link frames. Thus that will always be the bottleneck.
Senior Full-stack Software Engineer at Relex Solutions
1 year ago: Isn’t the problem a lack of visibility that there is an issue here? From the description it seems like people thought that “things are working fine” and were comfortable copying and reusing a solution, so they thought the solution was “right” and it was okay to copy and reuse. So there was no indication, metric, comment, documentation, implementation detail (etc.) that would point to the issue here. If there were, the whole team could have been accountable for it (which I find more powerful than assigning a single responsible person to this).
I love this story! Thank you for posting it. But this is actually not the worst case. At least Annabel knew that she was taking on debt and planned to do something about it. But debt often accumulates without developers even being aware that they are doing anything wrong. After all (says the blithe developer): I can see that class, and I know the method signature, so I can just insert a call to it (never mind that there was an abstraction interface that I should have used). Each of these changes to the source code is innocuous in isolation, but they accumulate until the overall structure is completely eroded. For example, I once reverse engineered the "layered structure" of HDFS. It turns out that the layers were mostly in the minds of a few people, and did not exist in practice. In practice it was a big ball of random connections. That, I think, is where the biggest problems originate.