Piled Higher and Deeper? ... Increasing Operational Faults and Technical Debt
Ray Carnes
Transformational Leader, Large-scale Systems Architect and Software Developer, Executive Coach, Author, and Musician
Let’s take a short pause to think about the number of operational and technical problems your team (and your enterprise) is accumulating. I’m going to posit that you’re generating “problems” at a much greater rate than you are solving them.
This should be alarming, but because these problems (operational faults and technical debt) build up gradually over time, they are easy to overlook. Unless we look at the broader context of our production baseline, we’ll probably even be comfortable with the mounting, unknown risk in our production operations.
Let’s use an example to show what’s going on in the production baseline of most software enterprises.
1. A production model
1.1 Each team
Say that each team works in two-week sprints and produces one feature per sprint. Also, let’s say the average feature has ten stories and the average story has ten test points. With 26 sprints in a year, the annual output of the team looks like this: 26 features, 260 stories, and 2,600 test points.
Further, let’s say that features from this team are combined with features from other teams to form capabilities. (Note: please update this model if your experience is different.)
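To make the team-level arithmetic easy to re-run with your own numbers, here is a minimal Python sketch. The function and field names (`team_annual_output`, `AnnualOutput`) are illustrative, not from the article, and the sketch assumes a 52-week year, which gives 26 two-week sprints.

```python
from dataclasses import dataclass

# Assumption: a 52-week year, so two-week sprints give 26 sprints per year.
WEEKS_PER_YEAR = 52


@dataclass(frozen=True)
class AnnualOutput:
    features: int
    stories: int
    test_points: int


def team_annual_output(
    sprint_weeks: int = 2,
    features_per_sprint: int = 1,
    stories_per_feature: int = 10,
    test_points_per_story: int = 10,
) -> AnnualOutput:
    """Annual output of a single team; defaults mirror the assumptions above."""
    sprints_per_year = WEEKS_PER_YEAR // sprint_weeks
    features = sprints_per_year * features_per_sprint
    stories = features * stories_per_feature
    test_points = stories * test_points_per_story
    return AnnualOutput(features, stories, test_points)


print(team_annual_output())
# AnnualOutput(features=26, stories=260, test_points=2600)
```

Changing any keyword argument (for example, `stories_per_feature=5`) lets you tune the model to your own teams.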
1.2 An enterprise
Now, let’s say we’re working in an enterprise with 100 development teams. So the combined annual output of the enterprise is 2,600 features, 26,000 stories, and 260,000 test points.
All this effort produces 260 end-to-end capabilities per year (assuming each capability is composed of an average of ten features). Impressive.
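The enterprise rollup is the same multiplication scaled by the number of teams, followed by grouping features into capabilities. Another illustrative sketch: the function name is hypothetical, and the defaults mirror the figures stated above (including 26 features per team from the team model).

```python
def enterprise_annual_output(
    teams: int = 100,
    features_per_team: int = 26,
    stories_per_feature: int = 10,
    test_points_per_story: int = 10,
    features_per_capability: int = 10,
) -> dict[str, int]:
    """Roll team output up to the enterprise and group features into capabilities."""
    features = teams * features_per_team
    stories = features * stories_per_feature
    test_points = stories * test_points_per_story
    # Assumption: capabilities are built from whole features (integer division).
    capabilities = features // features_per_capability
    return {
        "features": features,
        "stories": stories,
        "test_points": test_points,
        "capabilities": capabilities,
    }


print(enterprise_annual_output())
# {'features': 2600, 'stories': 26000, 'test_points': 260000, 'capabilities': 260}
```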
2. A defect model
2.1 Defect definition
We’ll use a definition of defect severity that’s fairly standard across the software industry. It has five levels, from sev-1 (most severe) to sev-5 (least severe), each with a different degree of operational and developmental impact.
2.2 Defect generation per capability
Working with the severity definitions above, we’ll propose a conservative model for the number of defects generated per capability during development. (See if you agree; if not, tune it up or down.)
2.3 Defect generation per year
Now, if we multiply the per-capability defect numbers above by the number of enterprise capabilities (260), we get the following model for the problems/defects the enterprise generates in development each year.
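The yearly defect model is a per-severity scaling by the number of capabilities. The sketch below assumes sev-1 is the most severe level; the names are illustrative. The article’s per-capability counts are not reproduced here, so the example uses a placeholder of one defect per level per capability purely to show the mechanics. Substitute your own figures.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Five severity levels; sev-1 is assumed to be the most severe."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4
    SEV5 = 5


def defects_per_year(
    per_capability: dict[Severity, float], capabilities_per_year: int = 260
) -> dict[Severity, float]:
    """Scale a per-capability defect model to the enterprise's annual output."""
    return {sev: count * capabilities_per_year for sev, count in per_capability.items()}


# PLACEHOLDER input (one defect per level per capability), NOT the article's
# figures; replace it with your own per-capability model.
placeholder = {sev: 1 for sev in Severity}
for sev, total in defects_per_year(placeholder).items():
    print(f"{sev.name}: {total:g} per year")
```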
3. Compounding problems
The magnitude of the defects described above is just a first-order model of the complexity within our system of capabilities. As multiple defects accumulate at one level, they can cause higher-level problems at other defect levels within the system. We’ll model this second-order behavior as follows.
Interestingly, defects introduced by the combination of lower-level effects within a system can be some of the most difficult to test, detect, isolate, and repair.
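The article’s second-order model isn’t reproduced here, so the following sketch illustrates the compounding idea with a swapped-in, hypothetical rule: every `escalation_ratio` defects accumulated at one severity level produce one extra problem at the next more-severe level, and those derived problems can cascade further. Both the rule and the default ratio of 10 are assumptions, not the author’s numbers.

```python
LEVELS = range(1, 6)  # sev-1 (most severe) through sev-5 (least severe)


def add_second_order(first_order: dict[int, int], escalation_ratio: int = 10) -> dict[int, int]:
    """Add HYPOTHETICAL second-order problems to first-order defect counts."""
    totals = {sev: first_order.get(sev, 0) for sev in LEVELS}
    # Walk from sev-5 up to sev-2 so escalated problems can cascade further upward.
    for sev in range(5, 1, -1):
        totals[sev - 1] += totals[sev] // escalation_ratio
    return totals


print(add_second_order({5: 1000}))
# {1: 0, 2: 1, 3: 10, 4: 100, 5: 1000}
```

Even with no first-order sev-2 or sev-3 defects, an accumulation of minor faults surfaces problems at more severe levels, which is the effect the section describes.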
4. What gets fixed… and what doesn’t
Clearly, based on the definitions above, an enterprise doesn’t release software with sev-1 or sev-2 defects, and only rarely (and temporarily) releases with sev-3 defects. The expectation is that we’ll catch these faults in the test process and correct them before release.
But some sev-1/2/3s and most sev-4/5s are not caught in testing and therefore are not corrected or mitigated during development, so they all move into the production baseline. Of course, the more critical of these faults will be detected and corrected during operations. Others will remain dormant, waiting to be combined with new capabilities (and problems) in the next release.
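The release filter described above can be sketched with two fractions per severity level: the share caught in test before release, and the share of escaped faults later corrected in operations. Everything else stays dormant in the production baseline. The rates are hypothetical inputs you would estimate for your own enterprise; the article does not give them, and the names are illustrative.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    caught_in_test: float  # corrected before release
    fixed_in_ops: float    # escaped to production, later detected and corrected
    dormant: float         # escaped to production and still latent


def triage(
    generated: dict[int, float],
    catch_rate: dict[int, float],
    ops_fix_rate: dict[int, float],
) -> dict[int, Outcome]:
    """Split generated defects per severity into caught, fixed-in-ops, and dormant."""
    outcomes = {}
    for sev, count in generated.items():
        caught = count * catch_rate[sev]
        escaped = count - caught  # these move into the production baseline
        fixed = escaped * ops_fix_rate[sev]
        outcomes[sev] = Outcome(caught, fixed, escaped - fixed)
    return outcomes


# HYPOTHETICAL rates: testing catches most sev-1 and few sev-5 faults, and
# operations corrects escaped sev-1 faults but not sev-5 ones.
result = triage({1: 100, 5: 100}, catch_rate={1: 0.9, 5: 0.1}, ops_fix_rate={1: 1.0, 5: 0.0})
print(result[1].dormant, result[5].dormant)
```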
5. What’s happening over time
All this brings us to the discussion of what’s happening to the enterprise production baseline over time. The figure above shows this effect over a five-year period.
The graph depicts the number of problems generated compared with the number that are corrected or mitigated. Note the order-of-magnitude difference between these two curves, and that the gap between them is widening (not closing) over time.
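The widening gap can be reproduced with a simple cumulative model: each year adds the problems generated, and operations corrects a fixed number. The 10:1 ratio in the example is a hypothetical placeholder chosen only to echo the order-of-magnitude difference described above, and the function name is illustrative. With any generation rate above the correction rate, the open backlog grows every year.

```python
def baseline_over_time(
    generated_per_year: float, corrected_per_year: float, years: int = 5
) -> list[tuple[int, float, float, float]]:
    """Return (year, cumulative generated, cumulative corrected, open backlog) rows."""
    rows = []
    generated = corrected = 0.0
    for year in range(1, years + 1):
        generated += generated_per_year
        # Operations can't correct more problems than are currently open.
        corrected += min(corrected_per_year, generated - corrected)
        rows.append((year, generated, corrected, generated - corrected))
    return rows


# HYPOTHETICAL placeholder rates (10:1), not the article's data.
for year, gen, fixed, backlog in baseline_over_time(1000, 100):
    print(f"year {year}: generated={gen:.0f} corrected={fixed:.0f} open={backlog:.0f}")
```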
6. Leadership Lessons
As a leader in this situation, your take-home points are these:
It’s essential that you model your enterprise and understand your risk… and your risk tolerance. What are the trade-offs you can make to improve first-time quality and avoid costly operational failures?
Only you can determine if you have a steaming pile of non-conformance… or a pending enterprise baseline collapse.
--------------
Thanks for reading. If you found this helpful in any way, please ‘share’ with your network.