The 'Defect' Resolution

Earlier this week, I was speaking with one of my partners who is in the final stage of delivery for a large system at a public sector client. By final stage, I mean the testing phase of the program where configured/developed software is verified against requirements via test cases to confirm conformance with user needs.

Before jumping into the specifics around testing, one quick item that often gets caught up in the debate is the concept of “out of the box” (OOTB) configuration and how OOTB speeds implementation and minimises the occurrence of defects. OOTB is a term that refers to installing a software package with little or no programming (configuration) of the product being installed. The truth is, there is really no such thing as OOTB configuration of a product such as Microsoft Dynamics or Salesforce. Sure, there are design and configuration decisions that can minimise the level of customisation required, but even if it were possible to install a software product with little or no configuration (hint – it’s not), the integration of the solution with existing data sources would still require programming/configuration anyway.

The inconvenient truth is that OOTB is a sales tool that vendors use to persuade buyers of the ease of software installation. Furthermore, its relationship to the level of defects that will arise from an implementation effort is minimal at best. If interested, you can explore a more pointed view on OOTB here.

Regardless of whether your implementation is fully bespoke or of the mythical OOTB variety, executives and board members will ultimately face a moment of truth on the implementation where they must decipher the various reports around the number of defects on the project at any given time. Then the hard part – what does this mean for the go-live? How many defects should one reasonably expect? Surely all defects are not the same – which ones should we concern ourselves with? Does the program have a more systemic quality problem that the governance board should be concerned with? Most importantly, how long will it take to resolve the defects so that we can go live on time? 

Let’s have a look at these, one point at a time. 

How Many Defects are too Many?

This is probably the most common theme one hears in the context of projects and testing, and unfortunately there is no simple answer to the question. Considerations such as the length of development, the number and complexity of integrations to external systems and the amount of custom development vs. configuration all play a role in the overall volume of defects one can expect to see from a system implementation. That said, there are a few myths that can be dispelled immediately.

Firstly, having very few defects is not necessarily a good thing. In fact, having a very small number of reported defects, particularly at the start of testing, is likely a red flag that your testing team is either hiding things from you or not exposing the full extent of the actual defects. In a perfect world, defects are purely the result of misalignment between real-world business needs and what your business analysts (BAs) have captured in the form of requirement documents (perfect world = competent developers and quality unit testing). This translation of real-world business needs into written requirements is seldom perfect, and therefore all projects can reasonably expect a number of defects to arise during the user testing phase. In his book “Code Complete”, Steve McConnell suggests 15-20 defects per 1,000 lines of code. Whilst this is a general guideline, it does give a sense of the proportion of defects per line of code. Extrapolating this out, two hundred thousand lines of code (not an extraordinary amount for a typical project) would result in roughly 3,000 to 4,000 defects. Add in the fact that your developers are human (e.g. they can make mistakes on par with the BAs who mis-capture requirements) and you should expect to see a healthy number of defects arising throughout testing.
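For those who like to sanity-check the headline number, here is a minimal back-of-the-envelope sketch of that extrapolation in Python. The defect density range comes from the McConnell guideline quoted above; the 200,000-line figure is purely illustrative.

```python
# Back-of-the-envelope estimate of expected defects from code volume.
# The density range (15-20 defects per 1,000 lines) is the McConnell
# guideline quoted above; the line count is illustrative only.

def expected_defects(lines_of_code: int,
                     low_per_kloc: float = 15,
                     high_per_kloc: float = 20) -> tuple:
    """Return a (low, high) range of defects expected for a given code size."""
    kloc = lines_of_code / 1000
    return kloc * low_per_kloc, kloc * high_per_kloc

low, high = expected_defects(200_000)
print(f"Expect roughly {low:,.0f} to {high:,.0f} defects")  # ~3,000 to 4,000
```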

A second and equally important consideration around defects is the notion that all “defects are not created equal”. A spelling error on the label of a user input field is not nearly as critical to rectify as a miscalculated refund amount for an end customer. That’s where the concept of severity comes in. As testers are capturing defects, they should also be categorising them in accordance with a pre-defined severity categorisation scheme – something that states, in effect, that “spelling errors are not as important as data errors”.
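As a rough illustration only (the labels, definitions and resolution targets below are assumptions, not a standard), a severity scheme can be as simple as an enumeration agreed before testing starts:

```python
from enum import IntEnum

# Illustrative severity scheme - agree your own definitions and targets
# with the project board before testing begins.
class Severity(IntEnum):
    SEV1_CRITICAL = 1   # e.g. miscalculated refund amount; blocks go-live
    SEV2_MAJOR    = 2   # core function impaired, but a workaround exists
    SEV3_MINOR    = 3   # limited functional impact
    SEV4_COSMETIC = 4   # e.g. spelling error on a field label

# Hypothetical resolution targets in working days, per severity
RESOLUTION_TARGET_DAYS = {
    Severity.SEV1_CRITICAL: 2,
    Severity.SEV2_MAJOR: 5,
    Severity.SEV3_MINOR: 10,
    Severity.SEV4_COSMETIC: 20,
}
```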

Side note - the subject of severity definitions could consume an entire blog itself. The tricks and games played in the categorisation of defects are as old as software development itself. My personal favourite was when a program manager working for an insurance client of mine decided to re-categorise all 200+ remaining defects in the backlog to Severity 1 (most critical) to delay go-live and shift the responsibility of defect resolution onto our team. Imagine his surprise when the tracking logs revealed his attempt… (you know who you are mate!) 

The key takeaway from both points is that you will have hundreds of defects in a successful implementation project, and that it is important to have a fully defined categorisation scheme in place before testing begins, as well as strict adherence to that scheme throughout testing, to provide as much transparency as possible into the actual state of defect resolution.

Given that one can reasonably expect hundreds of defects across several severity categories (ranging from critical functional defects to trivial cosmetic defects), is the number of outstanding defects sufficient to tell us how long it will take to resolve all the defects?

Little’s Law

Borrowing from our friends in operations science, Little’s law is a simple formula that relates a system’s inputs to its outputs. Originally used to describe problems in queuing theory, it states that the long-term average number of customers “L” in a stationary system is equal to the long-term average arrival rate “λ” multiplied by the average time “W” that a customer spends in the system.

In algebraic terms, Little’s law looks like this:

L = λ*W

By applying similar logic to the challenge of defect resolution, “λ” takes the role of the number of defects being raised per unit time, “W” takes the role of the average time to resolve a defect and “L” takes the role of the number of outstanding defects at any given time. Using this simple formula, we can predict how long it will take to bring the backlog down to zero defects. For example, if we have 375 outstanding defects, the development team is resolving 25 defects a day and the test team is raising 10 defects per day, then we are clearing a net of 15 defects per day (25 – 10). Therefore, it will take 25 days (375/15) to resolve the backlog at the current ‘flow rate’ of defects.
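A minimal sketch of that flow-rate calculation, using the illustrative numbers from the example above:

```python
def days_to_clear_backlog(outstanding: int,
                          resolved_per_day: float,
                          raised_per_day: float) -> float:
    """Days needed to drain the defect backlog at the current net flow rate.

    Assumes the raise and resolve rates stay constant; in practice the raise
    rate usually falls away as testing matures.
    """
    net_flow = resolved_per_day - raised_per_day
    if net_flow <= 0:
        raise ValueError("Backlog is growing: resolve rate must exceed raise rate")
    return outstanding / net_flow

print(days_to_clear_backlog(outstanding=375, resolved_per_day=25, raised_per_day=10))
# 25.0 days, matching the worked example above
```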

Some important implications fall out of this simple formula. First, the most important consideration around the defect pool is how many defects are being raised per unit time (e.g. per day) and how many are being resolved per unit time. Clearly you must be resolving more than you are raising per unit time or you will experience a never-ending growth of the defect backlog.

Counterintuitively, this is likely to be the case in the early days of the testing phase (defects raised exceeding defects resolved) but as testing continues, the initial surge of defects normally subsides and the balance shifts toward resolving more defects than are being raised.

Secondly, the project’s ability to both identify and resolve defects is a critical consideration in getting through the testing phase. On the ‘identification’ side of the equation, this means your testing function. Both the number of people testing and the quality of the test process are important considerations. Often clients insist on running testing themselves as a way to ensure independence between the development team (read: solution vendor) and the end user (read: client team). This can work if the client has experience in the space, but often this is not the case, which results in a decrease in the rate of defects being raised (“λ” in our equation), conflict between testing and development, or both.

On the ‘resolution’ side of the equation, this means the development team members dedicated to resolving defects. Most projects experience some sort of schedule slippage that results in ongoing development being done in parallel with the test phase of the program. No self-respecting project designs this approach in at the onset, but it often happens as schedules tighten and resources become scarce. Again – great if you can make it work, but this may result in the development team being split between ‘proper’ development and defect resolution, which slows the rate at which defects are resolved (i.e. it increases “W”, the average time to resolve a defect, in our equation).

What Really Matters?

When the shocking number of 1000+ defects is shared in the board meeting, it’s natural for the focus to go to that headline number.  Whilst an important input in and of itself, the headline number does not tell the tale of what’s actually happening on the project. 

As described above, the more important considerations include the number of defects being raised each day in testing, the number of defects being resolved each day by developers, and the severity of the defects being raised. And this information cannot be shared unless these statistics are measured and reported accurately – something that is often not the case when a vested interest (e.g. a client trying to pressure a vendor) controls the distribution of the data.
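As a sketch of what that measurement might look like, the snippet below derives the daily raised/resolved/net flow and the open-defect severity mix from a defect-tracker export. The file name and column names (raised_date, resolved_date, severity) are assumptions; adjust them to whatever your tracking tool actually exports.

```python
import pandas as pd

# Assumed export schema: one row per defect, with raised_date, resolved_date
# (blank if still open) and severity columns.
defects = pd.read_csv("defects_export.csv",
                      parse_dates=["raised_date", "resolved_date"])

raised_per_day = defects.groupby(defects["raised_date"].dt.date).size()

resolved = defects.dropna(subset=["resolved_date"])
resolved_per_day = resolved.groupby(resolved["resolved_date"].dt.date).size()

flow = pd.DataFrame({"raised": raised_per_day,
                     "resolved": resolved_per_day}).fillna(0)
flow["net"] = flow["resolved"] - flow["raised"]

# The severity mix of what is still open - the headline count alone hides this.
open_by_severity = defects[defects["resolved_date"].isna()]["severity"].value_counts()

print(flow.tail(10))
print(open_by_severity)
```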

Netflix recently premiered a documentary on the NASA space program and the Challenger explosion that occurred in early 1986. Spoiler alert – the test data collected by the engineers on the program clearly indicated the dangers of launching in cold temperature but the desire to stay on track to a very aggressive launch schedule caused management to ignore what the data was telling them and launch anyway.  

NASA ended up re-designing the solid fuel rocket boosters after the fact but at a cost to the program that was much greater than had they simply delayed the launch and listened to what the data was telling them. 

Make sure your project doesn’t blow up just before reaching orbit.

Jeffery Eberwein is a senior partner at EY in the Advisory practice focused on digital transformation and its implications on business. He can be contacted at [email protected]
