Seven unspoken truths about Software Tests

Things that are usually not taught at universities or in courses, but that everyone should know

This is another cross-post from my blog, and one of the most popular posts: https://blog.pplupo.com/2021-12-06-Seven-unspoken-truths-about-Software-Tests/

  1. When in charge of tests, were you ever questioned why you didn’t catch a specific defect? Have you ever blamed someone for not catching a defect?
  2. Have you increased your test coverage only to discover that the number of defects in production is about the same?
  3. Have you spent more time than ever testing before releasing only to find that you couldn’t catch anything else (and as soon as you released, the defects started to come in)?
  4. Can developers test their code?
  5. Will defects have exponentially higher costs if they are caught later?
  6. Can you try to find defects by tweaking the software in any way you want and call it “exploratory testing”?
  7. Do you need to employ QA activities to improve the quality of the product?

Let’s see if we can bust some myths around software testing!

1. Tests won’t catch everything!

That is right. No QA activity will catch all existing defects.

Technique: defect-removal effectiveness

Software review: 25% to 40%
Software inspection: 45% to 65%
Code review: 20% to 35%
Code inspection: 45% to 70%
Unit test: 15% to 50%
Integration test: 25% to 40%
System test: 25% to 55%
Beta test (< 10 users): 24% to 40%
Beta test (> 1000 users): 65% to 85%

Adapted from Capers Jones, “Software defect-removal efficiency,” IEEE Computer, April 1996, pp. 94–95, DOI 10.1109/2.488361, ISSN 1558-0814.

Some testing practices are actually less effective than simply inspecting the code. The point is not to pick a single type of test to focus on, but to combine several techniques so they yield better results with less effort.
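
To see why combining techniques pays off, assume, purely as an illustration (this independence assumption is mine, not something from Jones’s paper), that each technique catches defects independently of the others. Three modest techniques then stack up to a much better combined result:

```python
# A hypothetical back-of-the-envelope model: if each technique catches defects
# independently, the defects that survive are the product of the escape rates.
def combined_efficiency(efficiencies):
    escape_rate = 1.0
    for eff in efficiencies:
        escape_rate *= 1.0 - eff  # fraction of defects slipping past this technique
    return 1.0 - escape_rate

# Mid-range values from the table above: code review, unit test, system test.
print(combined_efficiency([0.30, 0.30, 0.40]))  # ~0.71, i.e., ~71% of defects removed
```

Real techniques overlap in what they catch, so the true combined number is lower, but the direction of the effect holds.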

So, next time anyone complains that a defect wasn’t caught, remind them that there is no way to guarantee that any specific defect will be caught.

Finding out why a defect was missed by a test is ex post facto analysis. It’s backward reasoning, the equivalent of saying a magic trick is obvious once you’ve been told how it was done. It’s not a valid analysis.

Never blame a QA Engineer. They are there to find the defects that were introduced; they are not causing them. The tools for catching defects are as imperfect as our ability to avoid introducing them in the first place. Nothing is perfect.

2. Test coverage has little (if any) correlation to the effectiveness of testing

Yes, you read that right. We already have enough scientific evidence to say that increasing unit test coverage may not necessarily increase your test suite’s effectiveness in finding defects! Maybe it’s time to focus on what is relevant to test rather than on how much code is being tested.
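
As a contrived illustration (the function and test below are hypothetical, invented for this argument), a test can execute 100% of a function’s lines and still miss its defect:

```python
def average(values):
    # Defect: crashes with ZeroDivisionError when the list is empty.
    return sum(values) / len(values)

def test_average():
    # This single test executes every line of average(): 100% line coverage.
    assert average([2, 4, 6]) == 4
    # Yet the empty-list defect survives; coverage is full, testing is not.
```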

The references below come from:

V. Antinyan and M. Staron, “Mythical Unit Test Coverage,” 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 2019, pp. 267–268, DOI: 10.1109/ICSE-SEIP.2019.00038.

  1. A. Mockus, N. Nagappan, and T.T. Dinh-Trong, “Test Coverage and Post-verification Defects: A Multiple Case Study,” Proc. 3rd Int’l Symp. Empirical Software Eng. and Measurement (ESEM 09), 2009, pp. 291–301. The correlation between coverage and defects was none or very weak. Moreover, the effort required to increase the coverage from a certain level to 100% increased exponentially.
  2. M.R. Lyu, J. Horgan, and S. London, “A Coverage Analysis Tool for the Effectiveness of Software Testing,” IEEE Trans. Reliability, vol. 43, no. 4, 1994, pp. 527–535. Qualitative analysis found no association between the defects and coverage.
  3. B. Smith and L.A. Williams, A Survey on Code Coverage as a Stopping Criterion for Unit Testing, tech. report TR-2008–22, Dept. of Computer Science, North Carolina State Univ., 2008, pp. 1–6. The results did not support the hypothesis of a causal dependency between test coverage and the number of defects when testing intensity was controlled for.
  4. L. Briand and D. Pfahl, “Using Simulation for Assessing the Real Impact of Test Coverage on Defect Coverage,” Proc. 10th Int’l Symp. Software Reliability Eng., 1999, pp. 148–157. The results did not support the hypothesis of a causal dependency between test coverage and the number of defects when testing intensity was controlled for.
  5. P.S. Kochhar, F. Thung, and D. Lo, “Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs in Large Systems,” Proc. IEEE 22nd Int’l Conf. Software Analysis, Evolution, and Reengineering (SANER 15), 2015, pp. 560–564. A moderate to strong correlation was found between coverage and defects. However, the coverage was manipulated and calculated manually.
  6. L. Inozemtseva and R. Holmes, “Coverage Is Not Strongly Correlated with Test Suite Effectiveness,” Proc. 36th Int’l Conf. Software Eng. (ICSE 14), 2014, pp. 435–445. A weak to moderate correlation was found between coverage and defects. The type of coverage did not have an impact on the results.
  7. X. Cai and M.R. Lyu, “The Effect of Code Coverage on Fault Detection under Different Testing Profiles,” ACM SIGSOFT Software Eng. Notes, vol. 30, no. 4, 2005, pp. 1–7. A moderate correlation was found between coverage and defects, but the defects were artificially introduced. The correlation was different for different testing profiles.
  8. G. Gay et al., “The Risks of Coverage-Directed Test Case Generation,” IEEE Trans. Software Eng., vol. 41, no. 8, 2015, pp. 803–819. Coverage measures were weak indicators of test suite adequacy. High coverage did not necessarily mean effective testing.

3. Testing effort increases exponentially

Many sources state that a tester will find more defects at the beginning of the test activities and fewer toward the end. There are indications that the effort required to find the next defect, whether by increasing coverage or by executing more tests, grows exponentially.

In the paper “Test Coverage and Post-verification Defects: A Multiple Case Study” (A. Mockus, N. Nagappan, and T.T. Dinh-Trong, Proc. 3rd Int’l Symp. Empirical Software Eng. and Measurement (ESEM 09), 2009, pp. 291–301), the authors found that the effort required to increase the coverage from a certain level to 100% increased exponentially.

According to the authors of the book “Implementing Automated Software Testing: How to Save Time and Lower Costs While Raising Quality” (Dustin, E., Garrett, T., & Gauf, B. (2009). Pearson Education), software reliability models show that the number of defects found per unit of time decreases exponentially as more time is invested in testing.
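
The book does not prescribe one specific model, but as an illustration, a classic reliability growth model such as Goel-Okumoto predicts the cumulative number of defects found as m(t) = a(1 - exp(-b*t)), which means the discovery rate shrinks exponentially with testing time. A minimal sketch with invented parameters:

```python
import math

# Goel-Okumoto reliability growth model: m(t) = a * (1 - exp(-b * t)),
# where a is the total expected number of defects and b the detection rate.
# Both parameter values below are invented for illustration.
a, b = 100.0, 0.5

def defects_found(t):
    return a * (1.0 - math.exp(-b * t))

for week in range(1, 6):
    new = defects_found(week) - defects_found(week - 1)  # defects found this week
    print(f"week {week}: {new:.1f} new defects")  # the yield shrinks every week
```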

4. Developer bias

Let’s say a developer misunderstands a requirement. The code will be implemented according to that misunderstanding, and so will the test.

If a developer forgets to do something in the code, like verifying a particular condition, chances are they won’t remember to test it either.

It’s as simple as that.
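
A tiny hypothetical example of the first bias: suppose the requirement says a discount applies to orders over $100, and the developer reads that as “$100 or more”. Both the code and the test encode the same misreading, so the test passes and the defect ships:

```python
# Requirement (as written): "a 10% discount applies to orders over $100",
# i.e., strictly greater than 100.
def discounted_total(total):
    # Misreading: the developer implemented "over $100" as ">= 100".
    return total * 0.9 if total >= 100 else total

def test_discounted_total():
    # The same developer wrote the test, so it shares the misreading.
    assert discounted_total(100) == 90.0  # passes, but per the requirement it should be 100
```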

To avoid this issue, developers can test each other’s code, but not their own.

They could test their own code if someone else designed the test cases, which avoids the aforementioned biases.

While Test Driven Development may reduce the bias of forgetting about something, it won’t reduce the bias of misunderstanding something.

5. Defects caught later may not cost much more to be fixed

I don’t even know how to start on this one because, while it is true, it’s not what people usually say about it. Many of us are used to seeing charts where the cost of fixing a defect climbs steeply from the requirements phase to production.

The only difference is the actual numbers, which sometimes top out at 30x, 100x, or even 150x. Laurent Bossavit, an Agile methodology expert and technical advisor at software consultancy CodeWorks in Paris, has a post on GitHub called “Degrees of intellectual dishonesty” about how this information was apparently created out of thin air.

In the paper “Are delayed issues harder to resolve? Revisiting cost-to-fix of defects throughout the lifecycle” (Menzies, T., Nichols, W., Shull, F. et al. Empir Software Eng 22, 1903–1935 (2017), https://doi.org/10.1007/s10664-016-9469-x), the authors found NO evidence that fixing a defect in the code takes more effort after it goes into production.

In the paper “What We Have Learned About Fighting Defects” (Forrest Shull, Vic Basili, Barry Boehm, et al., Proceedings of the 8th International Symposium on Software Metrics (METRICS ‘02). IEEE Computer Society, USA, 249. 2002.) the authors identified that the cost of fixing certain non-critical classes of defects was almost constant across lifecycle phases (1.2 hours on average early in the project, versus 1.5 hours late in the project).

However,

Many of these studies measure only the effort spent localizing the fault and fixing the defect in the code.

What do they miss?

  1. Regression tests! Before we go into production, we execute a lot of regression tests, often involving manual testing. Delivering certain defect fixes, especially critical ones, may require many tests to be re-executed.
  2. Opportunity cost! By the time these defects are identified and fixed, many people will have moved on to the next tasks or even other projects. Those tasks and projects will suffer interruptions that may even put their deadlines in jeopardy.
  3. Cost to the business! That’s right: the business may have to pay penalties, the clients may be financially impacted, and the user experience will suffer. In December 2020, the game Cyberpunk 2077 was removed from Sony’s store due to the many technical issues at launch. Sony offered full refunds, and the developer, CD Projekt Red, later announced refunds for PS4 and Xbox players. During an investor call, CD Projekt Red stated that the cost of the Cyberpunk 2077 fixes was “irrelevant” compared to restoring the company’s reputation. The company’s stock went from about $31 a share in December 2020 to about $10 a share in June 2021.
  4. Defects that were not introduced in the code. If a defect is injected into the code, it only needs to be fixed there. However, imagine a defect injected into a technical specification: the ramifications could impact multiple classes across different services or components. Imagine, for instance, the cost of choosing an authentication framework that a third-party service does not support and therefore cannot validate an authentication token with. If the defect is in a requirement, the cascading effect can be even worse. All that without mentioning that when a fault is found in the code, an investigation is expected to check whether the technical specification was wrong in the first place and, if the defect is there, whether it came from the requirements.

So, while the effort of fixing a code fault may not increase that much after releasing, fixing defects earlier can save a lot of effort, money, and headaches.

6. Exploratory testing requires process and documentation

Many people think that if they go around trying to input unexpected data or execute actions out of sequence, at random, they are doing “exploratory testing”. They are not.

Exploratory testing doesn’t mean doing things ad hoc. It simply means that learning how the system works happens in parallel with defining and executing the test cases.

In other words, exploratory testing can (and preferably should) be supported by existing documentation, such as requirements and manuals. The difference here is that the tests are not pre-scripted.

Testing scripts should be defined as part of the activity so that once a defect is spotted, the way to replicate it is documented. These scripts can later be automated or used in future manual tests (that won’t be exploratory anymore).

Test cases should still be defined using techniques such as Boundary-Value Analysis, Equivalence Class Partitioning, etc. There’s no reason to define random test cases that may not be cost-efficient or effective at detecting defects.
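
For instance, here is what Boundary-Value Analysis combined with Equivalence Class Partitioning might look like for a hypothetical rule accepting ages from 18 to 65, inclusive (a sketch using pytest; the function and its limits are invented for this example):

```python
import pytest

def is_eligible(age: int) -> bool:
    # Hypothetical rule: applicants aged 18 to 65, inclusive, are eligible.
    return 18 <= age <= 65

@pytest.mark.parametrize("age, expected", [
    (17, False), (18, True),  # lower boundary and its neighbor
    (65, True), (66, False),  # upper boundary and its neighbor
    (40, True), (0, False),   # one representative per equivalence class
])
def test_is_eligible(age, expected):
    assert is_eligible(age) is expected
```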

7. Improving non-QA activities in your process can improve your product’s quality

A 2009 study in Brazil (in Portuguese) found that 135 software development organizations increased their capacity to identify and fix defects by improving their processes. These companies were part of a Brazilian software process improvement program called “MPS.Br,” which required them to adhere to a software process improvement model (the MPS Model).

This model has stages, and 58 of these companies were in the first stage, where they were required to improve their Project Management and Requirements Management processes.

While it’s unclear why this happened, we can reasonably expect that projects that identify the right people for the team, the training needs, and a proper budget and schedule will likely have the people, the time, and the other resources needed to improve quality.

Bonus (fun fact): Bermuda Plan

OK, this is a funny one, but there is no solid explanation behind it, and it may not really work.

Bermuda Plan is the name of a strategy to finish projects sooner. You send part of the team to Bermuda (i.e., remove them from the project), and the project finishes sooner.

It was conceived as a response to Brooks’s law (an observation about software project management according to which “adding people to a late software project makes it later”). So, if you remove people, should it go faster?

In my experience, each new person joining the team consumes about a third of the time of one already-productive person during the newcomer’s onboarding, until they ramp up to full productivity.

So removing someone who recently joined may increase productivity.

It could also work if there are too many conflicts in the team: removing people who are not aligned with the team’s goals may help.

If there are way too many people in a team, the communication overhead may be significant enough to hinder productivity. In this case, splitting the team may work well (which is technically not the same as removing people from the project).

Otherwise, removing people only reduces the capacity to build more in less time.

Anyway, I was just sharing the Bermuda Plan because it’s always fun to talk about it. :-D

Originally posted at: https://blog.pplupo.com/2021-12-06-Seven-unspoken-truths-about-Software-Tests/
