Let's compare two test approaches
An artistic depiction of the real-world example voting application and the event that inspired the testing examples described in this article.


NOTE: an earlier version of this article used the terms "confirmatory" and "exploratory" to distinguish the two types of tests below. I have changed the language to be more descriptive of what is being compared in this article, as neither of those terms matches the comparison well.

This article compares two tests for a fictional web-based application. The approaches contrast checking for known, expected behavior against searching for unknown, unanticipated behavior. Neither is presented here as better than the other. They serve different purposes. Both provide valuable information to the product team and stakeholders. Both can, and probably will, find bugs.


The difference between the two:


The first I call the "looking for the known" approach. This approach is targeted at expected product behaviors, checking to see if the product fails to do some specific action we intend it to do. The example below is designed to check for known, anticipated outcomes.


The second I call the "looking for the unknown" approach. It is designed to amplify the probability of identifying or noticing mistaken assumptions, oversights, forgotten issues - to find things we did not think of.

Web-based voting application with write-in candidate box.

The example application, and the real-world event that inspired it, are described in the following article. Skip to the section titled "Most Beautiful Person Poll": Hank the Angry Drunken Dwarf - Wikipedia


Approach 1: Confirm expected behavior

"Check if write-in candidate vote is recorded"

  1. start fresh with no votes recorded
  2. check the results page, confirm "Test Candidate" is not listed
  3. navigate to the voting page
  4. enter "Test Candidate" into the write-in candidate box, submit the vote
  5. go to the results page, see if "Test Candidate" has 1 vote


Note how every step, every outcome, every check is based on something that was already expected. The product was designed to behave in exactly the ways described, and failure to do so would indicate a bug. The steps are (mostly) precise and the check exact, making any failure easy to reproduce and the symptom easy to understand. The procedure above likely aligns with an exact business requirement, probably one even shown at the demo.


Also notice how difficult it is to imagine ever shipping a product with the feature above where that test failed. The behavior is so easily described, so easily exercised, and a problem so easily noticed that one would be befuddled to observe a released version of the product failing that test.


These kinds of checks work very well as unit tests and in CI/CD loops.
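
To make that concrete, here is a minimal sketch of what the check above could look like as an automated test in Python. Since the voting application is fictional, everything here is an assumption: the base URL, the /votes and /results endpoints, the JSON payload fields, and the test-only reset endpoint are all hypothetical stand-ins.

  # A minimal sketch of Approach 1 expressed as an automated check.
  # All endpoints, payload fields, and the reset helper are hypothetical.
  import requests

  BASE_URL = "http://localhost:8000"  # assumed test deployment of the voting app

  def test_write_in_candidate_vote_is_recorded():
      # 1. start fresh with no votes recorded (hypothetical test-only endpoint)
      requests.post(f"{BASE_URL}/test/reset").raise_for_status()

      # 2. confirm "Test Candidate" is not already in the results
      results = requests.get(f"{BASE_URL}/results").json()
      assert "Test Candidate" not in results

      # 3-4. submit a vote for the write-in candidate
      response = requests.post(f"{BASE_URL}/votes", json={"write_in": "Test Candidate"})
      assert response.status_code == 200

      # 5. the results should now show exactly one vote for the candidate
      results = requests.get(f"{BASE_URL}/results").json()
      assert results.get("Test Candidate") == 1

Run under pytest, a failure of any assertion is an immediate, unambiguous signal that a stated expectation was not met.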


Approach 2: Look for unanticipated product behavior

Charter: investigate behaviors around write-in candidate voting.

  • write a script that enters random write-in candidate names and submits them in a loop - let's go for 100,000 iterations - the script emits a list of the submitted candidates (a minimal sketch of such a script appears after this list)
  • while that script is running, launch a load-runner that hits the results page over and over with a battery of clients
  • while that is happening, try navigating the voting application to see how it behaves with the traffic pattern and an ever-increasing set of write-in candidates
  • scan the server logs for errors and exceptions, scan the load-runner logs for errors returned, periodically check whether the candidates submitted by the script are also in the database, scrape the results page to see if all the write-in candidates are listed there
  • etc.
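
Here is a minimal sketch of the submission script from the first bullet, again in Python and again built on assumptions: the /votes endpoint, the write_in payload field, and the choice to use a fresh session (and therefore a fresh cookie jar) for each submission are all hypothetical.

  # A minimal sketch of the random write-in submission script.
  # Endpoint, payload field, and session handling are assumptions.
  import random
  import string

  import requests

  BASE_URL = "http://localhost:8000"
  ITERATIONS = 100_000

  def random_candidate_name() -> str:
      # mix of letters, spaces, punctuation, and a few non-ASCII characters
      alphabet = string.ascii_letters + "  '-éñ漢"
      name = "".join(random.choices(alphabet, k=random.randint(1, 60))).strip()
      return name or "X"

  def main() -> None:
      submitted = []
      for _ in range(ITERATIONS):
          name = random_candidate_name()
          # a new session per vote so each submission starts with an empty cookie jar
          with requests.Session() as session:
              session.post(f"{BASE_URL}/votes", json={"write_in": name}, timeout=10)
          submitted.append(name)

      # emit the list of submitted candidates for later cross-checking
      with open("submitted_candidates.txt", "w", encoding="utf-8") as f:
          f.write("\n".join(submitted))

  if __name__ == "__main__":
      main()

Note that the detail called out later in this article - whether the script clears cookies between submissions - is exactly the kind of decision that changes what this exercise can reveal.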


Note how the methodology, while derived from known behaviors, is not targeting a specific requirement. The tester is generating a randomized load pattern across two behaviors at once, with randomized data in one of them. The test methodology is not checking a specific expectation, but instead observing the areas that seem most likely to provide evidence of problems. The tester's behavior is not random or unintentional. The two load patterns were selected because, during analysis, the tester was concerned mistakes would happen in the combination of ongoing load, an increasing number of entries in the system, and unanticipated data in the write-in candidate field.

The tester might find something directly related to the test procedure (exponential performance degradation as the number of write-in candidates increases?), or it may be that by exercising so much functionality at once it becomes easier to notice something one would not have seen otherwise. Maybe the server is chewing up memory allocations. Maybe the results page sometimes shows no results. Maybe the tester notices that all the votes are being submitted by the same user, because they forgot to have the script clear the cookie cache between submissions (assuming that is how duplicate votes are checked), and now "no duplicate votes" is being violated.


By being less directed by known requirements and combining more things at once, the tester is more likely to trigger conditions not anticipated in requirements, design, and coding. Maybe there is something about the way cookie checking behaves under load that the developer did not understand. Maybe the character set or collation settings on the database are specified in a way that makes some distinct candidate names appear the same, or maybe exactly the opposite. Perhaps "Leonardo DiCaprio" is being treated as a different person than "Leonardo Dicaprio" in the database - is that a bug? What did we intend? The approach the tester took doesn't even make that distinction, but notice how the methodology would surface it if it happened. This is because the validation style casts a wide, more ambiguous net.
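
A sketch of that kind of wide-net cross-check, under the same hypothetical assumptions as the earlier scripts (a JSON /results endpoint mapping candidate names to vote counts, and the submitted_candidates.txt file produced by the submission script):

  # A minimal sketch of cross-checking submitted candidates against the results
  # page. The endpoint shape and file name are assumptions carried over from above.
  import requests

  BASE_URL = "http://localhost:8000"

  def cross_check() -> None:
      with open("submitted_candidates.txt", encoding="utf-8") as f:
          submitted = {line for line in f.read().splitlines() if line}

      results = requests.get(f"{BASE_URL}/results", timeout=10).json()
      listed = set(results)

      # candidates we submitted that the results page never shows: lost votes?
      missing = submitted - listed

      # names present only under a different capitalization: collation choice, or bug?
      listed_lower = {name.lower() for name in listed}
      case_only = {name for name in missing if name.lower() in listed_lower}

      print(f"submitted: {len(submitted)}, listed: {len(listed)}")
      print(f"missing from results: {len(missing)}")
      print(f"differ only by case: {len(case_only)}")

  if __name__ == "__main__":
      cross_check()

None of this asserts a pass or fail; it surfaces discrepancies for the tester to investigate, which is the point of the approach.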


It is easy to imagine bugs this type of testing might catch slipping past a product team and shipping. The testing is more difficult, and the conditions and repro steps are less certain, frequently entirely random. Yet most bugs that damage a business or product, or hurt customers, are the kind this type of testing is meant to discover. These are the hard-to-anticipate problems that were missed in design and requirements discussions, or that reflect a misunderstanding a developer had about code behavior.


These kinds of tests work well with large time allocations for exploration and analysis.

They are almost always impossible, or at least do poorly, in CI/CD loops or unit tests.

Tying the two types together

Tests that check anticipated behavior align well with business problems and design requirements. They derive well from BDD. Properly done, they emerge naturally from TDD. They work well in CI/CD loops. They are the best place to put a signal that, if raised, indicates a requirement was not met. We use these kinds of tests where the need is anticipated and articulated, and where they can tell us when something must immediately stop and be corrected.

By contrast, tests targeted at the unknown and unanticipated give design, requirements, and development feedback about something that was missed or misunderstood. Frequently the team can translate the more open-ended approach into a different test expressed as a check. It becomes a "we would have done this check if we had thought of it" situation. The more open-ended style of testing is a way to discover the thing you should have checked.

It is important that in our discussions of testing we do not focus exclusively on either approach. Each approach trades power for weakness. Checks against the known and anticipated give immediate, clear signals where the decision is almost always obvious, but they lack the scope, breadth, and flexibility to uncover problems we did not anticipate. Testing for the unanticipated and unknown finds the deep, difficult, complex problems that would have substantially threatened the product, but it usually requires a substantial investment of time and resources, the sort that fits poorly in fast iteration processes.

James Bach

Founder of Rapid Software Testing Methodology, Instructor, Consultant

1y

You've come near to describing the difference between exploratory and confirmatory work, but I think you may actually have described two things that are both equally not exploratory. The second process you describe has a larger scope, so it may appear at first blush it is qualitatively different than the first. But they are both specified as checks-- the first one is slightly underspecified, and the second one is substantially underspecified. What characterizes "exploratory" and "confirmatory" is not level of specificity, but rather the relationship of the tester to the activity. The key question is: what role does agency play? Agency is the ball game. (I think this is a case where you are trying to explain an intuition, but your mental models have shunted you slightly off your topic. It's why it took me years of tinkering to come to the particular precise way I talk about this subject.) Your first example, if it were fully specified (e.g. written in executable code) is only confirmatory if you ignore how it was created. It was created in a way that may have been highly exploratory. It may be the product-- and active agent-- of exploratory testing. We don't know, because you've hidden the process from us... (to be continued)

James Bach

Founder of Rapid Software Testing Methodology, Instructor, Consultant

1y

(continued from last comment...) Not only in the examples, but also in the explanatory text, you are describing some of the circumstances of exploration, but not actually the ACTS of exploration. In other words, you could use the same words to describe a set of automated checks. You could talk about how checks were created because a tester was worried about this or that, or speculating about this or that, but that doesn't make the check any more exploratory. Exploration lives in the seeking mind and active learning and AGENCY (power to choose; power to control one's own processes) of the tester. It does not live in procedures or sequences of events. You are pointing at shadows on a cave wall. We can do better than that. So-called "confirmatory testing" (which I call output checking, or verification) is the outcome of a vibrant intellectual process-- or else it isn't. Let's talk about the first kind, not the second. This process, which seems confirmatory, inevitably leads to highly exploratory investigation the moment that the check fails-- or else it doesn't, in which case it's a fraud; testing theater rather than testing itself.

Nilanjan Bhattacharya

Technical Test Manager/lead for complex software products (cybersecurity, CAD, low code). Created and mentored test teams on par with the best. Public articles show my passion and thinking.

1y

The conclusion is a stretch. Confirmatory tests may have their uses - they have nothing to do with testing. I won't get into whether ET fits into a fast cycle (it does). The issue is that most teams don't really bother to learn what [exploratory] testing is. Whether the cycle is fast is moot.


I have some notes on exploratory testing here; this is only part of it. Exploratory testing is, by design, about exploring. But we still want to have some "expected" behavior. For the exploring itself it does not matter, but it matters to the tester and, more generally, to product owners (not in a Scrum sense, but to whoever needs the product to be released). For example: "the user opened a page and saw the poll, but the poll was not expected to appear; the list is ABC, why not CYZ?" etc.
