The Structure of Tests - The Model

In my last post, I identified a few attributes that indicate the quality of a test suite: fault detection, fault localization, and efficiency. In this post, I will construct a simple mathematical model for these attributes and explore its behavior with a few examples.

A set-based model

Let's start off by identifying two distinguished sets, S and T, modeling software and the test suite, respectively. The set S represents the collection of software components, and T represents the collection of test components.

By using sets to model software and test components, we are assuming that these components have no structure we care about beyond identity. For example, we do not model any notion of dependency between software components. This will help keep the model simple.

We additionally model a "covering relation" (C \subseteq S \times T) between software and test components: (s C t) means that the test (t) covers, i.e. is a test for, the software component (s).

With this basic setup, we are now in a position to mathematically model some of the attributes of a test suite.
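As a concrete illustration, the model can be written down directly with Python sets. This is only a sketch: the component and test names are hypothetical, and storing (C) as a set of (s, t) pairs is one representation choice among several.

```python
# Hypothetical software components (S) and test components (T).
S = {"parser", "planner", "executor"}
T = {"t1", "t2", "t3"}

# The covering relation C as a set of (s, t) pairs: test t covers component s.
C = {
    ("parser", "t1"),
    ("parser", "t2"), ("planner", "t2"),
    ("parser", "t3"), ("planner", "t3"), ("executor", "t3"),
}


def inverse_image(relation, test):
    """Return the set of components covered by `test` under the relation."""
    return frozenset(s for (s, t) in relation if t == test)


# Sanity check: C only relates known components to known tests.
assert all(s in S and t in T for (s, t) in C)

covered_by_t2 = inverse_image(C, "t2")
```

Here `covered_by_t2` is the assembly made up of the hypothetical `parser` and `planner` components.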

Efficiency

Recall that the cost of executing a test suite is a measure of its efficiency. We can calculate this cost as the sum, over all tests, of the number of software components that each test covers:

\[ \sum_{t \in T} |C^{-1}(t)| \]

Here (C^{-1}(t)) is the set of software components covered by the test (t). Viewing the relation (C) as a Boolean matrix, this is the number of true entries in the matrix.
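A minimal sketch of this cost calculation, assuming the covering relation is stored as a dictionary from each test to the set of components it covers (the names `efficiency_cost`, `cover`, and the components `a`, `b`, `c` are hypothetical):

```python
def efficiency_cost(cover):
    """Sum, over all tests, of the number of components each test covers."""
    return sum(len(components) for components in cover.values())


# Hypothetical suite: one unit test, one two-component test, one system test.
cover = {
    "t1": {"a"},
    "t2": {"a", "b"},
    "t3": {"a", "b", "c"},
}

cost = efficiency_cost(cover)  # 1 + 2 + 3 true entries in the Boolean matrix
```

A lower cost means a cheaper suite to run, so the system-style test `t3` is the most expensive of the three.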

Defect Detection

Suppose a component (s) has a defect. A reasonable measure of whether it will be detected is the number of tests that exercise the component. Similarly, if a defect is in the interaction between a set of components, an assembly (A), a reasonable measure of whether it will be detected is the number of tests that exercise the whole assembly (A). Summing this count over all non-empty assemblies gives the defect-detection metric:

\[ \sum_{\emptyset \neq A \subseteq S} \left| \{\, t \in T : A \subseteq C^{-1}(t) \,\} \right| \]

The value of this metric ranges from 0 to (|T| \times (2^{|S|} - 1)). Each assembly can be tested by up to (|T|) tests, and there are (2^{|S|} - 1) assemblies (ignoring the empty assembly). The higher the value of this metric, the better the defect-detection capability of the test suite.
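The count can be sketched by brute force: enumerate every non-empty assembly and count the tests whose covered set contains it. The function names and the example suite are hypothetical, and the enumeration is exponential in the number of components, so this is only practical for small models.

```python
from itertools import combinations


def nonempty_assemblies(components):
    """Yield every non-empty subset of the components as a frozenset."""
    items = sorted(components)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def defect_detection(components, cover):
    """Sum, over all non-empty assemblies, of the number of tests covering them."""
    total = 0
    for assembly in nonempty_assemblies(components):
        total += sum(1 for covered in cover.values() if assembly <= frozenset(covered))
    return total


components = {"a", "b", "c"}
cover = {"t1": {"a"}, "t2": {"a", "b"}, "t3": {"a", "b", "c"}}

detection = defect_detection(components, cover)
```

As a cross-check, a test covering k components exercises 2^k - 1 assemblies, so this suite scores 1 + 3 + 7 = 11.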

Defect Localization

Defect localization is about how well test failures can be used to triangulate defects. If a test (t) fails, this indicates a defect in the assembly (C^{-1}(t)), the "inverse image" of the test (t) under the relation (C). And if a collection of tests, say (M), fails, the defect should be in the intersection of the inverse images, with respect to (C), of all the tests in (M).

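For the moment, let us assume that there is only a single defect in the software, and additionally that tests fail only because of defects in the software. The situation becomes more complex if we have to deal with multiple defects or defective tests! For an assembly (A), we define the smallest localizable assembly larger than (A) as the intersection of the inverse images of all the tests that cover (A):

\[ L(A) = \bigcap \{\, C^{-1}(t) : t \in T,\ A \subseteq C^{-1}(t) \,\} \]

If no test covers (A), the intersection is over an empty collection, and we take (L(A) = S).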

Now, we can measure the capacity of a test suite to localize a defect by considering each assembly in turn (ignoring the empty assembly), and measuring how close the test suite can get to identifying this assembly using test failures:

\[ \sum_{\emptyset \neq A \subseteq S} \frac{|L(A)| - |A|}{|L(A)|} \]

This metric has a value in the interval ([0, 2^{|S|} - 1]): each assembly contributes a value in the interval ([0,1]), and there are (2^{|S|} - 1) assemblies. A value of 0 indicates perfect localization, while a large value indicates poor localization. (I have been lazy here: the upper bound is a loose, asymptotic one rather than a precise one.)
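A brute-force sketch of the localization metric follows. It rests on two assumptions that should be read as such: that the smallest localizable assembly for (A) is the intersection of the covered sets of all tests covering (A) (the whole of S when no test covers it), and that each assembly contributes (|L(A)| - |A|) / |L(A)|, which matches the per-assembly value ((m-k)/m) used in the scenarios below. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def nonempty_assemblies(components):
    """Yield every non-empty subset of the components as a frozenset."""
    items = sorted(components)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def smallest_localizable(assembly, components, cover):
    """Intersect the covered sets of all tests covering `assembly`.

    With no covering test, the empty intersection is the full component set.
    """
    result = frozenset(components)
    for covered in cover.values():
        covered = frozenset(covered)
        if assembly <= covered:
            result &= covered
    return result


def localization(components, cover):
    """Sum the assumed per-assembly gap (|L(A)| - |A|) / |L(A)|; 0 is perfect."""
    total = Fraction(0)
    for assembly in nonempty_assemblies(components):
        loc = smallest_localizable(assembly, components, cover)
        total += Fraction(len(loc) - len(assembly), len(loc))
    return total


components = {"a", "b", "c"}
cover = {"t1": {"a"}, "t2": {"a", "b"}, "t3": {"a", "b", "c"}}

score = localization(components, cover)
```

Exact fractions are used so that the result can be compared with hand calculations without rounding error. For this suite, only the assemblies {b}, {c}, {a, c}, and {b, c} are imperfectly localized, contributing 1/2 + 2/3 + 1/3 + 1/3 = 11/6.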

Some Examples

Let us evaluate the metrics defined above against a few scenarios to check that they match our intuition. Consider (m) source components and a test suite of size (n).

System Tests Scenario

Let us consider the situation where all the tests are system tests: every test covers every component, so the matrix representing the covering relation is a full matrix. In this case:

  • The efficiency metric of the test-suite evaluates to (m \times n). This is the maximum cost over all possible test-suites of (n) tests.
  • The defect-detection metric evaluates to (n \times (2^{m} - 1)): every assembly in the system (and there are (2^{m} - 1) assemblies) is tested by all (n) tests in the test-suite. This is the maximum possible value of defect detection.
  • The defect-localization metric evaluates to

\[ \sum_{k=1}^{m} \binom{m}{k} \frac{m-k}{m} \]

  • For every assembly (A), (|L(A)| = m): we can only localize defects to the full system. There are (\binom{m}{k}) assemblies of size (k), and each contributes ((m-k)/m) to the metric, giving the expression above.
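The sum over assembly sizes described above has a simple closed form. A short derivation, using the identity \binom{m}{k}(m-k)/m = \binom{m-1}{k} and the fact that the k = m term vanishes:

```latex
\begin{align*}
\sum_{k=1}^{m} \binom{m}{k}\,\frac{m-k}{m}
  &= \sum_{k=1}^{m-1} \binom{m-1}{k} \\
  &= 2^{m-1} - 1
\end{align*}
```

Note that this value does not depend on the number of tests (n): adding more system tests does nothing for localization.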

Unit Tests Scenario

Let us now consider the situation where every test is a unit test: a test for just a single source component. In this case:

  • The efficiency metric of the test-suite is (n). This is the lowest possible cost for a suite of (n) tests.
  • The defect-detection metric evaluates to (n). Each test can find defects in a single component and no more!
  • The defect-localization metric evaluates to

\[ \sum_{k=2}^{m} \binom{m}{k} \frac{m-k}{m} \]

  • For unit-assemblies (U), the value of (|L(U)|) is (1), so they contribute (0). However, for an assembly (A) of two or more source components, the value of (|L(A)|) is (m): there are no tests for these assemblies, and the intersection of an empty collection of sets is the universal set (S).
  • This is surprising because we generally expect unit tests to have high defect localization. According to our definition, however, the defect-localization expression for unit tests is only slightly different from that for system tests! On reflection, the problem is that our informal, intuitive measure of defect localization considered only defects within a software unit, and not defects in the interaction between software units.
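To quantify how small the difference is, the unit-test value can be compared with the system-test value in closed form. This sketch assumes every component has at least one unit test, so that only assemblies of size two or more contribute ((m-k)/m) each:

```latex
\begin{align*}
\sum_{k=2}^{m} \binom{m}{k}\,\frac{m-k}{m}
  &= \sum_{k=1}^{m-1} \binom{m-1}{k} \;-\; \binom{m-1}{1} \\
  &= \left(2^{m-1} - 1\right) - (m - 1) \\
  &= 2^{m-1} - m
\end{align*}
```

So unit tests improve on the system-test value of (2^{m-1} - 1) by only (m - 1), the contribution of the (m) single-component assemblies.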

Comprehensive Tests Scenario

Finally, let us consider a third case, where we have one test for every possible assembly of software components, i.e. (n = 2^{m} - 1). In this case:

  • The efficiency metric of the test-suite is:

\[ \sum_{k=1}^{m} k \binom{m}{k} = m \, 2^{m-1} \]

  • Each assembly has a test, and the cost of executing that test is the size of the assembly. This is a fairly large value.
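  • The defect-detection metric evaluates to

\[ \sum_{k=1}^{m} \binom{m}{k} \, 2^{m-k} = 3^{m} - 2^{m} \]

  • An assembly of size (1) is tested by (2^{m-1}) tests, one of size (2) is tested by (2^{m-2}) tests, and so on: an assembly of size (k) is tested by the (2^{m-k}) tests whose assemblies contain it.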
  • The defect-localization metric evaluates to (0). This is because for any assembly, we have a test that tests exactly that assembly: for every assembly (A), the value of (|L(A)|) is (|A|), so the localization metric collapses to (0). This is perfect localization!
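The three scenarios can be checked numerically for a small system. The sketch below is an illustration under stated assumptions, not a definitive implementation: it picks m = 4 components, n = 3 system tests, and one unit test per component, represents each test by the frozenset of components it covers, and assumes the per-assembly localization value (|L(A)| - |A|) / |L(A)|. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def assemblies(components):
    """Return every non-empty subset of the components as a frozenset."""
    items = sorted(components)
    return [frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)]


def metrics(components, covers):
    """Return (cost, detection, localization) for a list of covered sets."""
    universe = frozenset(components)
    cost = sum(len(c) for c in covers)
    detection = 0
    localization = Fraction(0)
    for a in assemblies(universe):
        hits = [c for c in covers if a <= c]  # tests covering the assembly
        detection += len(hits)
        smallest = universe  # empty intersection is the full system
        for c in hits:
            smallest &= c
        localization += Fraction(len(smallest) - len(a), len(smallest))
    return cost, detection, localization


m = 4
S = frozenset(range(m))

system = metrics(S, [S] * 3)                      # n = 3 system tests
unit = metrics(S, [frozenset({s}) for s in S])    # one unit test per component
comprehensive = metrics(S, assemblies(S))         # one test per assembly
```

With these choices, the system suite gives (12, 45, 7), the unit suite gives (4, 4, 4), and the comprehensive suite gives (32, 65, 0), in line with the expressions in the three scenarios.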

Next steps

Based on the definitions above, we can now study different kinds of test suites and try to gain insights into the effectiveness of testing. In particular, I am really curious whether we will be able to justify the popular recommendation of structuring tests as a "test pyramid".

I plan to now run a simulation study based on these definitions and will report on this in my next post.

I'd love to hear your comments on this post. Is there some insight I am missing? Or have I made an error in the calculations? I was quite surprised initially by the poor defect-localization metric calculated for unit tests, but I feel the definition is right and my initial intuition was wrong. What do you think?
