The Structure of Tests - The Model

Prahladavaradan Sampath

Development Manager and Product Lead at The MathWorks

Published: March 3, 2024

In my last post, I identified a few attributes of a test suite that indicate the quality of the test suite: fault detection, fault localization, and efficiency. In this post, I will construct a simple mathematical model for these attributes and explore the behavior of the model with a few examples.

A set-based model

Let's start off by identifying two distinguished sets, S and T, modeling software and the test suite, respectively. The set S represents the collection of software components, and T represents the collection of test components.

By using sets to model software and test components, we are making some assumptions—that there is no structure other than identity that we care about for software and test components. For example, we do not model any notion of dependency between software components. This will help keep the model simple.

We additionally model a "covering relation" between test and software components: (s C t) represents the fact that test (t) is a test for (covers) the software component (s).

With this basic setup, we are now in a position to mathematically model some of the attributes of a test suite.

Efficiency

Recall that the cost of executing a test suite is a measure of its efficiency. We can calculate it as the sum, over all tests, of the number of software components each test covers:

(\sum_{t \in T} |\{ s \in S : s \, C \, t \}|)

Viewing the relation (C) as a Boolean matrix, this is the number of true entries in the matrix.
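A minimal sketch of this count, assuming the covering relation is stored as a Python dict that maps each test to the set of components it covers. The names `covers` and `efficiency_cost` and the example data are illustrative only and are not part of the model.

```python
def efficiency_cost(covers):
    """Total cost: the number of (component, test) pairs in the covering relation."""
    return sum(len(components) for components in covers.values())


# Hypothetical example: three components, two tests.
covers = {
    "t1": {"s1", "s2"},  # t1 covers s1 and s2
    "t2": {"s2", "s3"},  # t2 covers s2 and s3
}
print(efficiency_cost(covers))  # 4 true entries in the Boolean matrix
```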

Defect Detection

Suppose a component (s) has a defect. A reasonable measure of whether it will be detected is the number of tests that exercise the component. Similarly, if a defect lies in the interaction between a set of components, an assembly (A), a reasonable measure of whether it will be detected is the number of tests that exercise every component of the assembly (A). Summing this count over all non-empty assemblies gives the defect-detection metric:

(\sum_{\emptyset \neq A \subseteq S} |\{ t \in T : \forall s \in A, \; s \, C \, t \}|)

The value of this metric ranges from 0 to (|T| \times (2^{|S|} - 1)). Each assembly can be tested by up to (|T|) tests, and there are ((2^{|S|} - 1)) assemblies (ignoring the empty assembly). The higher the value of this metric for a test suite, the better the defect-detection capability of the test suite.
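The count can be sketched by brute-force enumeration of assemblies. This assumes that a test "exercises" an assembly when it covers every component in it; the function names and example data are hypothetical.

```python
from itertools import combinations


def nonempty_assemblies(components):
    """Yield every non-empty subset of the components as a frozenset."""
    items = sorted(components)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def defect_detection(components, covers):
    """Sum, over all non-empty assemblies, the number of tests covering the whole assembly."""
    return sum(
        sum(1 for covered in covers.values() if assembly <= covered)
        for assembly in nonempty_assemblies(components)
    )


# Hypothetical example: three components, two tests.
components = {"s1", "s2", "s3"}
covers = {"t1": {"s1", "s2"}, "t2": {"s2", "s3"}}
print(defect_detection(components, covers))  # 6
```

The enumeration is exponential in the number of components, so it only suits small examples. Under the same reading, a test that covers k components exercises exactly 2^k - 1 non-empty assemblies, so the metric can also be computed per test without enumerating assemblies.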

Defect Localization

Defect localization is about how well test failures can be used to triangulate defects. If a test (t) fails, this indicates a defect in the assembly (C^{-1}(t)) - the "inverse image" of the test (t) in the relation (C). And if a collection of tests, say (M), fails, the defect should be in the assembly consisting of the intersection of all assemblies that are inverse images, with respect to the relation (C), of the tests in (M).

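For the moment, let us assume that there is only a single defect in the software, and additionally that tests fail only because of defects in the software. The situation becomes more complex if we have to deal with multiple defects or defective tests. For an assembly (A), we define the smallest localizable assembly larger than (A) as the intersection of the inverse images of all tests that exercise (A):

(L(A) = \bigcap \{ C^{-1}(t) : t \in T, \; A \subseteq C^{-1}(t) \})

If no test exercises (A), the intersection is taken over an empty family of sets, and (L(A)) is the whole of (S).

Now, we can measure the capacity of a test suite to localize a defect by considering each assembly in turn (ignoring the empty assembly), and measuring how close the test suite can get to identifying this assembly using test failures:

(\sum_{\emptyset \neq A \subseteq S} \frac{|L(A)| - |A|}{|L(A)|})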

This metric has a value in the interval ([0, (2^{|S|} - 1)]) - each assembly has a value in the interval ([0,1]), and there are ((2^{|S|} - 1)) assemblies. A value of 0 indicates perfect localization, while a large value indicates poor localization. (I have been lazy and given an asymptotic value as the upper-bound of the metric - it is not precise).
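A sketch of the localization metric under the single-defect assumption, taking the per-assembly value to be (|L(A)| - |A|) / |L(A)|, which matches the ((m-k)/m) term used in the examples of this post. The function names and data are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations


def smallest_localizable(assembly, components, covers):
    """Intersect the covers of all tests that exercise the whole assembly.

    If no test exercises the assembly, the empty intersection is the full set.
    """
    result = frozenset(components)
    for covered in covers.values():
        if assembly <= covered:
            result &= covered
    return result


def defect_localization(components, covers):
    """Sum of (|L(A)| - |A|) / |L(A)| over all non-empty assemblies A."""
    items = sorted(components)
    total = Fraction(0)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            assembly = frozenset(combo)
            located = smallest_localizable(assembly, components, covers)
            total += Fraction(len(located) - len(assembly), len(located))
    return total


# Hypothetical example: three components, two tests.
components = {"s1", "s2", "s3"}
covers = {"t1": {"s1", "s2"}, "t2": {"s2", "s3"}}
print(defect_localization(components, covers))  # 4/3
```

In this example the assemblies {s1} and {s3} can each only be localized to a pair of components (1/2 each), and {s1, s3} has no test at all, so it localizes to the whole system (1/3). All other assemblies are localized exactly.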

Some Examples

Let us evaluate the metrics defined above against a few scenarios to check that they model our intuition. Consider (m) source components and a test suite of size (n).

System Tests Scenario

Let us consider the situation where all the tests are system tests - the matrix representing the covering relation is a full matrix - every test covers every component. In this case:

The efficiency metric of the test-suite evaluates to (m \times n) - this is the maximum cost over all possible test-suites of size (n).
The defect-detection metric evaluates to (n \times (2^{m} - 1)) - every assembly in the system (and there are (2^{m} - 1) assemblies) is tested by all (n) tests in the test-suite. This is the maximum value of defect-detection that is possible.
The defect-localization metric evaluates to

(\sum_{k=1}^{m} \binom{m}{k} \frac{m-k}{m})

For every assembly (A), (|L(A)| = m): we can only localize defects to the full system. There are (\binom{m}{k}) assemblies of size (k), and for each of them the localization term evaluates to ((m-k)/m), giving the expression above.
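As a quick sanity check, the sum of the per-assembly term ((m-k)/m) over all non-empty assemblies can be simplified. This simplification is an addition to the original post and relies on the identity \binom{m}{k}(m-k)/m = \binom{m-1}{k}, with the k = m term vanishing:

```latex
\begin{align*}
\sum_{k=1}^{m} \binom{m}{k}\,\frac{m-k}{m}
  &= \sum_{k=1}^{m-1} \binom{m-1}{k} \\
  &= 2^{m-1} - 1 .
\end{align*}
```

For m = 3, for instance, the three unit assemblies contribute 2/3 each, the three pairs contribute 1/3 each, and the full system contributes 0, for a total of 3 = 2^2 - 1.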

Unit Tests Scenario

Let us now consider the situation where every test is a unit-test: a test for just a single source component. In this case:

The efficiency metric of the test-suite is (n). This is the lowest possible cost for a suite of (n) tests.
The defect-detection metric evaluates to (n). Each test can find defects in a single component and no more!
The defect-localization metric evaluates to

(\sum_{k=2}^{m} \binom{m}{k} \frac{m-k}{m})

For unit-assemblies (U), assuming every component has at least one unit-test, (|L(U)| = 1). However, for assemblies (A) of two or more source components, (|L(A)| = m): there are no tests for these assemblies, and the intersection of an empty family of sets is the universal set!
This is surprising, because we generally expect unit-tests to have high defect-localization. Yet according to our definition, the defect-localization expression for unit-tests is only slightly different from that of system-tests! On reflection, the problem is that our informal, intuitive measure of defect localization considered only defects within a software unit, not defects in the interaction between software units.
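To make "only slightly different" concrete, here is a worked simplification, which is an addition to the original post. It assumes every component has at least one unit-test, so that only assemblies of size two or more contribute, each with |L(A)| = m and a term of (m-k)/m. Using \binom{m}{k}(m-k)/m = \binom{m-1}{k}:

```latex
\begin{align*}
\sum_{k=2}^{m} \binom{m}{k}\,\frac{m-k}{m}
  &= \sum_{k=2}^{m-1} \binom{m-1}{k} \\
  &= 2^{m-1} - 1 - (m-1) \\
  &= 2^{m-1} - m .
\end{align*}
```

The same identity applied to the system-test sum, which starts at k = 1, gives 2^{m-1} - 1. Under these assumptions the two scenarios therefore differ by exactly m - 1, which is the contribution of the m unit assemblies. Both values grow exponentially in m.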

Comprehensive Tests Scenario

Finally, let us consider a third case, where we have one test for every possible assembly of software components, i.e. (n = 2^{m} - 1). In this case:

The efficiency metric of the test-suite is:

(\sum_{k=1}^{m} k \binom{m}{k})

Each assembly has a test and the cost of executing the test is the size of the assembly. This is a fairly large value.
The defect-detection metric evaluates to

(\sum_{k=1}^{m} \binom{m}{k} 2^{m-k})

An assembly of size (1) is tested by (2^{m-1}) tests, one of size (2) is tested by (2^{m-2}) tests, and so on.
The defect-localization metric evaluates to (0). This is because for any assembly, we have a test that tests exactly that assembly - for every assembly (A), (L(A) = A), so the metric collapses to (0). This is perfect localization!
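The three scenarios can be cross-checked numerically on a small system. The sketch below is not part of the original post. It assumes that components are numbered 0..m-1, that a test suite is a list of covered sets, and that the per-assembly localization value is (|L(A)| - |A|) / |L(A)|. The helper names are hypothetical. The closed forms m * 2^(m-1) and 3^m - 2^m for the comprehensive suite follow from standard binomial identities.

```python
from fractions import Fraction
from itertools import combinations


def assemblies(m):
    """All non-empty subsets of the components 0..m-1."""
    return [frozenset(c) for k in range(1, m + 1)
            for c in combinations(range(m), k)]


def metrics(m, tests):
    """Return (efficiency, detection, localization) for a list of test covers."""
    everything = frozenset(range(m))
    efficiency = sum(len(t) for t in tests)
    detection = 0
    localization = Fraction(0)
    for a in assemblies(m):
        exercising = [t for t in tests if a <= t]
        detection += len(exercising)
        # With no exercising test, intersection() returns the full set.
        located = everything.intersection(*exercising)
        localization += Fraction(len(located) - len(a), len(located))
    return efficiency, detection, localization


m = 4
system = [frozenset(range(m))] * 4            # four tests, each covering everything
unit = [frozenset({i}) for i in range(m)]     # one unit test per component
comprehensive = assemblies(m)                 # one test per non-empty assembly

print(*metrics(m, system))         # 16 60 7
print(*metrics(m, unit))           # 4 4 4
print(*metrics(m, comprehensive))  # 32 65 0
```

For m = 4 the comprehensive suite needs 15 tests and costs 32, against 16 for four system tests and 4 for four unit tests. That is the price of perfect localization in this model.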

Next steps

Based on the definitions above, we can now study different kinds of test suites, trying to gain insights into the effectiveness of testing—in particular, I am really curious whether we will be able to justify the popular recommendation of structuring tests as a "test pyramid".

I plan to now run a simulation study based on these definitions and will report on this in my next post.

I'd love to hear your comments on this post. Is there some insight I am missing? Or have I made an error in the calculations? I was quite surprised initially by the poor defect-localization metric calculated for unit-tests, but I feel the definition is right - and my initial intuition was wrong! What do you think?
