The Structure of Tests - The Problem

In this series, I want to talk about test-suites. Test-suites are important because we spend as much time testing software as we do developing it. There is a lot written about testing: different types of tests, test-suite selection, test-suite optimization, software reliability models, and so on, much of it using mathematical models to study these aspects.

I started thinking about this topic after some team discussions about how good our testing is. I work with many geographically distributed teams that build all sorts of software components: formal-methods solvers, numerical solvers, code-generators, compilers, and user-interfaces. Some teams develop and maintain “platforms” and “platform services” that other teams use to build further services or features. All of this happens in a busy environment where code is constantly being added and changed, tests are run, failures are investigated, and bugs are triaged and fixed.

There are some interesting questions about test-suites:

  • When a team changes their code, what tests should they use to check their changes? Should they only use their own tests, or also use tests from other teams?
  • When a test fails, what’s the best way to find out which part of the software caused the failure? I’m sure you’ve seen a bug report get passed around between teams, with each team proving it’s not their fault.
  • How can we find weak spots in a test-suite to help us decide where to improve our testing? Do we need more system-tests, unit-tests or subsystem-tests?

Attributes of a Test Suite

The goal of a test-suite is to efficiently detect and localize errors in the software being tested:

  1. Defect detection - this means the test-suite can find certain types of defects in the software.
  2. Defect localization - this means the test-suite can show where a defect is in the software. Localization can be in time (which change to the software introduced the defect) and in structure (which part of the software contains it).
  3. Efficiency - this means the test-suite doesn’t cost too much to run.

Each of these attributes can be measured in a few ways. For example, we could measure how good a test-suite is at finding defects by:

  • Counting how many times a test-case has failed and uncovered a defect; this requires keeping a record of past test failures (see the sketch after this list).
  • Looking at how much of the code the test-suite covers. This is an indirect way of measuring how good the test-suite is at finding defects.
  • Using mutation testing: seeding artificial defects into the software and checking whether the test-suite catches them.
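
As a rough illustration of the first of these measurements, here is a minimal Python sketch. The run history, test names, and record format are all invented for illustration, not the output of any real tool; it simply turns a history of test runs into a per-test defect count and detection rate:

```python
from collections import Counter

# Hypothetical run history: (test name, whether it failed, whether the
# failure was traced to a real defect rather than a broken test or an
# environment issue). Names and records are made up for illustration.
history = [
    ("test_login",    True,  True),
    ("test_login",    False, False),
    ("test_checkout", True,  False),  # failed, but no defect was found
    ("test_checkout", True,  True),
    ("test_report",   False, False),
]

runs = Counter(name for name, _, _ in history)
defects_found = Counter(name for name, _, confirmed in history if confirmed)

for name in runs:
    rate = defects_found[name] / runs[name]
    print(f"{name}: {defects_found[name]} defects over {runs[name]} runs "
          f"(detection rate {rate:.2f})")
```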

We can measure how good a test-suite is at localizing defects by how well it triangulates a defect from the failures that are observed. It is common to have a covering relation from each test-case to the subset of source-components that it exercises, so a failing test-case indicates that the defect lies in one of the components it covers.
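
To make the covering relation concrete, here is a small sketch (component and test names are invented for illustration): each test maps to the set of source components it exercises, and intersecting the sets covered by the failing tests narrows down where the defect can be.

```python
# Hypothetical covering relation: each test-case maps to the set of
# source components it exercises (names invented for illustration).
coverage = {
    "test_parse_config": {"config", "io"},
    "test_solver_basic": {"solver", "config"},
    "test_solver_edge":  {"solver", "numerics"},
    "test_report":       {"report", "io"},
}

def localize(failing_tests, coverage):
    """Intersect the components covered by the failing tests: the prime suspects."""
    suspects = None
    for test in failing_tests:
        covered = coverage[test]
        suspects = set(covered) if suspects is None else suspects & covered
    return suspects or set()

# If both solver tests fail, the intersection points at the solver component.
print(localize({"test_solver_basic", "test_solver_edge"}, coverage))  # {'solver'}
```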

Efficiency is about how much it costs to run the test-suite. This can depend on many things, like the hardware required or how hard a test is to set up. But a simple proxy for cost is the number of test-cases and how many instructions each test-case executes. For example, we could count the lines of code that each test-case runs.
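
Under that simple cost model, the cost of a suite is just the sum, over test-cases, of the lines each one executes. A minimal sketch with made-up numbers:

```python
# Hypothetical per-test cost: lines of code executed by each test-case.
lines_executed = {
    "test_parse_config": 120,
    "test_solver_basic": 4_500,
    "test_end_to_end":   50_000,
}

suite_cost = sum(lines_executed.values())
most_expensive = max(lines_executed, key=lines_executed.get)
print(f"{len(lines_executed)} tests, {suite_cost} lines executed in total")
print(f"most expensive test: {most_expensive}")
```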

These attributes often clash with each other, making it hard to optimize a test-suite. Let’s look at a few examples to make this clearer.

Only system tests: Let’s say we have tests that cover all the ways users might use the system. These tests are very reliable; maybe they’re even part of a legal contract. In this case, each test can be seen as a test of every part of the system, which makes the test-suite very good at finding defects. But it’s not good at localizing defects: if a test fails, we can’t tell which part of the system caused the failure. The test-suite isn’t efficient either, because every test exercises every part of the system, so adding even a single test can be very costly.

Only unit tests: Now let’s say each test only checks a single part of the system. In this case, the test-suite is very efficient: the cost of running the whole suite is roughly proportional to the size of the system, since each part is exercised about once. This approach is also very good at localizing defects: if a test fails, we know exactly which part contains the defect. But it’s not good at finding defects, because typical user scenarios need the whole system to work together, not just individual parts.
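
The covering relation from earlier makes the contrast concrete. In the sketch below (test and component names are again invented), every system test covers all components, so intersecting the failing tests never narrows anything down, while each unit test covers exactly one component, so a single failure pinpoints it immediately:

```python
from functools import reduce

components = {"parser", "solver", "report"}

# Two extreme covering relations over the same hypothetical components.
system_only = {f"system_test_{i}": set(components) for i in range(3)}
unit_only = {f"unit_test_{c}": {c} for c in components}

def localize(failing, coverage):
    """Intersect the components covered by the failing tests."""
    return reduce(set.__and__, (coverage[t] for t in failing))

print(localize(["system_test_0", "system_test_1"], system_only))  # all three components
print(localize(["unit_test_solver"], unit_only))                  # {'solver'}
```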

Test-Suite Design in Practice

In practice, test-suite design is guided by development methodologies and heuristics. One popular methodology is “test-driven development” (TDD). This method suggests we first write test-cases for the functionality we want, then write the code and iterate until all the test-cases pass.
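
As a minimal, hypothetical illustration of that workflow in Python (the slugify function and its test are invented for this example): the test is written first and fails because the code does not exist yet, then just enough code is written to make it pass.

```python
import unittest

# Step 1: write the test for the behaviour we want, before the code exists.
class TestSlugify(unittest.TestCase):
    def test_replaces_spaces_and_lowercases(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

# Step 2: write just enough code to make the test pass, then iterate.
def slugify(title: str) -> str:
    return title.strip().lower().replace(" ", "-")

if __name__ == "__main__":
    unittest.main()
```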

A popular heuristic for test-suites is the “test-pyramid”. This idea helps us think about how to spread out different types of test-cases: unit, integration, and system tests. Unit-tests are at the bottom of the pyramid. They are small, cheap to run, and make up most of the test-cases in a test-suite (about 70%). Integration tests are in the middle. They test how units work together to give more complex functions (about 20%). They are harder and more expensive to set up. System tests are at the top of the pyramid (about 10%). They are the most expensive to set up and need the full application to be running. This can be hard when parts of the system are always changing.
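
One way to see whether an existing suite roughly follows the pyramid is to compare its actual distribution of tests against those target proportions. A small sketch, where the counts and the 70/20/10 targets are purely illustrative:

```python
# Hypothetical test counts per layer, compared against the 70/20/10 targets.
counts = {"unit": 640, "integration": 210, "system": 150}
targets = {"unit": 0.70, "integration": 0.20, "system": 0.10}

total = sum(counts.values())
for layer, target in targets.items():
    actual = counts[layer] / total
    print(f"{layer:12s} actual {actual:5.1%}   target {target:5.1%}")
```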

In my next post, I plan to use basic math tools (like sets and relations) to model the situation. This might give us some new insights about the structure of a test-suite. I haven’t seen much written about this approach, even though there’s a lot written about other aspects of testing, like using program analysis or statistical approaches to test-selection.

I’d love to hear what you think about this topic, and your experiences with well-designed (or badly designed) test-suites.

Tanvir Hussain

Founder, Quality Management Consultant

Nice start. I have seen a number of problems with test designs, one of the biggest being miscommunication. The lower layers often go well because the lines of communication are short. If not developed in a very structured way, the basis of the upper-level tests is often weakly documented, and so the tests and the testers suffer. This also costs a lot of manpower in investigation, release decisions, etc.

Prantik Chatterjee, Ph.D

Senior Software Engineer at MathWorks | PhD (CSE-IIT Kanpur) [Formal Verification & Machine Learning] | Intel Research Fellow'22-23

Great read. Looking forward to the next topic. One particular issue I have faced with test design is hardcoded values: with modifications to the underlying algorithm, these tests fail even though the logic is correct. Another problem is flaky tests. It'll be great to have your insights on how to avoid writing tests that may become flaky.
