How big should a "Unit" be?
Composite image, artistic fair use of source material from the TV show "The Unit" (copyright CBS America)


This article is an extended version of my thoughts on the topic of unit testing, originating from a discussion with Adrian Stanek and James Galyen.

What is a "Unit" in Unit Testing?

The common definition of unit testing reads:

unit testing is a software testing method by which individual units of source code — sets of one or more computer program modules together with associated control data, usage procedures, and operating procedures — are tested to determine whether they are fit for use

While this definition makes sense, it does not offer any practical guidance on what the size of a unit should be. When asked, most software developers fall into one of two schools of thought:

  • A "unit" is a class, or any public method in a class.
  • A "unit" is a collection of classes, within the same module, that contribute to a single functionality.

Personally, I lean towards the second school of thought. In this article, I will dive into how we can define the boundaries of a "unit" for testing purposes.


Q: So, how big should a "unit" be?

A: It depends.


Understanding System Boundaries

To give you a coherent and practically oriented breakdown of my thoughts on the matter, let's take a look at an example system design. The system depicted in the sketch below is an n-tier reference architecture with two user-facing components: a web-based user interface application and a command-line interface application. Both of these connect to the backend system through API calls. At the far end of the system are two distinct data storage systems (databases) used to persist the system's data.

We can define the following layers:

  • User Control: components that allow users to interact with the system. Contains the CLI and UI components.
  • Public interfaces: technical interfaces to the functionality offered by the system. Contains API controllers that expose system logic.
  • Logic: business rules and logical services, built from models and functions that describe the solution domain.
  • Data Access: provides transparent access to the persistent storage solution (usually a database or external system connection).
  • Storage & external system calls: the storage solutions and external systems exposed by the Data Access layer. Contains elements that offer persistence functionality to the system.

Sketch of a reference architecture, with some superimposed structural boundaries (red) and functional slices (green).

Now that we have an overview of our system from a structural point of view, let's add some more specifics to make it more realistic and tangible. We will assume the UI is a web application written in a popular JavaScript framework, and that the interface elements are JSON/HTTP controllers that do some minimal mapping. The data access services house ORM framework models.

Our system offers three main functionalities, two of which are exposed through the web interface, while the third is only accessible through the CLI. Let's say this third functionality pertains to administrative tasks that are not relevant for end-users, but are required for application management.
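To make the later test examples concrete, here is a minimal sketch of how these layer boundaries might look in code. This is an illustrative assumption on my part: the backend language (Java here) and every type and method name (AccountController, AccountService, AccountRepository, and so on) are invented for the sake of the examples, not taken from a real implementation.

```java
// Illustrative sketch of the layer boundaries, assuming a Java backend.
// Every type and method name here is invented for the examples that follow.

// Public interface layer: a JSON/HTTP controller that only maps and delegates.
class AccountController {
    private final AccountService accountService;

    AccountController(AccountService accountService) {
        this.accountService = accountService;
    }

    // Accepts a request DTO, delegates to the logic layer, maps the result back.
    AccountResponse createAccount(CreateAccountRequest request) {
        Account created = accountService.createAccount(request.username(), request.email());
        return new AccountResponse(created.id(), created.username());
    }
}

// Logic layer: business rules, unaware of HTTP and of the storage technology.
class AccountService {
    private final AccountRepository repository;

    AccountService(AccountRepository repository) {
        this.repository = repository;
    }

    Account createAccount(String username, String email) {
        if (username == null || username.isBlank()) {
            throw new IllegalArgumentException("username is required");
        }
        return repository.save(new Account(null, username, email));
    }
}

// Data access layer: hides the storage technology behind a small interface.
interface AccountRepository {
    Account save(Account account);
    Account findById(long id);
}

// Domain and transport types used above.
record Account(Long id, String username, String email) {}
record CreateAccountRequest(String username, String email) {}
record AccountResponse(Long id, String username) {}
```

The controller depends only on the service, and the service depends only on the repository interface, mirroring the layering described above.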

Technical versus functional boundaries

The challenge in defining the boundary of a testing unit arises from the desire to find a one-size-fits-all scope. In practice, your goal is not to find an academic solution but to write tests for specific purposes, usually to validate the functionality or quality attributes of a system. Let's explore the downsides of using common test unit scopes, such as layers and functional slices.

Using layers to define units

Our example architecture is structured in layers, commonly referred to as an "n-tier" architecture. In this concept, each layer has a specific purpose. The overall system is composed of multiple layers, each depending only on the functionality offered by the layer directly below it. We could scope our units to be "one element in one single layer".

This makes sense when testing your data access layer, as you want to answer the question: "Are my objects serialized, stored, and de-serialized correctly?".

The layer-based approach becomes trickier when moving up to the top-level layers. Consider how you would test the API controllers in isolation. Their responsibility is generally minimal: accept an incoming HTTP message, possibly transform it to an internal representation, and delegate it to the relevant business logic service. If we assert that the "correct service" is called, we are no longer testing functionality, but adding a structural requirement to our system: "this API will always call this service". Do this a few times, and your system becomes difficult and costly to change. Let's say a few months from now, we decide to extract some methods from the existing services and bundle them together into a new business service.

So, what happens to our existing API tests after our restructuring?

They will all start failing, as we wrote them to assert they are calling a specific method in a specific service. The code snippet below illustrates such a test.

Example of a unit test for a REST API controller that validates structure rather than function.
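The sketch below is illustrative: it assumes JUnit 5 with Mockito and reuses the example types from the layer sketch earlier in the article.

```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

// Brittle, structure-asserting test: it pins the controller to one specific
// method on one specific service (all names are illustrative).
class AccountControllerStructureTest {

    @Test
    void createAccount_delegatesToAccountService() {
        AccountService accountService = mock(AccountService.class);
        when(accountService.createAccount("alice", "alice@example.com"))
                .thenReturn(new Account(1L, "alice", "alice@example.com"));
        AccountController controller = new AccountController(accountService);

        controller.createAccount(new CreateAccountRequest("alice", "alice@example.com"));

        // This assertion encodes structure ("the controller calls exactly this method"),
        // not behaviour. Move the logic into a different service and the test fails,
        // even though the system still works.
        verify(accountService).createAccount("alice", "alice@example.com");
    }
}
```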

Using slices to define units

Rather than using layers, we could also use functional slices as boundaries for our units. The disadvantage of this approach is quite obvious: it is easy to test too much of your system. At that point, your test is no longer "lean and mean" and is crossing over into integration-test territory, as you will need to instantiate a significant part of your system in order to run it.

Recalling our system example, we could stub out the databases or use an in-memory test database. While these are sensible approaches to minimize how much of your system you need to spin up to run your test -- and can arguably still be considered "unit tests" -- they still come at the cost of significantly increasing your test suite's execution time.
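One way to keep the instantiated part of a slice small is to hand-roll an in-memory fake for the data access interface. The sketch below does this for the illustrative AccountRepository interface introduced earlier; it assumes no particular framework.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for the real data access service, so a slice test
// does not need a running database. Illustrative only.
class InMemoryAccountRepository implements AccountRepository {

    private final Map<Long, Account> storage = new HashMap<>();
    private final AtomicLong sequence = new AtomicLong(1);

    @Override
    public Account save(Account account) {
        long id = account.id() != null ? account.id() : sequence.getAndIncrement();
        Account stored = new Account(id, account.username(), account.email());
        storage.put(id, stored);
        return stored;
    }

    @Override
    public Account findById(long id) {
        return storage.get(id);
    }
}
```

The examples further down reuse this fake to keep their tests fast.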


Image depicting the testing pyramid: types of test and their related scope and feedback-loop speed.


Advised approach: matching boundaries to the test

What is internal versus what is external, and consequently what the scope of a "unit" is, depends on what you are testing.

As such, you are usually better off focusing on defining "responsibility units", and testing those using whatever elements are most relevant to the functionality you wish to assert. While doing this, try to minimize the number of elements you need to instantiate.

In other words: depending on what functionality you are looking to validate, your unit size should vary so that it contains only the components responsible for supplying that functionality.

Note that this means you will probably not explicitly test all possible code paths at build time. And that is fine. You can usually assume that a function call will happen correctly. If you want additional assurance, add smoke tests or integration tests to make sure your system behaves normally.

Let's apply this approach to our example system.

  • Say I want to test the mapping of my API layer. I would define my unit as being only the API and its related mappers. Everything else is external to this responsibility scope and should be stubbed or mocked. We can now test the mapping in isolation.
  • Say I want to test the "A user can create an account"-functionality. The responsibility boundary would now contain: the relevant API, and the business service. Assuming we have tested the "data can be stored and retrieved"-functionality on the DAO layer, we can stub this layer out.
  • Say I want to test the Data Access service responsible for storing user information. I would set up an in-memory database (or test container) to minimize relying on the presence of a real database cluster. I can now write a test that takes an object to be stored, stores it, retrieves it, and asserts that the retrieved object is the same as what I put in (the second sketch after this list shows the shape of such a test). Attentive readers might notice that this approach avoids the test breaking if the data object changes, as I am testing that storing and retrieving data is symmetrical.
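
Two sketches to make these examples concrete, again using the illustrative types introduced earlier. First, the "a user can create an account" test: the unit consists of the controller and the real business service, while the data access layer is replaced by the in-memory fake.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotNull;

import org.junit.jupiter.api.Test;

// Behaviour-focused test: the unit is "creating an account", covering the
// controller and the real business service, with the data access layer
// replaced by the in-memory fake. All names are illustrative.
class CreateAccountFunctionalityTest {

    @Test
    void aUserCanCreateAnAccount() {
        AccountRepository repository = new InMemoryAccountRepository();
        AccountController controller = new AccountController(new AccountService(repository));

        AccountResponse response =
                controller.createAccount(new CreateAccountRequest("alice", "alice@example.com"));

        // Assert the observable outcome, not which internal methods were called.
        assertNotNull(response.id());
        assertEquals("alice", response.username());
        assertNotNull(repository.findById(response.id()));
    }
}
```

Second, the shape of the storage round-trip test. For brevity it reuses the in-memory fake; in a real project the repository under test would be the ORM-backed implementation, wired to an in-memory database or a test container.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Shape of the storage round-trip test. The in-memory fake is a stand-in here;
// in a real project, construct the ORM-backed repository implementation,
// pointed at an in-memory database or a test container, instead.
class AccountPersistenceRoundTripTest {

    @Test
    void storingAndRetrievingAnAccountIsSymmetrical() {
        AccountRepository repository = new InMemoryAccountRepository();

        Account stored = repository.save(new Account(null, "alice", "alice@example.com"));
        Account retrieved = repository.findById(stored.id());

        // Assert only that what comes out equals what went in,
        // not the values of individual fields.
        assertEquals(stored, retrieved);
    }
}
```

Because the assertion only compares what went in to what came out, the test does not need to change when fields are added to the data object.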

TLDR / Key takeaways

  • What is internal versus what is external, and consequently what a unit is, depends on what you are testing, and can vary from one test to the next.
  • It is easier to define boundaries for your test scopes if you are able to assign specific responsibilities to your system elements.
  • Avoid describing an element's purpose as "A calls B". If your elements are really that dumb, do not bother testing them.
  • The split between "unit tests" and "integration tests" is somewhat artificial, but boils down to: if you need to spin up a large part of your system in order to run the test, you are probably writing an integration test.



