2024 #6 - Unit Testing
Meagan Palmer
I help data leaders save time and money by making complex simple. Training and consulting services available.
In this edition of Analytics Engineering Today, we’re going to talk about unit testing: what it actually is, why it matters, and the misconception that running a few queries on your data is the same thing.
This distinction matters because, in some of the teams I’ve worked with, testing often falls into the bucket of “let’s just check a few row counts and call it good.” That’s useful, but it’s not unit testing. Proper unit testing makes your transformations reliable, future-proof, and much easier to maintain.
What is Unit Testing?
Unit testing comes from software engineering. In that world, a “unit” is the smallest testable part of an application, usually a function or method. The purpose of unit tests is to check that these small components work exactly as intended, in isolation.
Key characteristics of unit tests:
- Isolation: Unit tests focus on the component being tested, without relying on databases, APIs, or other external systems. Mocks or stubs are used to simulate inputs.
- Granularity: They test the tiniest parts of the code - no big-picture testing here.
- Automation: Unit tests are automated and repeatable, giving quick feedback.
- Speed: Because they’re testing small, isolated pieces, unit tests run fast - ideal for frequent checks, for example in your CI/CD pipeline.
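To make the isolation point concrete, here is a minimal sketch in Python. The function `convert_to_aud`, the stub `stub_rate`, and the rate of 1.5 are all hypothetical names and values invented for illustration; the idea is only that the logic under test receives its exchange rate from a stub rather than from a live API.

```python
from typing import Callable


def convert_to_aud(amount_usd: float, get_rate: Callable[[str], float]) -> float:
    """Hypothetical transformation: convert USD to AUD using the rate source passed in."""
    return round(amount_usd * get_rate("USD/AUD"), 2)


def stub_rate(pair: str) -> float:
    """Stub standing in for a live exchange-rate API: always returns a fixed rate."""
    return 1.5


def test_convert_to_aud_uses_supplied_rate():
    # No network call, no database: the only thing being tested is the arithmetic.
    assert convert_to_aud(100.0, stub_rate) == 150.0
```

Because the rate source is passed in as an argument, the test runs in milliseconds and gives the same answer every time, whatever the real API is doing.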
What Does Unit Testing Look Like in Data?
In data, unit tests focus on business logic and transformations. A “unit” could be:
But, and this is important, unit tests don’t rely on your real data. Instead, they use static, controlled input data that you manufacture specifically to test edge cases, boundary conditions, and expected outputs.
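As a rough illustration of manufactured input data, the sketch below tests a hypothetical `normalise_status` transformation (the function, the status values, and the "unknown" default are assumptions made up for this example). Every row in `CASES` is hand-written to hit a specific edge case, and none of it comes from a real table.

```python
from typing import Optional


def normalise_status(raw: Optional[str]) -> str:
    """Hypothetical transformation: tidy a free-text order status."""
    if raw is None or raw.strip() == "":
        return "unknown"
    return raw.strip().lower()


# Static, controlled inputs paired with the output we expect for each one.
CASES = [
    ("Shipped", "shipped"),      # mixed case
    ("  PENDING ", "pending"),   # surrounding whitespace
    ("", "unknown"),             # empty string
    ("   ", "unknown"),          # whitespace only
    (None, "unknown"),           # missing value
]


def test_normalise_status_handles_edge_cases():
    for raw, expected in CASES:
        assert normalise_status(raw) == expected, f"failed for {raw!r}"
```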
What Unit Testing is NOT
It’s easy to get confused about what unit testing actually is. So let me clear this up.
That's not to say the rest of these aren't useful. Of course they are. I'm just asking you all to stop calling them unit tests!
Why Unit Testing Matters: Future-Proofing Your Transformations
In my view, unit testing really shines in its future-proofing capacity.
When you write transformations, you’re often coding for scenarios that don’t exist in the current data, but that you have discussed with the stakeholders.
Without a unit test, someone could update or refactor your code and miss the edge case entirely. With a unit test, you:
- Simulate the relevant data.
- Confirm the logic works as intended.
- Protect this rule from future changes.
Unit tests act as guardrails for your code. They make sure the edge cases you’ve accounted for today don’t quietly break when someone else makes changes down the track.
Unit Testing in Action: A Real-Life Scenario
Imagine you’re building a rule to mark customers as “inactive” after 90 days without a purchase.
Your current data doesn’t have any 90-day gaps, so how do you know the rule works?
With unit testing:
- You create a small dataset with a known number of 90-day gaps.
- You write a test to validate that the “inactive” logic applies correctly, and returns the exact number expected. Not one more, not one less. In unit testing, this check is often called an assertion.
- Your rule works, now and in the future.
If someone changes the logic later, intentionally or not, your unit test will catch it.
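The scenario above could be sketched in Python roughly as follows. The function name `flag_inactive`, the customer ids, and the dictionary layout are hypothetical, and the sketch assumes that "after 90 days" means 90 or more days since the last purchase; agree the exact boundary with your stakeholders and encode whichever reading they confirm.

```python
from datetime import date, timedelta


def flag_inactive(customers, as_of, threshold_days=90):
    """Return ids of customers whose last purchase was threshold_days or more before as_of."""
    return [
        c["id"]
        for c in customers
        if (as_of - c["last_purchase"]).days >= threshold_days
    ]


def test_flag_inactive_returns_exact_customers():
    as_of = date(2024, 12, 31)
    # Manufactured dataset: two customers inside the window, two outside it.
    customers = [
        {"id": "recent", "last_purchase": as_of - timedelta(days=5)},
        {"id": "just_under", "last_purchase": as_of - timedelta(days=89)},
        {"id": "on_boundary", "last_purchase": as_of - timedelta(days=90)},
        {"id": "long_gone", "last_purchase": as_of - timedelta(days=400)},
    ]
    # The assertion: exactly these two, not one more, not one less.
    assert flag_inactive(customers, as_of) == ["on_boundary", "long_gone"]
```

The 89-day and 90-day rows are the important ones. If a later refactor changes `>=` to `>`, or changes the threshold, this test fails straight away, even though no real customer has hit the boundary yet.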
How to implement Unit Tests
Implementation in practice will depend on which tool you are using. Most tools that have kept pace with the move to DataOps have some unit testing functionality.
dbt added unit tests earlier in the year (in dbt Core v1.8); see their docs.
In Python, a commonly used package is pytest. It needs to be installed separately (for example with pip). It’s more comprehensive than unittest, which is a built-in library.
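For a feel of the difference between the two Python options, here is a small sketch that tests the same hypothetical `full_name` function in both styles. The function and test names are made up for illustration. The unittest version needs a `TestCase` class and `assert*` methods, while pytest will pick up a plain function whose name starts with `test_` and that uses a bare `assert`.

```python
import unittest


def full_name(first: str, last: str) -> str:
    """Hypothetical transformation: build a display name from two columns."""
    return f"{first.strip()} {last.strip()}".strip()


# unittest style: tests live in a TestCase class and use assert* methods.
class TestFullName(unittest.TestCase):
    def test_trims_whitespace(self):
        self.assertEqual(full_name(" Ada ", "Lovelace "), "Ada Lovelace")

    def test_missing_last_name(self):
        self.assertEqual(full_name("Ada", ""), "Ada")


# pytest style: a plain function with a bare assert is enough.
def test_full_name_trims_whitespace():
    assert full_name(" Ada ", "Lovelace ") == "Ada Lovelace"
```

Saved as a file such as `test_names.py`, the class can be run with `python -m unittest test_names.py`, and pytest will collect both the class and the plain function with `pytest test_names.py`.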
Unit testing isn’t about checking today’s data, or that you didn't drop rows when moving data around. It’s about building confidence that your transformations will hold up under any condition, even scenarios you haven’t seen yet.
If you’re not writing unit tests for your models and business logic yet, it’s worth starting. They’re fast to run, simple to implement, and they save you a lot of headaches in the long run.
I’d love to hear from you. Do you write unit tests in your data engineering or analytics engineering? What’s your approach to testing edge cases and business logic? If this is new to you and you’re keen to get started, reach out and let’s chat!
PhD Student | Software Engineering | Unit Testing | Governance IT | Risk Management
2 months ago
Hi Meagan, congratulations on your approach to unit testing. I'm very interested in this topic, and in my academic research, I usually use PHPUnit. I've sometimes found myself evaluating external dependencies as unit tests instead of integration tests. Your approach clearly highlights the key characteristics of unit tests. Here in Brazil, I participate in a community called PHPRio, coordinated by Vitor Mattos, Daiane Alves, and Lucas Azevedo. We meet every month to discuss software development, with a focus on the PHP language (https://github.com/phprio)
SQL || Java || Python || C || Haskell || Rust || TypeScript
3 months ago
This, as well as the data tests, is one of my favourite things about dbt right now. It's awesome that it is bringing the level of rigour in normal software development to data modelling work, especially with CI/CD. It's also just intrinsically satisfying to prove your code this way. Super underrated feature and good read!