TDDA Library: Launching Exercises with Screencasts
Nick Radcliffe
CEO, Stochastic Solutions • Behaviour modelling • Data Science • Data Quality • Sustainability • Organizer, PyData Edinburgh.
If you're interested in software testing, in general, or testing data and data pipelines in particular, the open source TDDA library (https://tdda.info/installation) may be of interest to you.
The TDDA library focuses on two main things:
- Reference testing in Python. This is a set of extensions for unittest and pytest designed to make it easier to test complex, possibly variable outputs. Although particularly relevant to testing data science code, this functionality is potentially useful to anyone writing tests in Python.
- Using constraints to verify data and detect anomalies. The library not only has functionality for checking data, but also for generating constraints from example "known good" data. It features a command-line tool as well as an API, so it is relevant regardless of whether you use Python. (It can test CSV files, data in databases, data in DataFrames etc.)
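To make those two ideas concrete, here is a minimal sketch using only the Python standard library. It is not the TDDA API or its implementation: every function name here (normalise, matches_reference, discover, verify) is hypothetical, chosen purely for illustration, and the constraints are far simpler than anything a real tool would generate. The first part compares output against a stored reference while ignoring parts that legitimately vary; the second infers basic constraints from "known good" rows and checks new rows against them.

```python
import re

# --- Idea 1: reference testing with variable output (illustrative only) ---

def normalise(text, ignore_patterns):
    """Blank out substrings that legitimately vary between runs."""
    for pattern in ignore_patterns:
        text = re.sub(pattern, "<IGNORED>", text)
    return text

def matches_reference(actual, reference, ignore_patterns=()):
    """Compare actual output with a stored reference, ignoring variable parts."""
    return normalise(actual, ignore_patterns) == normalise(reference, ignore_patterns)

# --- Idea 2: discover constraints from good data, then verify new data ---

def is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def discover(rows):
    """Infer simple per-field constraints from example 'known good' rows."""
    constraints = {}
    for field in sorted({key for row in rows for key in row}):
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        numeric = bool(present) and all(is_number(v) for v in present)
        constraints[field] = {
            "type": type(present[0]).__name__ if present else None,
            "min": min(present) if numeric else None,
            "max": max(present) if numeric else None,
            "allow_null": len(present) < len(values),
        }
    return constraints

def verify(rows, constraints):
    """Return (row_index, field, reason) for every constraint violation."""
    failures = []
    for i, row in enumerate(rows):
        for field, c in constraints.items():
            value = row.get(field)
            if value is None:
                if not c["allow_null"]:
                    failures.append((i, field, "null"))
            elif type(value).__name__ != c["type"]:
                failures.append((i, field, "type"))
            elif c["min"] is not None and not (c["min"] <= value <= c["max"]):
                failures.append((i, field, "range"))
    return failures

# Reference comparison: the date differs between runs, the total does not.
reference = "Report generated 2024-01-01\nTotal: 42\n"
actual = "Report generated 2025-06-30\nTotal: 42\n"
same = matches_reference(actual, reference, [r"\d{4}-\d{2}-\d{2}"])

# Constraint discovery and verification on tiny made-up datasets.
good = [{"age": 31, "name": "Ann"}, {"age": 58, "name": "Bob"}]
new = [{"age": 40, "name": "Cy"}, {"age": 140, "name": None}]
constraints = discover(good)
failures = verify(new, constraints)
```

With the made-up data above, the comparison succeeds once the date is ignored, and the second new row is flagged twice: its age falls outside the range seen in the good data, and its name is missing where the good data never had a missing name.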
We recognize that the documentation hasn't been the strongest part of the TDDA offering, and are working on improving this. As part of that effort, we've started to create simple exercises for different parts of the software, starting with reference testing. There are detailed descriptions and accompanying videos (screencasts, so you won't have to see my ugly mug!).
The first exercise is now available in two forms—a unittest-flavoured version, for regular folk, and a pytest-flavoured version, for those who prefer that.
Enjoy!