Course: Data Pipeline Automation with GitHub Actions Using R and Python

Data quality checks

- [Instructor] We'll conclude this chapter by reviewing the pipeline's data quality checks, or unit tests. Let's start by defining the term. Data quality checks, or data unit tests, are the process of evaluating the data structure and its values using a set of deterministic and non-deterministic assumptions. Examples of deterministic assumptions are the data structure and data attributes: for example, the number of columns and their types, such as numeric, string, or time objects, the field names, and the value range. So for example, for our electricity data, we do not expect negative values or duplications. Likewise, examples of non-deterministic assumptions, or expectations, are missing values, the value range, and delays. So for example, in our electricity dataset, we can measure the mean and the standard deviation, and set a threshold for when we want to alert if values deviate too far from the mean. As an example of a delay, if we expect to refresh the data every…
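
To make the two kinds of checks concrete, here is a minimal sketch in Python using only the standard library. It is not the course's implementation: the field names `period` and `value`, the function names, and every threshold (5% missing, 3 standard deviations, a 24-hour delay) are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

# Hypothetical schema for an hourly electricity series; the field names
# "period" and "value" are assumptions, not taken from the course.
EXPECTED_COLUMNS = {"period": datetime, "value": float}


def deterministic_checks(rows):
    """Return failures for rules that must always hold."""
    failures = []
    for i, row in enumerate(rows):
        # Structure: every row has exactly the expected fields.
        if set(row) != set(EXPECTED_COLUMNS):
            failures.append(f"row {i}: unexpected fields {sorted(row)}")
            continue
        # Attributes: each non-missing field has the expected type.
        for name, expected_type in EXPECTED_COLUMNS.items():
            if row[name] is not None and not isinstance(row[name], expected_type):
                failures.append(f"row {i}: {name} is not {expected_type.__name__}")
        # Value range: electricity values should not be negative.
        if isinstance(row["value"], float) and row["value"] < 0:
            failures.append(f"row {i}: negative value {row['value']}")
    # Duplications: each timestamp should appear only once.
    periods = [r["period"] for r in rows if r.get("period") is not None]
    if len(periods) != len(set(periods)):
        failures.append("duplicate period values")
    return failures


def soft_checks(rows, now, max_missing_share=0.05, max_sigma=3.0,
                max_delay=timedelta(hours=24)):
    """Return warnings for expectations that can legitimately vary.

    Assumes the rows already passed deterministic_checks. All thresholds
    are illustrative defaults, not values from the course.
    """
    if not rows:
        return ["no rows to check"]
    warnings = []
    # Missing values: alert when the share of missing values is too high.
    values = [r["value"] for r in rows if r["value"] is not None]
    missing_share = 1 - len(values) / len(rows)
    if missing_share > max_missing_share:
        warnings.append(f"missing share {missing_share:.1%} exceeds threshold")
    # Value range: alert on values far from the mean, in standard deviations.
    if len(values) >= 2:
        mu, sigma = mean(values), stdev(values)
        outliers = [v for v in values if abs(v - mu) > max_sigma * sigma]
        if outliers:
            warnings.append(
                f"{len(outliers)} value(s) beyond {max_sigma} standard deviations"
            )
    # Delay: alert when the newest observation is older than expected.
    periods = [r["period"] for r in rows if r["period"] is not None]
    if periods and now - max(periods) > max_delay:
        warnings.append(f"data is stale: latest period is {max(periods)}")
    return warnings
```

In a pipeline, one plausible design is to treat any deterministic failure as a reason to stop the run, and to surface the non-deterministic warnings as alerts for someone to review, since those thresholds are judgment calls that may need tuning.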
