DWH testing
Darshika Srivastava
Associate Project Manager @ HuQuo | MBA,Amity Business School
Data Warehouse (ETL) Testing
What is the significance of testing Data Warehouse and Business Intelligence systems?
Testing plays a critical role in the success of any of the above two systems, by ensuring the correctness of data that builds the faith of end-users.
In general, a defect found at the later stages of the software development life cycle costs more to fix that defect. This situation in the DW can be worsened because the wrong data found at the later stages might have been used in important business decisions by that time.
Thus, the fix in the DW is more expensive in terms of process, people and technology changes. You can begin the DW testing right from the requirements gathering phase.
A requirement traceability matrix is prepared & reviewed, and this mainly maps the DW features with their respective business requirements. The traceability matrix acts as an input to the DW test plan that is prepared by the testers. The test plan describes the tests to be performed to validate the DW system.
It also describes the types of tests that will be performed on the system. After the test plan is ready all the detailed test cases will be prepared for various DW scenarios. Then all the test cases will be executed and defects will be logged.
There is a standard in the operational world that maintains different environments for development, testing, and production. In the DW world, both the developers and testers will make sure that the development and test environments are available with the replica of production data before starting their work.
This is copied for a list of tables with limited or full data depending on the project needs, as the production data is really large. The developers develop their code in the developer’s environment and deliver it to the testers.
The testers will test the code delivered in the testing environments to ensure if all the systems are working. Then the code will go live in the production environments. The DW code is also maintained in different versions based on the defects fixed in each release. Maintaining multiple environments and code versions helps to build a good quality system.
Goals Of Data Warehouse (ETL) Testing
Let’s take a look at the Goals Of Data Warehouse Testing.
#1) Data Completeness: Ensure that all data from various sources is loaded into a Data Warehouse. The testing team validates if all the DW records are loaded, against the source database and flat files by following the below sample strategies.
#2) Data Transformation: While uploading the source data to the Data warehouse, few fields can be directly loaded with the source data but few fields will be loaded with the data that is transformed as per the business logic. This is the complex portion of testing DW (ETL).
Below are the sample strategies to test this:
#3) Data Quality: Data warehouse (ETL) system must ensure the quality of the data loaded into it by rejecting (or) correcting the data.
DW may reject a few of the source system data based on the business requirements logic. For Example, reject a record if a certain field has non-numeric data. All the rejected records are loaded into the reject table for reference.
The rejected data is reported to the clients because there is no chance of getting to know about this missed data, as it will not be loaded into the DW system. DW may correct the data by loading zero in the place of null values etc.
#4) Scalability and Performance: Data warehouse must ensure the scalability of the system with increasing loads. With this, there should not be any degradation in the performance while executing the queries, with anticipated results in specific time frames. Thus performance testing uncovers any issues and fixes it before the production.
Below are sample strategies for Performance and Scalability Testing:
领英推荐
#5) Integration Testing: Data warehouse should perform Integration Testing with other upstream and downstream applications. If possible, it is better to copy the production data into the test environment for Integration Testing.
All system teams should be involved in this phase to bridge the gaps while understanding and testing all the systems together.
#6) Unit Testing: This is performed by the individual developers on their deliverables. Developers will prepare unit test scenarios based on their understanding of the requirements, run the unit tests and document the results. This helps the developers to fix any bugs if found, before delivering the code to the testing team.
#7) Regression Testing: Validates that the DW system is not malfunctioning after fixing any defects. This is performed many times with every new code change.
#8) User Acceptance Testing: This testing is performed by business users to validate system functionality. UAT environment is different from the QA environment. The sign off from UAT implies that we are ready to move the code to production.
From the Data Warehouse and Business Intelligence system perspective, business users can validate various reports through a User Interface (UI). They can validate the report specifications against the requirements, can validate the correctness of data in the reports, can validate how quickly the system is returning the results, etc.
DW Testing Flow Diagram:
Data Warehouse Testing Responsibilities
Enlisted below are the various teams involved in delivering a successful DW system:
Errors In Data Warehouse
When you are Extracting, Transforming and Loading (ETL) data from multiple sources there are chances that you will get bad data that may abort the long-running jobs.
Following are the key causes of failure in the DW system:
#1) Business Rule Violations (Logical Errors): Logically wrong data violates the business rules. Such data can be handled mostly during transformation or loading phases.
#2) Data Rule Violations (Data Errors): Data errors occur inside the DW database system like data type mismatches, data constraint failures, etc.
ETL Deployment
This is the phase where all your efforts go live. All the production support documents should be prepared.
The documentation will tell others about the sequence of jobs to run, failure recovery scenarios, training materials to the DW support teams to monitor the system after deployment and to the administrative support team to execute the reports.
Conclusion
We learned about the Goals of Data Warehouse Testing, ETL Testing Responsibilities, Errors in DW and ETL Deployment in detail in this tutorial.
We hope you got an idea of how detailed testing can be done in a Data Warehouse (ETL) System.