Test data management: A microscopic view

Projects run on information and data, typically in huge amounts. Testing thoroughly and applying techniques such as Boundary Value Analysis and Equivalence Partitioning requires substantial data within the system to exercise the AUT (application under test). Volume (lots of test data!) is another mandatory requirement for verticals such as banking, financial services, and healthcare, where historical and future-dated data play a key role in thorough application testing.

Cost overruns due to non-compliance with test data guidelines are typically ignored when testing an application in the QA environment. Data gaps in a test environment have no legal or practical implications, but the defects they let through surface in production, where data-related failures can cost an organization a bomb! Here are a few scenarios worth looking at.

Planning for test data management is something projects usually overlook, but if you are testing critical applications, effective test data can mean the difference between life and death!

Typical Business Problems

Let’s assume a new bank, XYZ, is being launched for retail customers. Post development, the testing requirements would span multiple modules within the product, such as Payments and Transfers, Credit Card Bill Payments, Remote Deposit Capture and Cheque Deposits, Fraud Checks, etc. Testing these requires a lot of pre-existing test data so that edge-case scenarios can be covered. Here are a few examples that throw some additional light on the conditions we are referring to:

Validation of interest capitalization in a leap year (Feb 29th) is a scenario that needs to be checked before rolling the banking system out to production. Tested in real time, this condition could take years to reach, depending on where you are in calendar terms (a leap year comes around only once every four years). If you do not validate leap-year interest capitalization and it fails in production, you have a real problem. These are the edge cases that get completely ignored during testing.
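
To make the example concrete, here is a minimal sketch of such a leap-year boundary test in Python (pytest style). The `accrued_interest` function is a hypothetical stand-in for the banking system's real accrual engine, assuming an actual/365 day-count convention:

```python
# Minimal leap-year boundary test. `accrued_interest` is a hypothetical
# stand-in for the real accrual engine, using an actual/365 convention.
from datetime import date

def accrued_interest(principal, annual_rate, start, end):
    days = (end - start).days  # actual days elapsed
    return principal * annual_rate * days / 365

def test_february_accrual_in_leap_year():
    # Feb 2020 has 29 days, Feb 2021 only 28: the one extra day of
    # interest is exactly the edge case described above.
    leap = accrued_interest(10_000, 0.05, date(2020, 2, 1), date(2020, 3, 1))
    non_leap = accrued_interest(10_000, 0.05, date(2021, 2, 1), date(2021, 3, 1))
    one_day = 10_000 * 0.05 / 365
    assert abs((leap - non_leap) - one_day) < 1e-9
```

Running such a test presupposes the QA environment holds accounts dated into the right February, which is precisely the test data planning this article is about.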

Regulatory reporting requirements: Certain government regulations require a financial system to retain the last one year's worth of data and discard or archive the rest. These are strict guidelines with further financial implications (tax audits, etc.), so such rules need to be tested in great detail. With a new system we might run short of the test data needed to test this effectively! Most of the time the testing team takes a waiver from the business on these tests, citing test data constraints, which means the rule never gets tested before go-live. That is not a good situation to be in.
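
As a hedged illustration, the sketch below shows how such a retention rule might be checked. The 365-day cutoff, the record shape, and the `partition_by_retention` helper are assumptions made for this example, not any actual regulation:

```python
# Sketch of a retention-rule check: retain the last 365 days, archive
# the rest. The cutoff and record layout are illustrative assumptions.
from datetime import date, timedelta

def partition_by_retention(records, today, keep_days=365):
    cutoff = today - timedelta(days=keep_days)
    retained = [r for r in records if r["posted_on"] >= cutoff]
    archived = [r for r in records if r["posted_on"] < cutoff]
    return retained, archived

def test_retention_boundary():
    today = date(2024, 6, 1)
    boundary = today - timedelta(days=365)
    records = [
        {"posted_on": boundary},                      # exactly on the cutoff: retained
        {"posted_on": boundary - timedelta(days=1)},  # one day older: archived
    ]
    retained, archived = partition_by_retention(records, today)
    assert len(retained) == 1 and len(archived) == 1
```

Notice that exercising the boundary requires records older than a year, i.e., backdated test data.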

To test these conditions we need meticulous planning of test data in the system, without which we will encounter the following problems.

  1. Empty Start – Since it is a new system, we cannot expect test data to be available, so we need to create it ourselves: a tiring manual effort, and an ongoing one, because we need test data across multiple time periods. The typical process is: a. create data > b. move dates forward > c. repeat step a (see the sketch after this list).
  2. Low Volume Testing – With a nearly empty database, the test environment will perform better than production; this is an unrealistic way to test the application, as we cannot replicate the real-time environment and the defects it produces.
  3. Back-dated Testing – A key consideration when testing data-intensive applications is the ability to go back in time! If we take the 'fill as we go' approach, we risk never being able to test backdated scenarios, which poses significant test coverage risks.
  4. Integration (Upstream/Downstream) – Interfaces pose a lot of blind spots for tests, and test data should be broadcast across interfaces. For example, bill payment information should exist not only in the CBS (Core Banking System) under test but also in the interfacing payment provider (third-party bill payment channel), to avoid account reconciliation issues that would lead to inaccurate accounting reports.
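
As promised in item 1, here is a minimal sketch of the create-data/move-dates-forward loop. The `harness` object and its methods (`set_system_date`, `create_accounts`, `post_transactions`) are hypothetical hooks into whatever test harness the AUT exposes, not a real API:

```python
# Sketch of the "empty start" seeding loop: create data, move the
# simulated system date forward, repeat. All harness methods are
# hypothetical placeholders for the AUT's actual hooks.
from datetime import date, timedelta

def seed_over_time(harness, start: date, periods: int):
    current = start
    harness.set_system_date(current)
    for _ in range(periods):
        harness.create_accounts(batch_size=50)   # a. create data
        harness.post_transactions(count=500)
        current += timedelta(days=30)            # b. move dates forward
        harness.set_system_date(current)
    # c. the loop itself repeats step a for each new period
```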

Possible approaches

While there is no conclusive off-the-shelf solution that covers all test data needs, we can deploy a few strategies to lessen the impact and improve our test coverage.

1. Manual Test data creation

This is the most straightforward way of creating data. The manual testing team or the product teams perform regular tests on the application, which injects test data into the system; slowly, over time, substantial test data accumulates in the QA environment. Usually the QA team works through the front end of the application here. Fake data generation tools can be leveraged to produce data that looks almost real.
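
For instance, a minimal sketch with the open-source Python Faker library; the field names are illustrative and not tied to any particular banking product:

```python
# Generate realistic-looking customer records with Faker that manual
# testers can key into the front end. Field names are illustrative.
from faker import Faker

fake = Faker()
for _ in range(5):
    customer = {
        "name": fake.name(),
        "address": fake.address().replace("\n", ", "),
        "iban": fake.iban(),
        "account_opened": fake.date_between(start_date="-2y", end_date="today"),
    }
    print(customer)
```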

Benefits: Manual Test data creation

Since manual testers create data on the fly as a side effect of testing the application, we do not need to factor in additional time or resources to fill the system with data. The existing test team can plan the test data so that it covers a diverse set of records.

Demerits: Manual Test data creation

Since this is a manual method, we need time to fill the system with meaningful test data; if we need more data, we need to add more hands. The testing team also needs patience to enter reasonable future-dated data. Because the method is not automated, it leads to thin data volumes, as humans can only type so fast. The technique requires constant domain and application knowledge, and the same team has to fill in the data as well. Finally, the data entered is current-dated, not backdated.

2. Automated Test data creation

Automated test data creation is an extension of the manual approach; the only difference is speed. We use machines to fill in the data. This method involves automation tools such as Selenium or LeanFT to pump data into the system. The data push can happen through the front-end interface, just as a manual tester would do it, or through web service APIs.
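
A minimal sketch of the API route, assuming a hypothetical REST endpoint on the QA environment; the URL, payload fields, and token are placeholders, not any real product's API:

```python
# Bulk data fill through a hypothetical REST API using requests.
# Endpoint, payload fields, and auth token are placeholders.
import requests

BASE_URL = "https://qa-env.example.com"

def create_payment(session, amount, payee):
    resp = session.post(
        f"{BASE_URL}/api/payments",
        json={"amount": amount, "payee": payee, "currency": "USD"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["payment_id"]

with requests.Session() as session:
    session.headers["Authorization"] = "Bearer <qa-token>"  # placeholder
    for i in range(1000):  # the kind of overnight bulk run described below
        create_payment(session, amount=10.0 + i, payee=f"payee-{i:04d}")
```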

Benefits: Automated Test data creation

Machines creating test data is always a more efficient way of filling the system. The data fill is much faster, can run overnight to save time, and eliminates human intervention. Test data accuracy is another factor that goes up dramatically when automation is introduced.

Demerits: Automated Test data creation

Automation is expensive and adds to the overall cost of the project. The effort required to understand the system and create the automation scripts takes a lot of time, and there is a learning curve before we arrive at a bug-free script. Automation also requires special skill sets, so there will be a need to hire or train resources on the technical aspects. Finally, automation scripts require constant updates as the application changes.

3. Backend data injection

This technique involves interacting directly with the backend, i.e., the database. Since the test data ultimately lives in the database, we can update it directly with voluminous data using SQL queries. This eliminates the need for front-end data entry. However, we need to be very careful with this method, as it fiddles directly with the database relationships that define data integrity.
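
A minimal sketch, using sqlite3 as a stand-in for the real database; the table and column names are hypothetical. Note the parameterized statements and the explicit transaction (so a failure rolls back cleanly) and the backdated `posted_on` values, which are the whole point of this approach:

```python
# Backend injection sketch: a year of backdated transactions inserted
# in one transaction. sqlite3 and the schema are stand-ins only.
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect("qa_core_banking.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS transactions "
    "(account_id INTEGER, amount REAL, posted_on TEXT)"
)
try:
    with conn:  # commits on success, rolls back on any exception
        start = date.today() - timedelta(days=365)
        rows = [
            (1001, 100.0 + d, (start + timedelta(days=d)).isoformat())
            for d in range(365)
        ]
        conn.executemany(
            "INSERT INTO transactions (account_id, amount, posted_on) "
            "VALUES (?, ?, ?)",
            rows,
        )
finally:
    conn.close()
```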

Benefits: Backend Data Injection

Running SQL scripts is a super-fast technique for injecting test data, and it can be very effective relative to the other techniques discussed above. It also requires less scripting knowledge than test automation. Another major benefit is the ability to go backward in time, i.e., to create backdated entries; this is simply not possible when creating data manually or through front-end test automation.

Demerits: Backend Data Injection

Injecting data directly through SQL may sound simple on the face of it, but it can be a disaster when implemented incorrectly. Data relationships are critical and must be managed while injecting data through the database. This technique should only be performed by domain experts who understand the system well and know how data flows within it, so that all the right database tables are populated and data-related issues are prevented.

Note: Improper SQL inserts can corrupt the database and stop the application from functioning. This method needs careful operation, a thorough understanding of the system, and proper database backups taken before attempting it.

4. 3rd Party Tools

There are tools in the market that can be leveraged for data creation and injection. These tools understand the backend application (where the data resides) very well; they pump in realistic data and fill up the system under test. The test data created is diverse and voluminous in nature and covers all the areas and modules of the application.

Benefits: 3rd Party Tools

3rd party tools are the most accurate way of creating test data, as the tools understand the domain and the system very well. They are designed to populate realistic data in the system and also take care of backdated data fills, enabling users to run tests on historical data. The end user does not need tremendous domain-level expertise to operate them.

Demerits: 3rd Party Tools

The biggest drawback here is the steep price tag! Data injection tools are pretty expensive and can drill a huge hole in the pocket. Another disadvantage is that these tools are not generic: each works with only one kind of system, which makes them very narrow in terms of the applications they support. For example, a test data injection tool for Temenos T24 will not work with Infosys Finacle.

Conclusion

Test data management and injection is a subjective topic, and there are many ways to look at it. Done properly and carefully, it can yield amazing results and remarkably help the QA and business teams in testing the application thoroughly, as most applications rely heavily on data these days. No matter which technique we choose, we need to keep the concept of “time travel” (data backfill) in mind when it comes to actual testing, which involves going into past and future dates to simulate real scenarios (especially for systems in the financial domain). The best part: the effort you spend here translates into dollars saved in production.

Hope you liked the article!

Niranjan Keshavan - Project Lead (Cigniti Technologies)

