How to automate data migration testing with DbFit (15 mins read with practical scenarios)
I recently faced some challenges when working with Data Migration testing since it was quite difficult to manually verify a lot of data without a data loss and with ETL transform logic it was much harder. Therefore, after some research, I came across a tool called DbFit however since there was no direct tutorial on how to use it, I decided to write an article. Here I mainly focus on data migration test automation.
What is data migration testing?
Conducting the set of tests to ensure that, data migration successfully completed from one or many data sources to another data source, without any data loss or corrupted data. Once a migration has been executed, an end to end testing should be done for the Source dataset and target data set by ensuring all the data were transformed properly by meeting the transformation logic.
What are the challenges in data migration testing?
- Time and resource consuming
- Too many scenarios to be covered
- A simple mistake will lead to a risk of functional failure at production
- Need proper reporting mechanism
- Rollback testing possible (means we need to verify same scenarios again after the roll-forward)
What is the solution?
Data migration test automation
Why DbFit tool fit to automate data migration testing?
Pros - Open source, multiple DB support, An active community, Test reporting
Cons - Not supported by NO SQL databases
What is DbFit?
- Open source evolving tool which can be used to automate database testing
- Combination of the FIT testing framework and Web-wiki front-end called FitNesse
- support databases such as Oracle, SQL Server, Mysql, Postgres, Derby, HSQLDB, DB2, DB2i, Netezza, Informix, Teradata
- DbFit can be executed in 2 ways
- Java (support MySQL, Derby, Oracle and DB2 databases)
- Net (support MS SQL Server, Oracle, and DB2 databases)
Starting automaton with DbFit (Explained with Ubuntu 16.04 version)
Prerequisites
- Java Runtime Environment (JRE) 5 or higher (recommend the latest one).
- Ensure the accounts and connectivity to the databases which you need to access.
Step 1
DbFit can be downloaded from its website https://dbfit.github.io/dbfit/ unpack the archive and run startFitnesse.sh (Linux or Ubuntu) or startFitnesse.bat (windows) via terminal.
Go inside the DbFit downloaded folder open the terminal inside that folder (r.click and open terminal) and run command startFitnesse.sh
Step 2
Navigate to the https://localhost:8085/DbFit link from your browser to open Fitnesse UI.
Create test suite: Navigate to the upper tab Click Add -> Suite Page and click “Save” button. Insert Proper page name (ex:- MySuiteOne). Help texts and tags are not mandatory.
Important: Fitnesse test suite names should be in camel case format
Step 3
Now type the created suite name on the search bar and you will be redirected to the newly added suite page.
Then click your suit name ex:- DbFit.{your suite name}.
Step 4
Now we can create our test cases inside the test suite.
Navigate to the upper tab click Add -> Test page. Insert Proper page name (ex:- My_First_test). Help texts and tags are not mandatory.
Important: Fitnesse test case names should be in camel case format
Now you can add your test script inside the test case (we are explaining a sample test with this article)
Now click “Save” button. Then you are redirected to your test suite “DbFit.MySuiteOne” again.
Step 5
Run test suite:- Navigate to the suite and click “Suite” in the upper tab
Run test case:- Click (ex:- My_First_test) in your test suite and hit “Test” button in the upper tab.
Step 6
Then your test will be run and results will be displayed.
Writing a DbFit Script (Java version)
Important: Drivers of some of the supported databases are not included with DbFit. If need provides the required additional JDBC driver jar files externally by copying them into your downloaded DbFit -> lib folder
Sample code (Standalone mode):
!path lib/*.jar
|import fixture|
|dbfit.fixture|
!|DatabaseEnvironment|postgres|
|Connect|{our postgres db server}|{user name}|{password}|{database name}|
!| Store Query |!- SELECT item_no AS INO, item_name AS ITEM
FROM abc.item_table WHERE country='Sri lanka'
ORDER BY item_no ASC LIMIT 3000 -!|postgresOutput|
|Database Environment|
|Close|
!|DatabaseEnvironment|DB2|
|Connect|{our db2 dbserver}|{user name}|{password}|{database name}|
!|Store Query|!- SELECT item_number AS INO, item_name AS ITEM
FROM abc.item_table WHERE country_name='Sri lanka'
ORDER BY item_no ASC LIMIT 3000 -!|db2Output|
!|compare stored queries|postgresOutput|db2Output|
|INO|ITEM|
|Database Environment|
|Close|
Import Support drivers for test execution
!path lib/*.jar
Connecting to the database
!|DatabaseEnvironment|{DbFit database Type for your DB}|
|Connect|{our Postgres DB server}|{user name}|{password}|{database name}|
DbFit database connection can work in two modes
- Flow mode:- only can be used when both database queries which needs to be compared are from the same database type
- Standalone mode:- can compare different type of database queries (ex: source query of Db2 and destination query of Postgres)
Multi-line queries - use by enclosing them within !- and -!. This prevents special character formatting and enables to write a query in more than one lines (unless DbFit limits writing a query into a single line)
Store Query - reads out query results and stores them into a Fixture symbol for later use
Compare Stored Queries - compare two previously stored query results from stored variables.
Important: For better performance when comparing large data set it’s recommended to sort inputs in ascending order (default ASC)
Reading Test Results
Once executed the test, Matching rows will be colored in green and failed rows will be colored in red and ignored rows will be in yellow and passed/failed/ignored count displayed in the top of the fitnesse UI.
DbFit does cell to cell comparison in the result table, but whether the test is passed or failed will be decided row-wise. If one cell has failed after the comparison from both source and target database queries entire row will be failed with red color. you have to investigate the particular record to identify the failure
- Missing (Red color) - Expected row actually not returned. Rows which are present in source query but not in the target query.
- Surplus (Red color) - Actual row returned but not included in expected. Rows which are present in target query but not in the source query.
- Pass (Green color) - Expected row actually returned. All returned values are equal in every column for the given row of both source and target query.
Test assertions
Test results displayed on top of the page bar after the test is executed.
Right (Bar will be Green color) - Passed all the tests
Wrong (Bar will be Red color) - Failed at least one cell to cell comparison
Ignored (Yellow color) - Test will be ignored due to syntax issues etc...
Lessons learned by DbFit
Stack overflow error - When executing no of rows which cannot be handled by the java heap of the local machine (around 6000 rows can be executed in a local machine. You can execute 50,000 or more records when DbFit establishes in a separate server)
Restart server if errors occur - restart the DbFit local server by (./startfitneese.sh)
Bind variable issues - |set option|bindsymbols|false| should be used when there are parameter mappings to the bind variables
References :
DbFit documentation :- documentation
DbFit community support:- community
If you require further information about data migration test automation checks out my other article about an Overview of data migration test automation.
Sithira Pathirana
Software quality engineer - Sysco Labs Sri Lanka
CI/CD/CM | Data & Analytics QA | ETL QA Automation
5 年Nice article on DBFit and currently I am using DBFit exclusively for ETL TDD framework automation . Please contact me if you need more info.
Lead Full Stack Engineer | Developer | DevOps
6 年Great article bro! Keep up the good work ??
Java/Kafka Developer
6 年Good Job . Keep it up ...
Lead QA Engineer at Acumatica
6 年great work bro ..