Shift Left - How I carried out ETL & data migration testing activities (10 mins read)
Husny Jeffrey
Senior Test Developer @ Grab | Ensuring Quality of Million-Dollar Features
Recently I got the opportunity to work on an ETL and data migration project. In this article I would like to share how I carried out the data migration testing activities, along with a checklist for verifying a data migration.
First, let’s understand what data migration is and why we need it.
What is data migration?
Simply put, it is transferring data from one database to another.
Why?
A data migration requirement can arise for several reasons. Below are two of them.
- Decommissioning legacy applications (re-building the application using a newer architecture)
- Changing the database vendor (e.g. from Oracle to MySQL), etc.
In a microservice architecture, each microservice should have its own dedicated database, so that changes to one service’s database do not impact any other service.
In our case, we are moving from a monolithic architecture to a microservice architecture. Therefore, we have to migrate the service-related data from a large legacy database into a new micro database.
To explain the topic in a practical way, let’s take an example.
Assume our source database is DB2 and our destination database is PostgreSQL.
There is a table in the source database (DB2) which contains customer-related data that the customer microservice needs to consume to perform its functions.
Table description in source database (DB2)
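For reference, the source customer table can be assumed to look roughly like this (inferred from the destination model and column-size discussion below, so treat the exact types as an assumption):
- id – VARCHAR(10), primary key
- name – VARCHAR(30)
- credit_limit – INTEGER, not null
- registered_date – DATE
- registered_time – TIME
- email – VARCHAR(30)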
STEP ONE – We have the source database (DB2) table; now we need to finalize the destination database model
For that, the team came up with the database table model below.
The team decided to concatenate "registered_date" and "registered_time" into a single "registered_date_time" column, and to add an extra column "is_deleted" to support soft deletion.
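Summarised from the CREATE TABLE statement in step two, the proposed customer table looks like this:
- id – VARCHAR(10), primary key
- name – VARCHAR(30)
- credit_limit – INT, not null
- registered_date_time – BIGINT (epoch timestamp)
- email – VARCHAR(40)
- is_deleted – BOOLEAN, default FALSE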
Developers are going to create this table in the new micro database by referring to this document. At this point, as testers, we must review this document and make sure the proposed database model is correct.
I would highly recommend that testers get involved from the design phase itself, so that you can provide your input at an early stage; this also reduces review time later.
To review the document with the proposed database model, I have created my review checklist.
Database model document review - Checklist
Number of columns - Verify whether the proposed database model includes all the columns of the source table (customer) in DB2 that we have planned to migrate to the micro database.
- In the above example - We have discussed and agreed to concatenate the date and time columns and to add a new column (is_deleted).
- Result - Pass
Column names - Verify whether the column names in the proposed database model describe the DB2 columns correctly.
- In the above example - All the column names in the proposed model describe the DB2 columns correctly.
- Result - Pass
Note: It’s not necessary to name the customer’s name column “customer_name” in the customer table. If you define the column as “name”, it already means the customer name, since the column lives in the customer table anyway.
Column data types – If you are NOT applying any transformation logic, the data types in the source and destination should match.
- In the above example - We have decided to transform the date-time format to an epoch timestamp. Other than that, the column data types match the DB2 columns’ data types.
- Result - Pass
Column sizes – The column size in the proposed model should be greater than or equal to the DB2 column size.
- I gave an input on this point. Since we have defined “name” as VARCHAR(30), it is better to increase the size of the "email" column from VARCHAR(30) to VARCHAR(40), because an email address is quite likely to contain both the first name and the last name. Since an email address also includes a domain name (e.g. @gmail.com), the "email" field should be larger than the "name" field.
- Result – Pass
Primary Key / Foreign Key – The primary key and foreign keys (if any) should be specified in the proposed database model document.
- Result - Pass
Not Null – Not-null columns defined in the source (DB2) should be specified in the proposed database model document.
- Result - Pass
Database best practices –
- Table names and column names should be lower case, with words separated by underscores (do not use spaces in table names).
- Use singular words for table names (customer instead of customers).
- No unnecessary prefixes or suffixes (customer instead of tbl_customer), etc.
- Result – Pass
Consistency - Columns referring to the same data should be identical everywhere in the database (e.g. the data type and size of the customer id in the customer table should be the same as the data type and size of the customer id foreign key in the order table; see the sketch after this checklist).
- Result – N/A
Default value – Discussed and agreed default values should be mentioned in the proposed database model document.
- Result – Pass
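To illustrate the consistency check, here is a minimal sketch of a hypothetical order table (not part of the original example) where the customer_id foreign key uses exactly the same type and size as customer.id:

-- Hypothetical table for illustration only; named customer_order
-- because ORDER is a reserved word in SQL.
CREATE TABLE customer_order (
    id VARCHAR(10),
    customer_id VARCHAR(10) NOT NULL, -- same type and size as customer.id
    total_amount INT,
    PRIMARY KEY (id),
    FOREIGN KEY (customer_id) REFERENCES customer (id)
);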
Since I spent some quality time reviewing the proposed database model document, I identified several mistakes early that could otherwise have caused issues later.
Okay, now we have finalized the database model.
STEP TWO – Create the finalized table in the PostgreSQL database
Developers create this table in the new micro database in accordance with the proposed database model document. Once they commit the SQL scripts, they add testers as reviewers on their pull requests to the remote repository. As testers, we should be familiar with SQL and review the pull request against the proposed database model (a defect prevention activity).
According to the above example, the SQL query should be:
CREATE TABLE customer (
    id VARCHAR(10),
    name VARCHAR(30),
    credit_limit INT NOT NULL,
    registered_date_time BIGINT, -- epoch timestamp (concatenated registered_date + registered_time)
    email VARCHAR(40),
    is_deleted BOOLEAN DEFAULT FALSE, -- new column to support soft deletion
    PRIMARY KEY (id)
);
The column names, data types, sizes, and data validations applied in this query can be verified against the document, and the pull request can then be approved from the tester’s end.
In this case you can approve the pull request containing the above query, since it will create exactly the table described in the proposed database model.
Okay, step two is also done. Once everything is fine with the DB scripts, these scripts can be executed in the QA dedicated PostgreSQL DB instance.
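As a quick sanity check once the script has run, the created table can be compared against the model document by querying PostgreSQL’s information_schema catalog views (one possible approach, shown as a sketch):

-- Inspect the created customer table: column names, types, sizes,
-- nullability and defaults, in the order they were defined.
SELECT column_name, data_type, character_maximum_length, is_nullable, column_default
FROM information_schema.columns
WHERE table_name = 'customer'
ORDER BY ordinal_position;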
STEP THREE – Review the ETL transformation logic
Before moving into step three, let’s explain ETL briefly.
ETL stands for Extract, Transform and Load.
Extract – Extract data from the source database (in our case, DB2).
Transform – Apply transformation logic to change the source data if needed (in our case, we have written logic to concatenate "registered_date" and "registered_time" into "registered_date_time").
Load – Load the data into the destination database (in our case, PostgreSQL).
We can create ETL jobs using tools such as Talend and migrate data from source to destination by running those jobs.
As testers, we should review the transformation logic in the ETL rather than waiting to verify the migrated data. That way we can prevent data mismatches or incorrect data in the destination database.
As an example, we can have a discussion with the developers and verify that all the transformation logic identified in the design phase is implemented in the ETL (see the sketch below).
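A minimal sketch of that logic, expressed in PostgreSQL syntax. The real implementation lives inside the ETL tool, and customer_source is a hypothetical staging copy of the DB2 table, so treat the names here as assumptions for illustration:

-- Concatenate the source date and time into one timestamp,
-- then convert it to an epoch value for the BIGINT column.
SELECT
    id,
    EXTRACT(EPOCH FROM (registered_date + registered_time))::BIGINT AS registered_date_time
FROM customer_source;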
Okay, we have reviewed transformation logic as well.
STEP FOUR (final step) – Run the ETL job, migrate the data from source to destination, and verify data accuracy to make sure the migration is successful
Once the ETL job has completed, we need to make sure all the data in the source database (DB2) is also present in the destination database (PostgreSQL). We call this data accuracy testing, and it is carried out to make sure the data migration was successful.
For that, I have a checklist to verify destination data against source data.
Data accuracy testing - Checklist
- Verify record counts
- Verify record counts for a specific duration (e.g. total customers registered in 2017)
- Latest date in the table
- Oldest date in the table.
- Sum of an integer column
- Max of an integer column
- Min of an integer column
- Number of customers who are using Gmail (%@gmail.com) etc.
- more...
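Several of these checks translate naturally into SQL that can be run against both databases and the results compared. A minimal sketch against the destination customer table (the equivalent DB2 queries would use the original registered_date and registered_time columns):

-- Total record count
SELECT COUNT(*) FROM customer;

-- Customers registered in 2017 (registered_date_time is stored as an epoch value)
SELECT COUNT(*) FROM customer
WHERE registered_date_time BETWEEN EXTRACT(EPOCH FROM TIMESTAMP '2017-01-01')::BIGINT
                               AND EXTRACT(EPOCH FROM TIMESTAMP '2017-12-31 23:59:59')::BIGINT;

-- Oldest and latest registrations, plus aggregates on an integer column
SELECT MIN(registered_date_time), MAX(registered_date_time),
       SUM(credit_limit), MAX(credit_limit), MIN(credit_limit)
FROM customer;

-- Number of customers using Gmail
SELECT COUNT(*) FROM customer WHERE email LIKE '%@gmail.com';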
There is more to discuss: testing the performance of the ETL, fault tolerance testing, challenges faced during data migration, and so on. Let’s discuss those soon.
Enjoy!
Husny Jiffry
Senior Quality Engineer - Sysco LABS, Sri Lanka