Transforming Test Data Management (TDM) with Copy Data Virtualization (CDV)
Thirunavukarasu Papanasam
VP - Customer Engagement at Maveric Systems Limited
Why Should We Virtualize Copy Data?
In every organization, production data is copied to multiple databases for purposes such as backup, disaster recovery (DR), reporting and testing. An IDC survey estimated that:
- 77% of organizations have more than 200 production data sources
- The average enterprise creates 8-10 copies of every production data source for testing purposes such as Dev, QA and UAT
- 70% of organizations refresh their copy data at least once a week
- Refreshing a single copy generally takes ½ a day or more
- 85% of projects are delayed while waiting for data
- Multiple copies of data cost businesses $48 billion
Also, in most organizations, multiple testers share the same test environment, so the same data is used by different testers, which leads to improper or delayed testing.
By virtualizing one data set into multiple test environments, organizations can overcome all of the above-mentioned challenges.
How Does CDV Work?
Because CDV is a relatively new concept, only a few tools on the market support it, such as Actifio, Delphix and Oracle Snap Clone. Although these tools use different technologies and methods to implement CDV, their functionality is more or less the same.
- Initially, data blocks from the database are copied to the CDV tool and stored internally.
- The copied data blocks are de-duplicated and compressed to reduce storage.
- The tools provide an SLA-based incremental data refresh, with refresh intervals ranging from seconds to days.
- During an incremental data load, only the blocks corresponding to data that has changed in the database are copied; they are stored alongside the initial and other previously copied data blocks.
- The tools never overwrite data blocks, so virtual copies can be created for any point-in-time snapshot taken since the initial data blocks were copied.
- Data copied into the CDV tool can be provisioned (virtually) within minutes, because the tool only creates pointers to data blocks based on the snapshot that needs to be virtualized. For example, if a tool holds data blocks taken at T0 (initial) and T1 (incremental), and the snapshot at T1 needs to be virtualized, the tool creates a virtual database whose pointers refer to the unchanged blocks from T0 and the changed blocks from T1.
- From a user's perspective, a virtually created database can be used like a physical database. Users are free to perform any DDL/DML operations permitted by the access privileges defined at the database level.
- If a user makes changes in a virtual environment, only the block(s) corresponding to those changes are created separately, and the corresponding data block pointer(s) then point to the new data block(s). This way, changes made in one virtual copy do not affect any other copy, even though they all point to the same snapshot.
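The block, snapshot and pointer mechanics described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how Actifio, Delphix or Oracle Snap Clone are actually implemented: the class names (BlockStore, VirtualCopy, CopyDataStore) are hypothetical, SHA-256 hashing stands in for de-duplication, zlib stands in for compression, and short byte strings stand in for database blocks.

```python
import hashlib
import zlib


class BlockStore:
    """Toy append-only store: blocks are de-duplicated by content hash and compressed."""

    def __init__(self):
        self._blocks = {}  # content hash -> compressed bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content is stored once; existing blocks are never overwritten.
        if key not in self._blocks:
            self._blocks[key] = zlib.compress(data)
        return key

    def get(self, key: str) -> bytes:
        return zlib.decompress(self._blocks[key])

    def __len__(self) -> int:
        return len(self._blocks)


class VirtualCopy:
    """A virtual database: a private list of pointers into the shared store."""

    def __init__(self, store: BlockStore, pointers):
        self._store = store
        self._pointers = list(pointers)  # copies pointers only, not data

    def read(self, index: int) -> bytes:
        return self._store.get(self._pointers[index])

    def write(self, index: int, data: bytes) -> None:
        # Copy-on-write: store a new block and repoint only this copy.
        self._pointers[index] = self._store.put(data)


class CopyDataStore:
    """Holds point-in-time snapshots, each a list of block pointers (hashes)."""

    def __init__(self):
        self.store = BlockStore()
        self.snapshots = []

    def ingest(self, source_blocks) -> int:
        """Take a snapshot and return how many blocks were newly stored."""
        before = len(self.store)
        self.snapshots.append([self.store.put(b) for b in source_blocks])
        return len(self.store) - before

    def provision(self, snapshot_index: int) -> VirtualCopy:
        return VirtualCopy(self.store, self.snapshots[snapshot_index])


cds = CopyDataStore()
new_at_t0 = cds.ingest([b"customers-v1", b"orders-v1", b"products-v1"])  # T0: initial load
new_at_t1 = cds.ingest([b"customers-v1", b"orders-v2", b"products-v1"])  # T1: one block changed

tester_a = cds.provision(1)  # both testers start from the T1 snapshot
tester_b = cds.provision(1)
tester_a.write(0, b"customers-masked")  # visible to tester A only
```

In this sketch the T1 refresh adds a single block to the store, provisioning a copy only duplicates a short list of pointers, and tester A's write leaves tester B's view and the T0 snapshot unchanged. A real tool would detect changed blocks at the source rather than hashing every block on each refresh.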
Benefits of CDV in TDM
- Reduced hardware and storage (50-70% reduction)
- Data can be provisioned in minutes (>90% savings in effort)
- Ability to provision the same set of data to multiple testers without any conflict during testing
Points to remember
Copy Data Virtualization can be very beneficial to organizations because it can also be used for purposes beyond TDM. When deciding whether to implement CDV, however, organizations should consider the following points:
- Licensing is generally based on the amount of data being processed rather than the number of users.
- Existing databases can’t be virtualized directly. The tools copy the data internally, and only the copied data blocks are provisioned through virtualization.
- File-based data (such as CSV, XML and JSON) can’t be virtualized.
- Not all databases are supported by the tools. Before finalizing a tool, make sure the databases that need to be virtualized are supported.
- The database versions of the source and target (virtual) databases should be the same.
- Combining multiple source databases and creating one virtual copy is not possible