Data virtualisation - an overview


Copying data to multiple locations in order to integrate it has long been standard practice in the industry. But is there a way to get a combined view across all the data sources and silos without physically copying anything?

We have built many data warehouses, data marts and data lakes, and we have come to realise that data has been copied to so many places that maintenance, security and data governance have become a challenge. Did we at least remove the silos? No, data silos remain. So what is the solution? Is there an alternative? The answer is yes, and it is called data virtualisation. It is not a magic bullet, but it is well worth exploring. Let’s discuss it in this article.

Data virtualisation combines data from different sources, virtually, into a single, unified view. The data remains in the source systems and is not replicated anywhere else.

How does it work?  

You can achieve data virtualisation by performing the following three actions: Connect, Combine and Serve.

  1. Connect to each data source using an appropriate connector, such as a JDBC driver for databases or an HTTP client URL for JSON files and APIs.
  2. Combine: Once connected to the data sources, you extract data and create a base view for each source, then integrate those base views into a single unified schema. This happens virtually, without physically replicating the source data.
  3. Serve the data to all the consumers, such as data analysts, machine learning engineers, and data scientists. The good thing is that they don’t even need to know where the data comes from or what format it was originally in.
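The three steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not any vendor's API: two in-memory SQLite databases stand in for independent source systems, and the table and system names (a "CRM" and a "billing" store) are invented for the example.

```python
import sqlite3

# Connect: one connection per "source system" (hypothetical names).
# In a real platform these would be remote databases, APIs or files.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben')")

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.execute("INSERT INTO invoices VALUES (1, 120.0), (1, 80.0), (2, 45.5)")

# Combine: define a unified view. Nothing is copied to a new store;
# the sources are queried only when a consumer asks for the view.
def unified_view():
    customers = crm.execute(
        "SELECT id, name FROM customers ORDER BY id"
    ).fetchall()
    totals = dict(
        billing.execute(
            "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"
        ).fetchall()
    )
    return [(cid, name, totals.get(cid, 0.0)) for cid, name in customers]

# Serve: consumers see one schema and never learn where the data lives.
print(unified_view())  # [(1, 'Asha', 200.0), (2, 'Ben', 45.5)]
```

The key property to notice is that `unified_view()` holds no data of its own; each call federates a fresh query out to both sources.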

Advantages from a data engineer’s perspective:


1. Accelerated delivery: Because you don’t create any physical replica of the data, data virtualisation can deliver a minimum viable product much faster than traditional data warehouse solutions.

2. Abstraction: Data virtualisation uses a service-oriented architecture and decouples storage from processing.

3. Security: Because data virtualisation combines data and serves it for consumption from one place, you can implement all data security and governance controls in that one place.

4. Lineage: You can track the data lineage of the virtual target dataset, as it is combined centrally.

5. Reuse: You can apply the same business logic across all the sources, improving developer productivity.

6. POC: You can use data virtualisation as a proof of concept before committing to building an expensive data warehouse.

7. Respond to source changes: You can add or remove columns much faster than with ETL processes.


8. Data ownership: Because you don’t copy or replicate the data, ownership of the data assets remains with the respective source-side business.

9. Transform and clean: You can perform transformation and data-cleaning activities virtually before serving the data to consumers.
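To make the last point concrete, here is a minimal, hypothetical Python sketch (not a vendor API) in which the cleaning rules run at query time inside the virtual layer, leaving the raw source rows untouched. The table and column names are invented for illustration.

```python
import sqlite3

# Hypothetical raw source with messy data: mixed-case emails with stray
# whitespace, inconsistent country codes, and a missing email.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE leads (email TEXT, country TEXT)")
source.execute(
    "INSERT INTO leads VALUES ('  A@X.COM ', 'uk'), ('b@y.com', 'UK'), (NULL, 'us')"
)

def clean_leads():
    """Virtual 'cleaned' view: trim and lowercase emails, standardise
    country codes, and drop rows with missing emails -- all on the fly.
    The underlying table is never modified or copied."""
    rows = source.execute("SELECT email, country FROM leads").fetchall()
    return [
        (email.strip().lower(), country.upper())
        for email, country in rows
        if email is not None
    ]

print(clean_leads())  # [('a@x.com', 'UK'), ('b@y.com', 'UK')]
```

Because the rules live in the view definition, fixing a cleaning rule changes what every consumer sees immediately, with no reload of any target store.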

Advantages from a data scientist's perspective:

1. Simplicity: Data scientists no longer need to extract data from disparate sources and merge it all on the consumption side, because the data is available in a single place.


2. Single source of truth: Because data virtualisation provides a unified schema aggregating data from all sources, it serves as a ready-made single source of truth.

3. Reflects changes to underlying data: Because the raw data stays in the source, changes to it are reflected at the consumption layer with no additional processing.
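This freshness property follows directly from the fact that a virtual view reads its source at query time. A self-contained sketch, using an in-memory SQLite table as a stand-in source (table and value names are hypothetical):

```python
import sqlite3

# Stand-in source system with one existing record.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (name TEXT)")
source.execute("INSERT INTO events VALUES ('signup')")

def virtual_view():
    # The view materialises nothing: every call re-reads the source.
    return [row[0] for row in source.execute("SELECT name FROM events")]

print(virtual_view())  # ['signup']

# A new row lands in the source system...
source.execute("INSERT INTO events VALUES ('purchase')")

# ...and the consumer sees it on the very next query, with no refresh job.
print(virtual_view())  # ['signup', 'purchase']
```

Contrast this with an ETL pipeline, where the new row would be invisible to consumers until the next scheduled load.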

Points to consider:

1. Data virtualisation is not a replacement for data integration techniques in which you copy the data physically. You may still need to build data warehouses and data lakes where the integration logic is complex or the data volume is very high.


2. Performance tuning is key: every consumer query fans out to the source systems, so successful data virtualisation depends on tuning query performance.

3. As the consumption layer serves from many disparate sources, its uptime is bound to that of the source systems: if a source is down, the virtual views over it are too. Keep availability expectations in sync.

4. As with other data integration techniques, you will need a comprehensive data catalogue that describes the metadata of the disparate data stores: where is the data store, what does it contain, how frequently is it updated, and so on.

5. Data definitions vary from department to department and business to business. However, to create a meaningful data virtualisation solution, you will need a uniform data definition across the organisation.
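A catalogue entry from point 4 can be as simple as structured metadata answering those three questions for each source. A hypothetical sketch follows; the dataset name, JDBC URL and field names are all invented for illustration.

```python
# Minimal data catalogue: one entry per source dataset, recording where
# it lives, what it contains, and how fresh it is (all values hypothetical).
catalogue = {
    "crm.customers": {
        "location": "jdbc:postgresql://crm-host/crm",
        "contains": ["id", "name", "email"],
        "refresh_frequency": "real-time",
        "owner": "sales",
    },
}

def describe(dataset):
    """Return a one-line summary a consumer can read before querying."""
    entry = catalogue[dataset]
    return f"{dataset} @ {entry['location']} (owner: {entry['owner']})"

print(describe("crm.customers"))
# crm.customers @ jdbc:postgresql://crm-host/crm (owner: sales)
```

In practice a catalogue tool would populate such entries automatically, but the fields to capture are the same.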

I hope this gives you a high-level understanding of data virtualisation. I have tried to minimise jargon and focus on concepts. Thanks for reading. If you found this article useful, please like, share and comment.

Views are personal and in no way reflect those of my current or previous organisations and vendor partners.

