DATA VIRTUALIZATION
Mustafa Qizilbash
Data Virtualization is a virtual consumption layer that connects to heterogeneous data stores, i.e., RDBMS, data appliances, NoSQL, web services, SaaS, enterprise applications, etc. Since it connects directly to all the data stores, the data models of the underlying systems remain intact, with no change required.
‘It’s considered one of the solutions to the Data Sovereignty problem statement.’
What data stores?
Here, data stores refer to traditional RDBMS or OLTP systems, OLAP systems, Data Warehouses, Data Lakes, Delta Lake, Data Vault, Business Vault, Kafka, data appliances, NoSQL, web services, SaaS, enterprise applications, etc. Yes, it’s confusing: we all know the data stores above each serve one business requirement or another, so why add another layer on top?
Data Virtualization hosts views that connect directly to all possible data stores. These views can be one-to-one with a source table, or complex SQL views that pull data from multiple source systems, creating a virtualized layer on top for consumers to query.
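To make the pattern concrete, here is a minimal sketch using Python’s built-in sqlite3 module, with two SQLite files standing in for two heterogeneous source systems. The file names, table names, and hub connection are all hypothetical; a real virtualization tool would use dedicated connectors rather than ATTACH, but the idea is the same: views over sources, no data copied.

```python
import sqlite3

# Two independent "source systems" (hypothetical stand-ins, e.g. a CRM and an ERP).
crm = sqlite3.connect("crm.db")
crm.executescript("""
    DROP TABLE IF EXISTS customers;
    CREATE TABLE customers (id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Acme');
""")
crm.commit()

erp = sqlite3.connect("erp.db")
erp.executescript("""
    DROP TABLE IF EXISTS orders;
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 250.0), (1, 120.0);
""")
erp.commit()

# The "virtualization layer": one hub connection attaches both sources and
# defines a view over them. No data is copied; the join runs at query time.
hub = sqlite3.connect(":memory:")
hub.execute("ATTACH DATABASE 'crm.db' AS crm")
hub.execute("ATTACH DATABASE 'erp.db' AS erp")
hub.execute("""
    CREATE TEMP VIEW customer_revenue AS  -- a multi-source 'complex' view
    SELECT c.name, SUM(o.amount) AS revenue
    FROM crm.customers c
    JOIN erp.orders o ON o.customer_id = c.id
    GROUP BY c.name
""")
print(hub.execute("SELECT * FROM customer_revenue").fetchall())  # [('Acme', 370.0)]
```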
Having said that, any number of new virtualized data models can be created as second, third, or fourth layers to fulfil different business requirements. Together, these layers can be called a virtual Semantic Layer, as the sketch below illustrates.
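Continuing the sketch above, a “second layer” is nothing more than a view defined over the first virtual view instead of over the sources (the view name and revenue threshold are illustrative only):

```python
# A second virtual layer: a semantic view built on the first view, not on the
# sources. Stacking more views like this yields further layers.
hub.execute("""
    CREATE TEMP VIEW top_customers AS
    SELECT name FROM customer_revenue WHERE revenue > 300
""")
print(hub.execute("SELECT * FROM top_customers").fetchall())  # [('Acme',)]
```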
In a nutshell, all data, whether raw from the source or already processed, becomes available in one environment.
By default, data virtualization tools push query processing down to the source (host) systems, but some tools can also pull data into memory and process it there. Tools can also enforce limits, for example aborting any query that runs longer than a set time.
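As a rough illustration of that abort rule, here is a hypothetical guard in Python; this is not any particular tool’s API, and it assumes the sqlite3 connection was opened with check_same_thread=False so a worker thread may use it:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Hypothetical 'abort long-running queries' policy, sketched on sqlite3.
# Assumes conn = sqlite3.connect(..., check_same_thread=False).
def run_with_timeout(conn, sql, timeout_s=30):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(lambda: conn.execute(sql).fetchall())
        try:
            return future.result(timeout=timeout_s)
        except TimeoutError:
            conn.interrupt()  # aborts the query running in the worker thread
            raise RuntimeError(f"query aborted after {timeout_s}s")
```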
Data Virtualization tools can also notify when changes are made in source systems, and those changes can be propagated into the virtualization tool automatically. And if dirty data is found in the virtualization layer, then unlike any other data store (RDBMS or OLTP systems, OLAP systems, Data Warehouse, Data Lake, Delta Lake, Data Vault, Business Vault, etc.), the tool can write the change, or the cleaned data, back into the source systems as well. For example, if a data quality issue has been identified and the affected data resides in, say, 3 source systems and a data mart, then the data in all 4 data stores can be fixed directly from the Data Virtualization layer. There is no need to run separate remediation exercises in each data store.
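On the earlier sqlite sketch, write-back looks like this; because the sources are attached rather than copied, an UPDATE issued in the hub lands directly in the source file (the correction itself is made up):

```python
# Fix a data quality issue once, from the virtualization layer...
hub.execute("UPDATE crm.customers SET name = 'ACME Corp' WHERE id = 1")
hub.commit()
# ...and the source system itself now holds the corrected value.
print(crm.execute("SELECT name FROM customers WHERE id = 1").fetchone())  # ('ACME Corp',)
```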
Another good example: suppose a data pipeline runs Data Lake → Data Warehouse → Data Marts → OLAP/Semantic Layer → BI tool. If a user wants to add one more column to a BI dashboard, the new column must be pipelined through the whole process, which can take weeks or months to deliver to the business. The solution is to virtualize the source systems and surface the column in the BI tool immediately; in parallel, let IT extend the pipeline, and once that is done, replace the virtualized column in the BI tool with the one coming from the pipeline.
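The “new column now” part of that workaround, again on the same sketch: the virtual view is simply redefined to expose a source column the pipeline does not carry yet, with no pipeline change required (here the extra column is the customer id, chosen arbitrarily):

```python
# Expose an extra source column (id) immediately by redefining the view;
# the physical pipeline can catch up later.
hub.executescript("""
    DROP VIEW customer_revenue;
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.id, c.name, SUM(o.amount) AS revenue
    FROM crm.customers c
    JOIN erp.orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name
""")
```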
Advantages
● Single Access Point: Access all data stores from a single environment.
● Write-Back Capability: Fix data directly from the data virtualization environment, i.e., write back into the source systems.
● UAM: User Access Management handled in one place.
● Data Quality Management: Data quality issues can be identified at a single point and fixed from the same tool.
● No Data Movement: No ETL/ELT is required.
● Ease of Use: Users don’t have to write queries in different languages; in Data Virtualization, a user writes in one language and the tool creates wrappers to talk to all the connected systems (see the sketch after this list).
● No Additional Storage: It is just a virtual view over a new unified data model and doesn’t create a separate copy of the data, so no additional storage is required.
● Low Risk: Since it doesn’t host a copy of the data, unlike Data Warehouses or Data Lakes where data is physically moved, there is no chance of losing data in transit.
● Better for Analytics: All the source data models are converted into one unified data model and data is fetched at query time, making source data available in near real-time for Data Analysts and Data Scientists.
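A toy version of that wrapper idea, with every class and method name invented for illustration (this is not any real tool’s API): the user writes one logical request, and each adapter translates it for its own store’s dialect.

```python
import sqlite3

class SqliteAdapter:
    """Adapter for a relational source."""
    def __init__(self, conn):
        self.conn = conn
    def fetch(self, table, columns):
        cols = ", ".join(columns)  # toy code: no quoting/injection handling
        return self.conn.execute(f"SELECT {cols} FROM {table}").fetchall()

class DictAdapter:
    """Stand-in for a NoSQL or SaaS source exposing documents."""
    def __init__(self, data):
        self.data = data  # {table_name: [{column: value, ...}, ...]}
    def fetch(self, table, columns):
        return [tuple(row[c] for c in columns) for row in self.data.get(table, [])]

class VirtualLayer:
    """One logical query; the adapters hide each system's dialect."""
    def __init__(self, **adapters):
        self.adapters = adapters
    def select(self, table, columns):
        # fan the same logical request out to every connected system
        return {name: a.fetch(table, columns) for name, a in self.adapters.items()}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Acme')")

layer = VirtualLayer(
    rdbms=SqliteAdapter(db),
    saas=DictAdapter({"customers": [{"id": 2, "name": "Globex"}]}),
)
print(layer.select("customers", ["id", "name"]))
# {'rdbms': [(1, 'Acme')], 'saas': [(2, 'Globex')]}
```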
Disadvantages
● Performance Impact: There is some impact on source systems, since some queries run at the source.
● Functional Knowledge: Requires dedicated involvement from Data Stewards and Functional Consultants.
● No Historical Data: Since no copy of the data is hosted, history is not maintained either; Data Virtualization is best suited alongside a Data Warehouse already in place.
For me, the way Data Virtualization tools are maturing, they will become the sole consumption layer for all data stores.
Cheers.