Evolving Data Architecture Patterns – Data Fabric & Data Mesh
When the hype around Big Data and the 3Vs of data started, most organizations set out to collect data from across the enterprise and establish a Data Lake. Research firms and technologists convinced the world that data was living in silos and that organizations needed to collect all of it in a Data Lake as the single source of truth. Many organizations failed in that pursuit for several reasons: lack of skilled resources, a new and broad technology landscape, lack of data governance and policies, and so on. Even where the Data Lake was successfully established, most failed to tap into it and derive insights. In other words, the data journey turned out to be an expensive affair, with heavy investment in software licenses, hardware, and teams spanning a wide range of skills. The justification given now is that with so much data from different domains in a single place, it is very hard to tap into, so we should go for a federated Data Lake. The funniest claim I have heard is that "the Data Lake is not keeping its promise." But doesn't this approach take us back to data silos? Sounds like a familiar paradox?
So what went wrong, especially in cases where a Data Lake was successfully established by extracting data from the source systems? Why exactly couldn't organizations tap into the data? The main reason is a lack of Data Democratization, which would let end users seamlessly access the data they want without depending on data engineers. Other reasons, varying with the implementation approach, are data quality, data governance, data integrity, and so on. In other words, many organizations failed, or simply neglected, to control the data while they were busy pushing it into the Data Lake. So what is the solution to these problems?
The solution needs to address multiple aspects: Data Democratization, automation, Data Governance, Data as a Service/Product, and so on. The idea is to make quality data available to everyone on demand by eliminating or minimizing dependencies on IT and data engineering teams.
The following two methodologies and architecture approaches are gaining popularity to address this:
1. Data Mesh: The Data Mesh architecture is based on Domain-Driven Design and aims to deliver Data as a Product (DAAP). The idea is to give domain teams the ownership of, and onus for, building and governing their data products, and to expose a service that serves each data product to other domains, a concept called Data as a Service (DAAS). All data need not sit in a single Data Lake; each domain keeps its own set of data stores such as object storage, databases, data warehouses, Data Lakes, etc. In other words, Data Mesh relies on the concept of federated data: rather than viewing enterprise data as one huge repository, it treats it as a set of repositories of data products.

[Diagram: Data Mesh logical architecture]
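To make the Data-as-a-Product idea concrete, here is a minimal, hypothetical sketch in Python of a domain-owned data product contract plus a federated registry. The names (DataProduct, owner_domain, fetch, DataProductRegistry) are illustrative assumptions, not the API of any particular Data Mesh tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical sketch: in a Data Mesh, each domain team owns,
# documents, and serves its own data product behind a contract.

@dataclass
class DataProduct:
    name: str                        # e.g. "sales.orders_by_region"
    owner_domain: str                # the domain team accountable for it
    schema: Dict[str, str]           # published contract: column -> type
    fetch: Callable[[], List[dict]]  # output port: serves the data on demand

class DataProductRegistry:
    """Federated catalog: domains register products; consumers discover them."""
    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def get(self, name: str) -> DataProduct:
        return self._products[name]

# Usage: the Sales domain exposes its product; another domain consumes it
# without knowing where or how the underlying data is stored.
registry = DataProductRegistry()
registry.register(DataProduct(
    name="sales.orders_by_region",
    owner_domain="sales",
    schema={"region": "string", "order_count": "int"},
    fetch=lambda: [{"region": "EMEA", "order_count": 1200}],
))
rows = registry.get("sales.orders_by_region").fetch()
```

The point of the sketch is the ownership boundary: the consumer depends only on the published schema and output port, never on the owning domain's internal storage.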
2. Data Fabric: The Data Fabric pattern emphasizes building a knowledge graph of metadata that holds the relationships between data sources. Machine learning and emerging technologies such as semantic knowledge graphs and active metadata management are aimed at enabling the data fabric architecture. The pattern also relies on Data Virtualization, a concept that does not require ingesting data beforehand but accesses it dynamically via the metadata store, using techniques like caching and push-down query optimization. Examples of Data Fabric tools are DataFlex, Atlan, Cinchy, data.world, Denodo, K2View, IBM Cloud Pak, etc.

[Diagram: Data Fabric architecture]
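To illustrate the virtualization idea, here is a minimal, hypothetical Python sketch of an access layer that queries registered sources on demand, pushing the filter predicate down to each source and caching results. The class and method names (VirtualizationLayer, register_source, query) are assumptions for illustration; real tools in this space are far more sophisticated.

```python
import time
from typing import Callable, Dict, List, Tuple

Row = dict

class VirtualizationLayer:
    """Serves queries against registered sources without pre-ingesting data."""
    def __init__(self, ttl_seconds: float = 300.0) -> None:
        # Each source is a function that runs a filtered query directly
        # against its own store (push-down), so only matching rows move.
        self._sources: Dict[str, Callable[[str], List[Row]]] = {}
        self._cache: Dict[Tuple[str, str], Tuple[float, List[Row]]] = {}
        self._ttl = ttl_seconds

    def register_source(self, name: str, query_fn: Callable[[str], List[Row]]) -> None:
        self._sources[name] = query_fn

    def query(self, source: str, predicate: str) -> List[Row]:
        key = (source, predicate)
        hit = self._cache.get(key)
        if hit and hit[0] > time.time():           # repeated query: serve from cache
            return hit[1]
        rows = self._sources[source](predicate)    # push the predicate down
        self._cache[key] = (time.time() + self._ttl, rows)
        return rows

# Usage: "crm" data stays in its own store; we access it on demand.
layer = VirtualizationLayer()
layer.register_source("crm", lambda pred: [{"customer": "Acme", "tier": "gold"}])
print(layer.query("crm", "tier = 'gold'"))
```

Note the trade-off hinted at here: the first query always pays the full round trip to the source, which is exactly where the performance concerns raised below come from.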
The question is: are Data Mesh, Data Fabric, and the Data Lake replacements for each other? Not really. What guarantees that decentralized, domain-wise governance as prescribed by Data Mesh won't pose other kinds of challenges? For example, it might require separate teams of data scientists and data engineers in each domain or business unit, adding cost. What guarantees that the Data Virtualization approach, accessing data on demand without moving it, won't pose performance or other challenges? What guarantees that the ML-based automated knowledge graph at the heart of Data Fabric will ensure data integrity? It remains to be seen whether these new data management approaches add business value or merely complexity.

The choice of data architecture should depend on a number of factors within an organization: the number of domains or business units; the volume, types, and sensitivity of data; workload types (analytical or transactional); organizational policies; SLAs; industry regulations; use cases; the current technology landscape; and so on. For example, a use case like Customer 360 might need data from different domains with curation and aggregation at many levels; there it makes sense to establish a Data Lake. You can also have segregated domain-wise buckets within the Data Lake, governed by the respective domains, with domain-wise microservices to access the data. Or you can have a hybrid encompassing all the data architecture patterns. Haven't we been doing this for decades: combining multiple design patterns in a solution architecture to solve specific business problems and use cases?

In summary, Data Mesh is more of an organizational change than an architecture change, while Data Fabric is an architecture change at the core. Each has its own pros and cons and poses new challenges. Do your due diligence to assess the fitment of the data architecture patterns in your organization before adopting one, for example with a weighted scoring across selection parameters, as sketched below. Be judicious in choosing the data architecture pattern(s), or data can become an untamed BEAST.
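As a sketch of that weighted-scoring idea, the criteria, weights, and 1-5 scores below are purely illustrative placeholders; each organization would substitute its own selection parameters and honest assessments.

```python
# Hypothetical weighted scoring for comparing candidate data
# architecture patterns. All numbers are illustrative, not benchmarks.

weights = {
    "domain_autonomy_needed": 0.25,
    "cross_domain_use_cases": 0.25,   # e.g. Customer 360
    "data_sensitivity":       0.20,
    "team_skill_coverage":    0.15,
    "current_tech_landscape": 0.15,
}

# Score each pattern 1-5 against each criterion (placeholder values).
candidates = {
    "Data Lake":   {"domain_autonomy_needed": 2, "cross_domain_use_cases": 5,
                    "data_sensitivity": 3, "team_skill_coverage": 4,
                    "current_tech_landscape": 4},
    "Data Mesh":   {"domain_autonomy_needed": 5, "cross_domain_use_cases": 2,
                    "data_sensitivity": 4, "team_skill_coverage": 2,
                    "current_tech_landscape": 3},
    "Data Fabric": {"domain_autonomy_needed": 3, "cross_domain_use_cases": 4,
                    "data_sensitivity": 4, "team_skill_coverage": 3,
                    "current_tech_landscape": 2},
}

# Weighted total per pattern; the highest score is a conversation
# starter for due diligence, not an automatic decision.
for name, scores in candidates.items():
    total = sum(weights[c] * scores[c] for c in weights)
    print(f"{name}: {total:.2f}")
```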
Director, Supply Chain Operations Consulting (3y):
Good article Abhishek. Many of the "data initiative" failures are indeed associated with data quality, data governance, and data integrity. However, these are all systemic components and, with guidance, can be aligned and implemented correctly from day one. With the support of a strategic, vendor-agnostic data team that includes all the relevant skills and capabilities, a simple yet effective stepwise approach can make a significant and meaningful impact on the expected outcome. It further reveals the emphasis required on architecture or organisational change, and to what extent data needs to be democratised and personalised. Five high-level steps to follow:
1) Discovery: due diligence to determine what, why, when, where, who, and how
2) Data preparation
3) Data unification and curation
4) Delivery
5) Data consumption, aligned with the business objectives, digital strategy, and action plans defined during discovery