Evolving Data Architecture Patterns – Data Fabric & Data Mesh

When the hype around Big Data and the 3 Vs of data (volume, velocity, variety) started, most organizations set out to collect data from across the enterprise and establish a Data Lake. Research firms and technologists convinced the world that data was living in silos and that organizations needed to collect all of it in a Data Lake as a single source of truth. Many orgs failed in that pursuit for several reasons, such as a lack of skilled resources, a new and broad technology landscape, and a lack of data governance and policies. Even among those that succeeded in establishing a Data Lake, most failed to tap into it and derive insights. In other words, the data journey turned out to be an expensive affair because of heavy investment in software licenses, hardware, and teams with wide skill sets. The justification given now is that because so much data from different domains sits in a single place, it is very hard to tap into, so we should go for a federated Data Lake. The funniest thing I have heard is "the Data Lake is not keeping its promise". But doesn't this approach take us back to data silos? Sounds like a familiar paradox?

So what went wrong, especially in cases where the Data Lake was successfully established by extracting data from the source systems? Why exactly couldn't orgs tap into the data? The main reason is a lack of Data Democratization, which enables end users to seamlessly access the data they want without depending on data engineers. Other reasons, which vary with the implementation approach, are data quality, data governance, data integrity etc. In other words, many orgs failed to control the data, or ignored the need to, while busy pushing it into the Data Lake. So what's the solution to these problems?

The solution needs to address multiple aspects: Data Democratization, Automation, Data Governance, Data as a Service/Product etc. The idea is to make quality data available to everyone on demand by eliminating or minimizing dependencies on IT and data engineering teams.

The following two methodologies and architecture approaches are gaining popularity to address this:

1. Data Mesh: Data Mesh architecture is based on Domain-Driven Design and aims at delivering Data as a Product (DAAP). The idea is to give domain teams the ownership of, and the onus for, building and governing their data products, and to have them expose a service that serves each data product to other domains, a concept called Data as a Service (DAAS). Not all data needs to sit in a single Data Lake; each domain has its own set of data stores, such as object storage, a DB, a DW, a Data Lake etc. In other words, Data Mesh relies on the concept of federated data. Rather than looking at enterprise data as one huge repository, Data Mesh treats it as a set of repositories of data products. The diagram below shows the Data Mesh logical architecture:

[Figure: Data Mesh logical architecture]

2. Data Fabric: The Data Fabric pattern emphasizes building a knowledge graph of metadata that holds the relationships between data sources. Machine Learning and emerging technologies such as semantic knowledge graphs and active metadata management are aimed at facilitating a data fabric architecture. The Data Fabric pattern also relies on Data Virtualization, a concept that doesn't require ingesting data beforehand but accesses it dynamically via a metadata store, using clever techniques like caching and push-down query optimization. Examples of Data Fabric tools are DataFlex, Atlan, Cinchy, data.world, Denodo, K2View, IBM Cloud Pak etc. The diagram below shows the Data Fabric architecture:

[Figure: Data Fabric architecture]
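To make the Data Virtualization idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how any of the tools listed above work: the names `Source`, `VirtualLayer`, `register`, `relate`, `neighbours` and `select` are all hypothetical, a plain dictionary stands in for the metadata knowledge graph, there is no ML involved, and the cache is never invalidated. What it does show is the core mechanics: a logical dataset name is resolved through metadata, the filter is pushed down to the source, and repeated queries are served from a cache.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Set, Tuple

Row = Dict[str, Hashable]


@dataclass
class Source:
    """A physical data source (hypothetical stand-in for a DB, DW, object store...)."""
    name: str
    rows: List[Row]
    calls: int = 0  # how many times the source was actually queried

    def query(self, column: str, value: Hashable) -> List[Row]:
        # Push-down: the filter is evaluated at the source,
        # so only matching rows are returned to the virtual layer.
        self.calls += 1
        return [row for row in self.rows if row.get(column) == value]


class VirtualLayer:
    """Toy virtualization layer: metadata lookup + push-down + caching."""

    def __init__(self) -> None:
        self.datasets: Dict[str, Source] = {}    # logical name -> physical source
        self.related: Dict[str, Set[str]] = {}   # metadata graph: dataset -> related datasets
        self.cache: Dict[Tuple[str, str, Hashable], List[Row]] = {}

    def register(self, logical_name: str, source: Source) -> None:
        self.datasets[logical_name] = source
        self.related.setdefault(logical_name, set())

    def relate(self, a: str, b: str) -> None:
        # Record an undirected relationship between two datasets.
        self.related.setdefault(a, set()).add(b)
        self.related.setdefault(b, set()).add(a)

    def neighbours(self, logical_name: str) -> List[str]:
        return sorted(self.related.get(logical_name, set()))

    def select(self, logical_name: str, column: str, value: Hashable) -> List[Row]:
        key = (logical_name, column, value)
        if key not in self.cache:
            # Cache miss: resolve the source via metadata and push the filter down.
            self.cache[key] = self.datasets[logical_name].query(column, value)
        return self.cache[key]


if __name__ == "__main__":
    crm = Source("crm_db", [{"customer_id": 1, "region": "EU"},
                            {"customer_id": 2, "region": "US"}])
    layer = VirtualLayer()
    layer.register("customers", crm)
    print(layer.select("customers", "region", "EU"))
    print(layer.select("customers", "region", "EU"))  # served from the cache
    print("source queried", crm.calls, "time(s)")
```

Note that no data is copied into a central store up front; the consumer only knows the logical name `customers`, and the layer decides where and how to fetch it.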

The question is: are Data Mesh, Data Fabric and Data Lake replacements for each other? Not really. What's the guarantee that decentralized, domain-wise governance as dictated by Data Mesh won't pose other types of challenges? For example, it might need separate teams of data scientists and data engineers in each domain/BU, adding to the cost. What's the guarantee that the Data Virtualization approach, which accesses data on demand without moving it, won't pose performance or other challenges? What's the guarantee that an ML-based automated knowledge graph as dictated by Data Fabric will ensure data integrity? It is yet to be seen whether these new data management approaches add business value or complexity.

The choice of data architecture should depend on a number of factors within an organization, such as the number of domains or business units; the volumes, types and sensitivity of data; workload types (analytical or transactional); org policies; SLAs; industry regulations; use cases; the current technology landscape etc. For example, use cases like Customer 360 might need data from different domains and also need curation/aggregation at many levels, in which case it makes sense to establish a Data Lake. You can also have segregated domain-wise buckets within the Data Lake, governed by the respective domains, with domain-wise microservices to access the data. Or you can have a hybrid that encompasses all of these data architecture patterns. Haven't we been doing this for decades, having multiple design patterns in the solution architecture, each aimed at solving specific business problems and use cases?

In summary, Data Mesh is more of an organizational change than an architecture change, while Data Fabric is more of an architecture change at the core. Each has its own pros and cons and poses new challenges. One needs to do due diligence to assess the fit of data architecture patterns for one's organization before adopting an approach or pattern, for example through a weighted scoring based on selection parameters. Be judicious in choosing the Data Architecture pattern(s), or data can become an untamed BEAST.
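A weighted scoring of this kind could be sketched as follows. This is only one possible way to do it, and everything in it is an assumption for illustration: the criteria names, the weights and the 1 to 5 fit scores are made-up placeholders, not an assessment of the patterns, and the helper names `weighted_score` and `rank_patterns` are hypothetical. Each organization would plug in its own selection parameters and scores.

```python
from typing import Dict, List, Tuple

# Hypothetical selection parameters and their relative importance.
WEIGHTS: Dict[str, float] = {
    "domain_autonomy": 3.0,
    "cross_domain_aggregation": 3.0,
    "skills_available_per_domain": 2.0,
    "tolerance_for_query_latency": 2.0,
}

# Placeholder fit scores (1 = poor fit, 5 = good fit) for ONE imaginary organization.
FIT: Dict[str, Dict[str, float]] = {
    "Data Lake":   {"domain_autonomy": 2, "cross_domain_aggregation": 5,
                    "skills_available_per_domain": 4, "tolerance_for_query_latency": 4},
    "Data Mesh":   {"domain_autonomy": 5, "cross_domain_aggregation": 3,
                    "skills_available_per_domain": 2, "tolerance_for_query_latency": 3},
    "Data Fabric": {"domain_autonomy": 3, "cross_domain_aggregation": 4,
                    "skills_available_per_domain": 3, "tolerance_for_query_latency": 2},
}


def weighted_score(weights: Dict[str, float], scores: Dict[str, float]) -> float:
    """Weighted average of the scores; fails loudly if a criterion was not scored."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def rank_patterns(weights: Dict[str, float],
                  fit: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (pattern, score) pairs, best fit first."""
    scored = [(name, weighted_score(weights, scores)) for name, scores in fit.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_patterns(WEIGHTS, FIT):
        print(f"{name}: {score:.2f}")
```

The ranking is only as good as the weights and scores fed into it, so the value of the exercise lies mostly in forcing the selection parameters to be stated and debated explicitly.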

Paul M.

Director, Supply Chain Operations Consulting

3 years ago

Good article Abhishek. Many of the "data initiative" failures are indeed associated with data quality, data governance, and data integrity. However, these are all "systemic" components and, with guidance, can be aligned and implemented correctly from day 1. With the support of a strategic (vendor-agnostic) data team that includes all the relevant skills and capabilities, a simple yet effective stepwise approach could make a significant and meaningful impact on the expected outcome. It further reveals the emphasis required on architecture or organisational change, and to what extent data needs to be democratised and personalised. Five high-level steps to follow: 1) Discovery - due diligence to determine what, why, when, where, who, and how 2) Data preparation 3) Data unification and curation 4) Delivery 5) Data consumption, aligned with the business objectives, digital strategy, and action plans defined during discovery

Rekha Chandrashekar

Enterprise Architect|Strategy & Data Leadership|Tech Innovation

3 years ago

Great insights Abhishek Mittal!

Vineet Kumar

MLOps l Data Engineering Consultant l Data Architect l Technology Architect l Cloud Platform l AWS l Patent Owner l Hortonworks/Oracle/AWS/SnowFlake Certified

3 years ago

Nice one
