The Role of Data Lake in a Data Mesh
Credit: https://unsplash.com/photos/ZiQkhI7417A
Credit for the other pictures used here: https://martinfowler.com/


In my experience, quite a few people still see the Data Lake or Lakehouse as a concept opposed to the Data Mesh. Much of this may stem from one of the earlier Data Mesh publications, which rightfully depicts the monolithic design enforced by most data lake implementations as flawed and at odds with the Data Mesh approach. But that criticism targets the chosen topology, not the technology or the architecture itself.

Here, I would like to highlight how the benefits of a data lake can be critical to a data mesh architecture; those benefits have not changed. On the contrary, we can see a trend suggesting that the role of the data lake will become even more critical to a mesh architecture.

Data Increases in Value When Shared

The Data Mesh proposes federated, governed, domain-oriented data ownership in which data is shared and consumed as products. Consumption through "data sharing" is one of the key aspects here.

Data as a product (1): datasets as products (credit: https://martinfowler.com/articles/data-monolith-to-mesh.html)

Any data sharing activity generally starts with combining data from various fragmented sources in one repository, which a company can then use to run analyses and build platforms. A repository can be anything from a data warehouse to a data lake or a file system. Data warehouses are repositories of structured data sets: the data has already been selected from different sources, cleaned, and integrated into a predefined structure. Data lakes are repositories of raw, often unstructured data combined without the initial cleaning step; a company can structure the data as needed for specific applications.
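To make the difference between the two repository types concrete, here is a minimal Python sketch. It assumes that in-memory records stand in for the source systems; `WAREHOUSE_SCHEMA`, `write_to_warehouse`, `write_to_lake`, and `read_from_lake` are hypothetical names. The warehouse path cleans and casts records against a predefined structure before storing them (often called schema-on-write), while the lake path stores the raw records as they are and lets each application apply the structure it needs when reading (schema-on-read).

```python
import json
from typing import Any

# Hypothetical predefined warehouse structure: column name -> type.
WAREHOUSE_SCHEMA = {"order_id": int, "amount": float}


def write_to_warehouse(records: list[dict[str, Any]]) -> list[tuple]:
    """Schema-on-write: clean and cast every record before storing it."""
    rows = []
    for rec in records:
        try:
            rows.append(tuple(typ(rec[col]) for col, typ in WAREHOUSE_SCHEMA.items()))
        except (KeyError, ValueError, TypeError):
            # Records that do not fit the predefined structure are rejected.
            continue
    return rows


def write_to_lake(records: list[dict[str, Any]]) -> list[str]:
    """The lake keeps raw records as-is (here: one JSON line per record)."""
    return [json.dumps(rec) for rec in records]


def read_from_lake(raw_lines: list[str], columns: list[str]) -> list[dict[str, Any]]:
    """Schema-on-read: each application picks the structure it needs at read time."""
    return [{col: json.loads(line).get(col) for col in columns} for line in raw_lines]


sources = [
    {"order_id": "1", "amount": "9.99", "channel": "web"},
    {"order_id": "2", "amount": "oops"},  # malformed amount
    {"order_id": "3", "amount": 5, "channel": "app"},
]

warehouse = write_to_warehouse(sources)
lake = write_to_lake(sources)

print(warehouse)  # [(1, 9.99), (3, 5.0)]
print(read_from_lake(lake, ["order_id", "channel"]))
```

The warehouse silently loses the malformed record and the `channel` attribute it was never designed for, whereas the lake keeps everything and defers those decisions to the consuming application.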

Data lakes as nodes on the mesh (2):


At this point, it is worth mentioning that the Lakehouse aims to extend the data lake into a structured data repository, making it unnecessary to combine several repositories, such as a data warehouse and a data lake, at a single node.
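The idea of data lakes as nodes on the mesh can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: each domain owns its own lake storage (a local temporary directory stands in for object storage), writes its data products in an open format (CSV here, in place of formats such as Parquet), and registers only metadata in a shared catalog. `MeshCatalog`, `DomainNode`, `publish`, and `consume` are hypothetical names and not part of any product.

```python
import csv
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class MeshCatalog:
    """Hypothetical federated catalog: product name -> (owning domain, location)."""
    entries: dict[str, tuple[str, Path]] = field(default_factory=dict)

    def register(self, product: str, domain: str, location: Path) -> None:
        if product in self.entries:
            raise ValueError(f"data product {product!r} already registered")
        self.entries[product] = (domain, location)

    def resolve(self, product: str) -> tuple[str, Path]:
        return self.entries[product]


@dataclass
class DomainNode:
    """One node on the mesh: a domain that owns its own data lake storage."""
    name: str
    lake_root: Path

    def publish(self, catalog: MeshCatalog, product: str, rows: list[dict[str, str]]) -> None:
        # The domain writes its product into its own lake in an open format.
        path = self.lake_root / self.name / f"{product}.csv"
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        # Only metadata is shared centrally; the data stays in the domain's lake.
        catalog.register(product, self.name, path)


def consume(catalog: MeshCatalog, product: str) -> list[dict[str, str]]:
    """Any consumer reads a product through the catalog, not the owner's internals."""
    _, path = catalog.resolve(product)
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


root = Path(tempfile.mkdtemp())
catalog = MeshCatalog()
sales = DomainNode("sales", root)
sales.publish(catalog, "orders", [{"order_id": "1", "amount": "9.99"}])

print(catalog.resolve("orders")[0])  # sales
print(consume(catalog, "orders"))    # [{'order_id': '1', 'amount': '9.99'}]
```

The design choice being illustrated is that the lake is not one central store: every domain runs its own, and the mesh is formed by the shared catalog and the open format rather than by copying data into a monolith.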

It should be clear now that data lakes and data mesh are not opposing concepts and that the data lake can be a critical component of any data mesh implementation. However, we have not yet discussed what makes a data lake essential in a data mesh architecture.

The Benefits of a Data Lake

The benefits of a data lake have largely stayed the same since the days of monolithic designs. I would summarize them as follows:

  • Economies of scale: Data lakes as repositories, especially with storage decoupled from compute, have demonstrated favorable cost metrics with respect to data volume and data access.
  • Wide range of use: Open formats and APIs expose the data sets stored in a data lake to a wider variety of access patterns and use cases. A language based on relational algebra and tuple relational calculus, such as SQL, is simple to use and very powerful, but it is also limited in its range of applications.
  • Real-time data availability: Data lakes help store and utilize live data, which makes them especially valuable for data-sharing scenarios that require continuous monitoring and fast reaction, such as tracking.
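The "wide range of use" point can be illustrated with a small Python sketch. It assumes that a CSV data set stands in for an open lake format such as Parquet, and the event data is hypothetical. The same records are queried relationally through an in-memory SQLite database and also processed with general-purpose code, here to build ordered per-user event paths of the kind one might feed into a model.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical event data set stored in an open, engine-independent format (CSV).
RAW = """user_id,ts,event
alice,1,view
bob,2,view
alice,3,cart
alice,4,buy
bob,5,cart
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Access pattern 1: a relational query via SQL (SQLite, in memory).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id TEXT, ts INTEGER, event TEXT)")
con.executemany("INSERT INTO events VALUES (:user_id, :ts, :event)", rows)
buyers = con.execute(
    "SELECT user_id, COUNT(*) FROM events WHERE event = 'buy' GROUP BY user_id"
).fetchall()

# Access pattern 2: arbitrary code over the same records, e.g. building
# time-ordered event paths per user as input for a model.
paths: dict[str, list[str]] = defaultdict(list)
for r in sorted(rows, key=lambda r: int(r["ts"])):
    paths[r["user_id"]].append(r["event"])

print(buyers)       # [('alice', 1)]
print(dict(paths))  # {'alice': ['view', 'cart', 'buy'], 'bob': ['view', 'cart']}
```

Because neither access path owns the data, adding a third consumer (a notebook, an ML framework, another engine) does not require exporting the data set out of a proprietary store first.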

The last point is likely to make data lakes even more critical to future data mesh architectures.

Data Lakes Might Even Matter More

In a recent interview with Datanami, Matei Zaharia, CTO of Databricks, revealed that they see a double-digit percentage of workloads using streaming. Databricks considers this part of a trend in which enterprises want to build operational applications on their data. It is certainly not every company, but what drives this trend are applications in which actions on incoming data are operationalized.

If this trend holds, data lakes will likely play a more significant role in a data mesh architecture.
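What "operationalizing actions on incoming data" can look like on top of a lake is sketched below in Python. The sketch rests on a few assumptions: a local directory stands in for a table in the lake, numbered JSON files stand in for micro-batches, and an in-memory set stands in for a checkpoint. Producers only append new files, and the consumer remembers which files it has already processed, so each poll returns only newly arrived records, which it acts on right away. This is a simplified illustration of incremental processing, not the API of Databricks or any streaming engine; `append_batch`, `IncrementalReader`, `act_on`, and the threshold value are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def append_batch(table_dir: Path, batch_id: int, records: list[dict]) -> None:
    """Producers append immutable, numbered files; existing files are never rewritten."""
    path = table_dir / f"part-{batch_id:05d}.json"
    path.write_text("\n".join(json.dumps(r) for r in records))


class IncrementalReader:
    """Hypothetical consumer that remembers which files it has already processed."""

    def __init__(self, table_dir: Path) -> None:
        self.table_dir = table_dir
        self.processed: set[str] = set()  # stands in for a durable checkpoint

    def poll(self) -> list[dict]:
        new_records = []
        for path in sorted(self.table_dir.glob("part-*.json")):
            if path.name in self.processed:
                continue
            new_records.extend(json.loads(line) for line in path.read_text().splitlines())
            self.processed.add(path.name)
        return new_records


def act_on(records: list[dict], threshold: float) -> list[str]:
    """Operationalize incoming data: emit an alert for each reading above the threshold."""
    return [f"ALERT {r['sensor']}: {r['value']}" for r in records if r["value"] > threshold]


table = Path(tempfile.mkdtemp())
reader = IncrementalReader(table)

append_batch(table, 0, [{"sensor": "a", "value": 20.5}, {"sensor": "b", "value": 81.0}])
print(act_on(reader.poll(), threshold=80.0))  # ['ALERT b: 81.0']

append_batch(table, 1, [{"sensor": "a", "value": 95.2}])
print(act_on(reader.poll(), threshold=80.0))  # ['ALERT a: 95.2']
print(reader.poll())                          # []
```

In a real deployment the checkpoint would have to be persisted so that a restarted consumer neither reprocesses nor skips data; it is kept in memory here only for brevity.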

Ravi Chinta

Technical Architect - D & A, Scrum Certified CSM | 14x Databricks | 7x Azure Certified | 6x IBM Certified | 5x Sisense Certified | 6x Tableau Certified

2 years ago

Data mesh is a coined title to me, and the back-end concept is similar to a data mart with conformed dimensions catering to the business requirements of different entities in the organization, with the added flavor of advanced technology and flexibility to cover the major use cases in the data landscape. I believe data strategy is key to Data Mesh success.


It may be worth mentioning that the Data Lake was originally meant to store data from one source only, not to be a monolith where all the data from all the sources are stored. And if we keep that in mind then it is evident that a Data Lake actually fits very well into Data Mesh.

Patrick Pichler

Data, Analytics & AI @ Creative Data | EU Funding | Pluralsight Author

2 years ago

I wouldn't consider Data Lake as a concept but rather a storage type and an alternative to a relational database, that's why I also don't see any contradictions in this case. However, I do see contradiction when it comes to a data warehouse/lakehouse and a "data mesh". The former concept is centralized and subject-oriented (ensuring consistency across data) whereas the latter one is decentralized and domain-oriented.

More articles by Henning Kropp

  • Evaluating the Utility of Data Assets

    Data decoupled from specific hardware and software implementations become an independent economic good. While this…

  • Why Consider Capex With Your Cloud Adoption?

    Cloud providers did not build better data centers or capabilities to operate data centers more efficiently*. Instead,…

  • There Are No Giant Leaps - Cloud as an Innovation Enabler

    Neil Armstrong lied when he said his famous words while setting foot on the moon as the first man in history: "That's…

  • Going Up! Small Steps vs. Giant Leaps

    During a mountain hike, you stare less at the top and rather look down at your feet while you make small and careful…
