Cloud Data Warehousing—imagine a mesh of cloud
Data mesh might well be described, based on its founder’s reasoning and description, as an anti-warehouse. Although data mesh is firmly positioned within the cloud data warehousing market and aims to address a large subset of data warehousing needs, many potential buyers may be surprised by the strength of Zhamak Dehghani’s rejection of the key component of cloud data warehousing. In a tweet from 2021, she says: “There are no warehouses in a mesh. There are autonomous data product quantums that provide multimodal access to domain data for analytical workloads - connected together in a graph - each both transforming and serving/controlling immutable bitemporal data.”
The emphasis on the first sentence of the tweet is mine. The remainder touches on many of the key characteristics of what data mesh proposes. And therein lies a second challenge for many traditional data warehousing practitioners approaching the topic: the characteristics are described in language that is likely to be unfamiliar to them.
At the core of data mesh is the concept of data-as-a-product. Product thinking emphasizes the ownership of the datasets by “teams that most intimately know and consume the data” as described by ThoughtWorks’ 2020 definition of a mesh. The responsibility stretches from operational creation of the data to its eventual provision to analytic workloads. Each functional team builds a pipeline of data, the function to run it, and the metadata to describe it and make it discoverable.
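To make the three pieces each team builds (pipeline, function, and metadata) more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a reference implementation of data mesh: the names DataProduct, Catalog, daily_order_totals, and the "order-management" team are all hypothetical and do not come from ThoughtWorks or any mesh product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class DataProduct:
    """Hypothetical data product: a pipeline function plus descriptive metadata."""
    name: str
    owner_team: str      # the team that knows and consumes the data
    description: str     # metadata that describes the product
    transform: Callable[[Iterable[dict]], List[dict]]  # the function that runs the pipeline
    tags: List[str] = field(default_factory=list)      # metadata used for discovery

    def serve(self, operational_rows: Iterable[dict]) -> List[dict]:
        """Turn operational records into rows ready for analytic workloads."""
        return self.transform(operational_rows)


class Catalog:
    """Hypothetical registry that makes data products discoverable by tag."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def search(self, tag: str) -> List[str]:
        """Return the names of all registered products carrying the given tag."""
        return sorted(p.name for p in self._products.values() if tag in p.tags)


def daily_order_totals(rows: Iterable[dict]) -> List[dict]:
    """Example pipeline step (an assumption): sum order amounts per day."""
    totals: Dict[str, int] = {}
    for row in rows:
        totals[row["day"]] = totals.get(row["day"], 0) + row["amount"]
    return [{"day": day, "total": total} for day, total in sorted(totals.items())]


# One domain team publishes its product and registers it for discovery.
orders = DataProduct(
    name="orders.daily_totals",
    owner_team="order-management",
    description="Daily order totals derived from operational order records",
    transform=daily_order_totals,
    tags=["orders", "sales"],
)
catalog = Catalog()
catalog.register(orders)
```

In this sketch the owning team is responsible for everything from the operational rows to the served result, and other teams can find the product only through the metadata it publishes. Nothing here reconciles one team's product with another's.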
The resulting pattern is, in essence, a plethora of silos without any obvious way to reconcile data between them beyond agreements between the various teams. The approach is based on Eric Evans’ domain-driven design, which has been successful in designing modern operational applications. However, its application in the informational environment is challenging, especially given the lack of a clear definition of what a domain is in this case. Choosing narrowly focused business domains offers easier development of each data product, but leads to an explosion of uncoordinated pipelines. Broadly defined business domains lead, of course, to the opposite problem.
As a result, the architectural design pattern (ADP) for data mesh diverges widely from the Generic Foundational Cloud ADP shown in Post 3 of this series. The data mesh ADP in all its glory is shown next.
Although the original three information pillars (MGD, PMD, and HSI) are shown to indicate that these data/information types remain important for design, data products (the mauve hexagons) in this ADP do not readily lend themselves to advanced thinking about the support needs of different data/information types.
A further challenge emerges when we consider the black, double-line boundary between traditionally delivered operational systems and the informational environment. Data mesh has been conceived as an entirely cloud-based environment built from modern service-oriented components. Micro-services are therefore proposed as the components that deliver operational function (creating transaction and other data) and make it available to the relevant data products. How data from existing on-premises, monolithic applications is to be brought into the mesh is not immediately obvious.
These very challenging considerations require much more exploration than can be provided in this short post. Check out “Cloud Data Warehousing—Volume II: Implementing Data Warehouse, Lakehouse, Mesh, and Fabric” for an in-depth discussion. The book also covers the strengths of this pattern as it tries to avoid and navigate the problems, both technical and organizational, that arise from the more centralized approaches found in traditional data warehouse implementations.