Cloud Data Warehousing—Architectural Design Patterns

Barry Devlin

Data Architect, BI and DW Analyst and Consultant, Author and Speaker

Published: July 18, 2023

I imagine you’ve seen many IT diagrams masquerading as architectures! I use a simple rule: if there’s a product or open-source component name or logo showing, it’s not an architecture. I used to call such pictures “boxologies”! Which sounds a bit rude…

All the figures in the previous four posts are architectures. The first data warehouse diagram I drew way back in 1988 was also an architecture. However, with the emergence of the many flavors of data warehousing in the past decade, I find that “pure” architecture is now rare.

Many diagrams use the same terms to describe subtly or distinctly different things. In other cases, the same solution is given two different names. Indeed, some solutions are described as a combination of different things. For example, a vendor may suggest that their offering applies “data mesh concepts” on top of a “data fabric base” to deliver a “data lakehouse solution.” I exaggerate only a little. But what could this possibly mean?

Maybe I personally need to be a little more flexible! But I implore vendors and consultants to be more strict. I’ve started to talk about architectural design patterns (ADPs) to allow more space for variations on a theme and for concepts to develop and evolve. An ADP offers a set of terminology and, usually, a picture, on which we can agree, at least throughout an ongoing discussion. It encapsulates the key business needs and fundamental infrastructure requirements and constraints of a particular solution approach.

In “Cloud Data Warehousing—Volume I: Architecting Data Warehouse, Lakehouse, Mesh, and Fabric” (available here), I describe six architectural design patterns: three are foundational and three are emergent and evolving even as I write. The following is adapted and abridged from that book.

Data warehouse classic (DWC): provides correct and consistent, well-modeled, schema-on-write, relevant, and usable—as far as possible—information in support of business analysis and decision-making needs in a cross-business manner. A DWC may be structured as a hub-and-spoke pattern, a dimensional / star-schema pattern, or some combination of both.

On-premises and cloud versions of this ADP exist because technology can and does drive important differences at the physical implementation level. DWC/op is implemented with “on-premises” technologies based on finite servers or server clusters, using “conventional” relational database technologies. DWC/cn is built on “cloud-native” technology, including automatically elastic and scalable features, object storage, and separate compute and storage, with multi-cluster compute.

Logical data warehouse (LDW): extends a DWC with direct, real-time access to data in other sources, such as operational systems, files, NoSQL stores, etc. Access is mediated through an overarching logical data model describing the different data sources in a common language. Businesspeople access all the data through data virtualization technology.
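As a rough illustration of the LDW idea, the following Python sketch shows a logical data model mediating access to two differently named sources. It is a toy under stated assumptions, not a description of any data virtualization product: the source functions, the `LOGICAL_MODEL` mapping, and all column names are hypothetical and invented for this example.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical physical sources. In a real LDW these would be a DWC and a
# live operational system; here they are in-memory stand-ins.
def warehouse_customers() -> List[Row]:
    return [{"cust_id": 1, "cust_name": "Acme"},
            {"cust_id": 2, "cust_name": "Globex"}]

def operational_orders() -> List[Row]:
    return [{"customerNo": 1, "amt": 250.0},
            {"customerNo": 1, "amt": 100.0},
            {"customerNo": 2, "amt": 75.0}]

# Logical data model (assumed): each logical entity names its source and
# maps physical column names to a common business vocabulary.
LOGICAL_MODEL: Dict[str, Tuple[Callable[[], List[Row]], Dict[str, str]]] = {
    "customer": (warehouse_customers, {"cust_id": "customer_id", "cust_name": "name"}),
    "order": (operational_orders, {"customerNo": "customer_id", "amt": "amount"}),
}

def read_logical(entity: str) -> List[Row]:
    """Fetch rows from the underlying source on each call and rename columns."""
    source, mapping = LOGICAL_MODEL[entity]
    return [{mapping[k]: v for k, v in row.items() if k in mapping}
            for row in source()]

def order_totals_by_customer() -> Dict[str, float]:
    """A business query phrased only in logical terms, spanning both sources."""
    names = {c["customer_id"]: c["name"] for c in read_logical("customer")}
    totals: Dict[str, float] = {}
    for order in read_logical("order"):
        name = names[order["customer_id"]]
        totals[name] = totals.get(name, 0.0) + order["amount"]
    return totals
```

The query function never mentions `cust_id` or `customerNo`; it sees only the common vocabulary, which is the role the logical data model plays in this pattern.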

Data lake classic (DLC): offers data in raw, as-received format, or with limited preprocessing and cleansing at the discretion of the business user. Key characteristics include scalable data storage in any format, multiple processing models, and timely, flexible usage (schema-on-read). In many cases, data governance is limited, with users left to their own devices to figure out which data to use when.
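The schema-on-write and schema-on-read distinction between DWC and DLC can be sketched in a few lines of Python. This is only a minimal illustration under assumptions: the `SCHEMA` dictionary, the function names, and the use of JSON strings as the raw format are all hypothetical choices made for the example.

```python
import json
from typing import Any, Dict, List

# Assumed warehouse schema: column name -> required Python type.
SCHEMA = {"order_id": int, "amount": float}

def write_validated(table: List[Dict[str, Any]], record: Dict[str, Any]) -> None:
    """Schema-on-write: coerce and check every column before the row is stored."""
    row = {}
    for column, column_type in SCHEMA.items():
        if column not in record:
            raise ValueError(f"missing column: {column}")
        row[column] = column_type(record[column])
    table.append(row)

def write_raw(lake: List[str], payload: str) -> None:
    """Lake-style load: keep the payload exactly as received, with no checks."""
    lake.append(payload)

def read_with_schema(lake: List[str], column: str, column_type: type) -> List[Any]:
    """Schema-on-read: the reader imposes structure and decides what to skip."""
    values = []
    for payload in lake:
        try:
            document = json.loads(payload)
            values.append(column_type(document[column]))
        except (ValueError, KeyError, TypeError):
            continue  # unusable for this particular question; left in the lake
    return values
```

In the first case, bad data is rejected at load time and every consumer sees the same structure. In the second, everything is stored, and each reader must work out which records are usable, which is where the limited governance of the DLC pattern shows up.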

The three emergent ADPs are:

Data lakehouse: proposes an elastic cloud solution to a combination of DWC and DLC needs, despite their clearly conflicting nature. It offers an environment based on an object store as a single well-governed storage layer for all structured and semi-structured data, managed and accessed through “relational-like” function, with some technical metadata support. In addition, loosely structured (so-called “unstructured”) data is included, as found in DLC.

The data lakehouse ADP differs only marginally in semantics and initial focus from the DWC/cn pattern defined above.

Data fabric: essentially an extension of the LDW pattern, offers enhanced management and automation of data storage, population, access, and all aspects of data management in a diverse, distributed environment usually centered on a DWC in either of its flavors. This is supported via AI-enhanced and -extended active metadata that reflects the real, changing, live business and computing environment across the entire set of data stores and processes.

Data mesh: proposes a highly distributed, analytics-focused environment that shuns conventional approaches to centralizing data in warehouses or lakes (for flexibility and agility in development and delivery), and instead promotes domain-driven design to deliver data as a product. Such data products are realized and managed by combined business/IT teams within business domains, with a focus on embedded, distributed governance and infrastructure-as-a-platform.
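One way to picture "data as a product" is the Python sketch below. It is a loose illustration rather than a reference design: the `DataProduct` and `Platform` classes, their fields, and the check functions are hypothetical names introduced for this example only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

@dataclass
class DataProduct:
    """A domain-owned data product that carries its own governance checks."""
    domain: str
    name: str
    owner: str                       # the combined business/IT team responsible
    read: Callable[[], List[Row]]    # the data stays with the domain
    quality_checks: List[Callable[[Row], bool]] = field(default_factory=list)

    def serve(self) -> List[Row]:
        """Return rows only if every row passes every embedded check."""
        rows = self.read()
        for row in rows:
            for check in self.quality_checks:
                if not check(row):
                    raise ValueError(
                        f"{self.domain}.{self.name}: row failed check: {row}")
        return rows

class Platform:
    """Shared infrastructure: registers and discovers products, stores no data."""
    def __init__(self) -> None:
        self._catalog: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._catalog[f"{product.domain}.{product.name}"] = product

    def discover(self, key: str) -> DataProduct:
        return self._catalog[key]
```

The point of the sketch is where responsibility sits: the platform knows only how to find products, while ownership, data access, and quality rules travel with each domain's product instead of with a central warehouse or lake.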

With this, I conclude this series based on “Cloud Data Warehousing—Volume I”.

It’s time I started writing the next book: “Cloud Data Warehousing—Volume II: Implementing Data Warehouse, Lakehouse, Mesh, and Fabric.” As you can guess, the majority of the book will be devoted to diving deeper into the three emergent ADPs. I’m hoping to publish it early in the new year.

Matthias Mohler

MBA | Head of Data & AI Consulting at Swisscom | University Lecturer | Consulting Leader & Senior Advisor

1 year ago

Well said. I also like the term “marchitecture” for the mentioned types of diagrams that we see each and every day.
