Data Architecture Patterns

Rudraksh Bhawalkar

Partner @ EY Tech Consulting || Cloud & Data Modernization || Sustainability || Responsible AI

Published: May 5, 2024

The Kardashev scale categorizes a civilization's maturity and technological advancement by the amount of energy it consumes. For example –

Type 0 to 1 – where Earth currently sits (at roughly 0.72), based on its energy consumption
Type 1 – a civilization that can access all the energy available on its planet
Type 2 – a civilization that can directly consume its star's energy, for example through a Dyson sphere
Type 3 – a civilization that can capture all the energy emitted by its galaxy and every object within it

Similarly, I foresee different types of data architecture at different levels of organizational maturity and advancement, as listed below –

Type 0 – Simple RDBMS on a single machine serving both Operational and Analytical requirements
Type 1 – Enterprise Data Warehouse supported by simple ETL/ELT pipelines, extended with the Lambda and Kappa Architectures
Type 2 – Data Lakehouse supported by the Medallion Architecture
Type 3 – Data Product as a Service supported by a Data Mesh architecture

Now, let's dig deeper into what these architectures mean and how they work.

Type 0 – A small, simple RDBMS on a single machine handles Online Transaction Processing (OLTP), while a partitioned space for views, stored procedures, and copied tables is created in the same database to allow reporting and dashboarding on top.

Benefits – good for small scale data analytics with rudimentary analysis

Problems – performance issues, manual data mapping and merging, data quality issues and not scalable
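
The Type 0 setup above can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory SQLite database; the `orders` table, the `sales_by_customer` view, and the sample rows are hypothetical names chosen for the example, not part of any specific system.

```python
import sqlite3

# One database serves both workloads (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Operational (OLTP) side: the application writes individual transactions.
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("acme", 120.0), ("acme", 80.0), ("globex", 50.0)],
)

# Analytical side: a reporting view defined inside the same database.
conn.execute(
    """
    CREATE VIEW sales_by_customer AS
    SELECT customer, SUM(amount) AS total_amount, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer
    """
)


def sales_report(connection):
    """Return the rows a dashboard would read, ordered by customer name."""
    return connection.execute(
        "SELECT customer, total_amount, order_count "
        "FROM sales_by_customer ORDER BY customer"
    ).fetchall()


report = sales_report(conn)
print(report)
```

Because the view is evaluated against the live transactional table, every report query competes with application writes on the same machine, which is one way to picture the performance and scalability problems listed above.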

Type 1 – Data Warehouse architecture became popular in the 1990s, 2000s, and 2010s and is still going strong. There are also extended versions that add a data lake implementation through the Lambda and Kappa Architectures. This is a controversial topic, but to keep it simple, I would call them an integral part of modern Data Warehouse solutions.

In the ETL approach, data is extracted from the source systems, transformed in flight according to business requirements, and loaded into tables ready to be consumed.

Benefits – Centralized approach, standardized, and user friendly

Problems – Batch oriented and slow, not scalable, and rigid
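
As a rough sketch of the ETL flow described above, the example below transforms rows in application code before anything reaches the warehouse. The source rows, the `fact_orders` table, and the cleaning rules are assumptions made for illustration, and SQLite stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical source rows, as they might arrive from an operational system.
SOURCE_ROWS = [
    {"order_id": 1, "customer": " Acme ", "amount": "120.50"},
    {"order_id": 2, "customer": "globex", "amount": "80"},
    {"order_id": 3, "customer": "", "amount": "15"},  # rejected: no customer
]


def extract():
    """Read rows from the source system (here, a static list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Apply business rules before loading: normalize names, cast amounts,
    and drop rows without a customer."""
    cleaned = []
    for row in rows:
        customer = row["customer"].strip().lower()
        if not customer:
            continue
        cleaned.append((row["order_id"], customer, float(row["amount"])))
    return cleaned


def load(connection, rows):
    """Write consumption-ready rows into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    connection.commit()


etl_warehouse = sqlite3.connect(":memory:")
load(etl_warehouse, transform(extract()))
loaded = etl_warehouse.execute(
    "SELECT order_id, customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Note that the rejected row never reaches the warehouse, so any later change to the business rules requires going back to the source, which is part of why this approach is considered rigid.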

In the ELT approach, data is extracted directly from the applications and loaded into the Data Warehouse without transformation. The transformation logic is then executed within the DWH itself.

Benefits – Flexible in implementation and improved scalability

Problems – Data orchestration challenges
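
For contrast, here is a minimal ELT sketch under the same assumptions (SQLite as a stand-in warehouse, hypothetical `raw_orders` and `orders_clean` tables). The raw data lands untouched and the transformation runs as SQL inside the warehouse.

```python
import sqlite3

# Hypothetical raw rows, loaded exactly as extracted (all values still text).
RAW_ROWS = [
    (1, " Acme ", "120.50"),
    (2, "globex", "80"),
    (3, "", "15"),
]

elt_warehouse = sqlite3.connect(":memory:")

# Extract + Load: land the data untransformed.
elt_warehouse.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount TEXT)"
)
elt_warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ROWS)

# Transform: the business rules run as SQL inside the warehouse itself.
elt_warehouse.execute(
    """
    CREATE TABLE orders_clean AS
    SELECT order_id,
           LOWER(TRIM(customer)) AS customer,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    WHERE TRIM(customer) <> ''
    """
)

raw_count = elt_warehouse.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
clean_rows = elt_warehouse.execute(
    "SELECT order_id, customer, amount FROM orders_clean ORDER BY order_id"
).fetchall()
print(raw_count, clean_rows)
```

The raw table keeps all three rows, so the transformation can be rewritten and rerun later without another extraction. The trade-off is that someone has to schedule and order those in-warehouse steps, which is the orchestration challenge noted above.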

In the Lambda Architecture, we extend the data warehouse with modern platform capabilities so that batch data processing is coupled with real-time streaming data processing. Here, both data pipelines store data in parallel: the batch pipeline stores its data in a data warehouse, and the real-time stream stores its data in a distributed data store or blob storage.

Benefits – increased Flexibility and Scalability

Problems – High complexity and architecture lock-in, as data identification and migration become a challenge
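
The two parallel paths of the Lambda Architecture can be illustrated with a small in-memory sketch. The `LambdaPipeline` class and its method names are hypothetical, and plain dictionaries stand in for the data warehouse (batch view) and the distributed store (speed view).

```python
from collections import defaultdict


class LambdaPipeline:
    """Toy model of a Lambda setup: one batch path and one real-time path."""

    def __init__(self):
        self.master_log = []                  # full history kept for batch runs
        self.batch_view = {}                  # stand-in for the data warehouse
        self.speed_view = defaultdict(float)  # stand-in for the streaming store

    def ingest(self, customer, amount):
        """Every event is stored for batch processing and applied in real time."""
        self.master_log.append((customer, amount))
        self.speed_view[customer] += amount

    def run_batch(self):
        """Recompute the batch view from the full history, then reset the
        speed view because its events are now covered by the batch view."""
        totals = defaultdict(float)
        for customer, amount in self.master_log:
            totals[customer] += amount
        self.batch_view = dict(totals)
        self.speed_view.clear()

    def query(self, customer):
        """Serving step: merge the batch result with recent real-time data."""
        return self.batch_view.get(customer, 0.0) + self.speed_view.get(customer, 0.0)


pipeline = LambdaPipeline()
pipeline.ingest("acme", 100.0)
pipeline.ingest("globex", 40.0)
pipeline.run_batch()            # nightly batch job, for example
pipeline.ingest("acme", 25.0)   # arrives after the batch run
total_acme = pipeline.query("acme")
print(total_acme)
```

Even in this toy version the aggregation logic exists twice (once in `ingest`, once in `run_batch`), which hints at the complexity cost of maintaining two pipelines.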

In the Kappa Architecture, which evolved from the Lambda Architecture, there is only one data processing stream instead of the two used in Lambda. Here, batch and real-time data are treated as a single stream and processed as real-time data.

Benefits – Simple and streamlined pipelines, ease of data identifications and migration, tiered storage

Problems – High complexity and high TCO
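
A minimal sketch of the single-stream idea, assuming an append-only event log held in a Python list: historical and live events go through the same `process_stream` function, and reprocessing simply means replaying the log. The function name, the log contents, and the cents conversion are illustrative assumptions.

```python
from collections import defaultdict

# Single append-only event log; history and new events share this one path.
EVENT_LOG = [("acme", 100.0), ("globex", 40.0), ("acme", 25.0)]


def process_stream(events, transform=lambda amount: amount):
    """The only processing path: fold events into per-customer totals."""
    totals = defaultdict(float)
    for customer, amount in events:
        totals[customer] += transform(amount)
    return dict(totals)


# Normal operation: consume the log as a stream.
current_view = process_stream(EVENT_LOG)

# Logic change (here: report in cents): replay the same log through the same
# code path instead of maintaining a separate batch pipeline.
cents_view = process_stream(EVENT_LOG, transform=lambda amount: amount * 100)

print(current_view)
print(cents_view)
```

Compared with the Lambda sketch, there is one piece of aggregation logic rather than two, at the price of keeping the full event log replayable.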

Type 2 – A Data Lakehouse is the next-generation architecture resulting from the convergence of the data lake and the data warehouse. This open system combines management features similar to those of a data warehouse with the low-cost storage used for data lakes. The Medallion Architecture is implemented to logically organize data in multiple layers within a Data Lakehouse, namely – Bronze (Raw Landing Area), Silver (Cleaned and Augmented Data) and Gold (Aggregated and Business Ready Data).

Benefits – Improved Data Governance, Data Lineage, High Scalability, and Flexibility

Problems – Still a new concept, increased complexity, and a risk of data duplication
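
The Bronze, Silver, and Gold layers can be pictured as successive refinements of the same records. The sketch below uses plain Python lists instead of a lakehouse table format; the record fields and the cleaning rules (type casting, de-duplication on `order_id`, dropping unparseable amounts) are assumptions made for illustration.

```python
from collections import defaultdict

# Bronze: raw records exactly as landed, including a duplicate and a bad row.
bronze = [
    {"order_id": "1", "customer": " Acme ", "amount": "120.50"},
    {"order_id": "1", "customer": " Acme ", "amount": "120.50"},
    {"order_id": "2", "customer": "globex", "amount": "80"},
    {"order_id": "3", "customer": "acme", "amount": "not-a-number"},
]


def to_silver(records):
    """Silver: typed, cleaned, de-duplicated records."""
    seen_ids = set()
    silver_rows = []
    for record in records:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # unparseable amounts stay in bronze only
        order_id = int(record["order_id"])
        if order_id in seen_ids:
            continue
        seen_ids.add(order_id)
        silver_rows.append(
            {
                "order_id": order_id,
                "customer": record["customer"].strip().lower(),
                "amount": amount,
            }
        )
    return silver_rows


def to_gold(silver_rows):
    """Gold: business-ready aggregate (total amount per customer)."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(len(bronze), len(silver), gold)
```

The same order exists in all three layers in different shapes, which illustrates both the lineage benefit and the data duplication concern mentioned above.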

Type 3 – Data mesh is an architectural pattern for implementing enterprise data platforms in large and complex organizations. The benefits of a data mesh approach are achieved by implementing multi-disciplinary teams that publish and consume data products. The four main pillars of a data mesh are –

Data Domain – the data is owned by the team that understands it and needs it the most, e.g. a Customer Data Product
Data as a Product – build high-quality, reusable data solutions, such as a Customer Data Product, that can be shared with and consumed by the Marketing, Sales, Accounting, and other teams but are managed by one team as a product
Self-Service Data Platforms – provide Modern Data Platforms that enable fast and frictionless consumption of data and deliver the right information at the right time to the right user, enabling fast value realization
Federated Governance – standards, policies, and controls can be implemented quickly and without added complexity because the teams owning the data and data products operate in a federated manner, which reduces friction and saves time

Benefits – Fast Time to Market, Clear Ownership, High Data Quality, Better Data Governance

Problems – A New Concept, Change Management, Cultural Change, no single platform to implement holistic Data Mesh
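
To make the four pillars more concrete, here is a small hedged sketch. `DataProduct`, `Catalog`, and `governance_violations` are hypothetical names, and the two governance rules (every product needs an owner team and a schema) are invented examples of a global policy, not rules taken from any particular data mesh platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Data as a Product: a named, owned, documented dataset."""
    name: str
    domain: str       # Data Domain: the business area the data belongs to
    owner_team: str   # the team accountable for the product
    schema: dict      # the published contract for consumers


def governance_violations(product):
    """Federated Governance: global rules every domain team must satisfy."""
    violations = []
    if not product.owner_team:
        violations.append("missing owner team")
    if not product.schema:
        violations.append("missing schema")
    return violations


class Catalog:
    """Stand-in for a Self-Service Data Platform: publish and discover."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        violations = governance_violations(product)
        if violations:
            raise ValueError(f"{product.name}: {', '.join(violations)}")
        self._products[product.name] = product

    def discover(self, domain):
        return sorted(p.name for p in self._products.values() if p.domain == domain)


catalog = Catalog()
catalog.publish(
    DataProduct(
        name="customer_360",
        domain="customer",
        owner_team="customer-data-team",
        schema={"customer_id": "string", "segment": "string"},
    )
)

# A product without an owner or schema is rejected by the shared policy.
try:
    catalog.publish(DataProduct("orphan_feed", "sales", "", {}))
    rejected = None
except ValueError as err:
    rejected = str(err)

print(catalog.discover("customer"), rejected)
```

The design choice worth noticing is that the policy check lives in the shared platform while the product definition lives with the domain team, which is one way to read "federated" governance.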

In conclusion, as the data landscape evolves, the modern data platform and architecture landscape will evolve too. All these architecture patterns have one thing in common: providing value to the business.

And as we mature in terms of data requirements, use case definitions, and generating value for our clients and end consumers, we will keep seeing more and more simple yet complex architectures. As of now, I recommend the Data Mesh architecture strategy to realize the right value of data, implement strong data governance, achieve high quality and reliable traceability of data and, in the end, have AI readiness.

I hope that, with high data usage to satiate the hunger of GenAI and AI applications, we will end up consuming more energy and one day achieve Type 1 Civilization status on the Kardashev scale!
