Data Architecture Patterns
Rudraksh Bhawalkar

The Kardashev Scale categorizes a civilization's maturity and technological advancement by the amount of energy it is able to harness, for example –

  1. Type 0 to 1 – where Earth currently sits (at roughly 0.72), based on its energy consumption
  2. Type 1 Civilization is able to access all the energy available on the planet
  3. Type 2 Civilization can directly consume its star's energy, for example through a Dyson sphere
  4. Type 3 Civilization is able to capture all the energy emitted by its Galaxy and every object within it

Similarly, I see different types of data architecture that correspond to an organization's level of data maturity and advancement –

  1. Type 0 – Simple RDBMS on a single machine serving both operational and analytical requirements
  2. Type 1 – Enterprise Data Warehouse supported by simple ETL/ELT pipelines, and the Lambda and Kappa Architectures
  3. Type 2 – Data Lakehouse supported by the Medallion Architecture
  4. Type 3 – Data Product as a Service supported by a Data Mesh architecture

Now, let's dig deeper into what each of these architectures means and how it works.

Type 0 – A small, simple RDBMS on a single machine that handles Online Transaction Processing (OLTP), alongside a separately partitioned space of views, stored procedures, and copied tables created to support reporting and dashboarding on top.

Benefits – Good for small-scale data analytics and rudimentary analysis

Problems – Performance issues, manual data mapping and merging, data quality issues, and poor scalability

Fig. 1
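As a minimal sketch of the Type 0 pattern, the snippet below uses Python's built-in `sqlite3` module as a stand-in for a single-machine RDBMS. The `orders` table, the `daily_revenue` view, and the helper functions are hypothetical names chosen for illustration; the point is that transactional writes and reporting queries hit the same database on the same machine.

```python
import sqlite3


def build_type0_db() -> sqlite3.Connection:
    """Create one in-memory database that serves both OLTP and reporting (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # Operational (OLTP) table: one row per order, written by the application.
    conn.execute(
        """CREATE TABLE orders (
               order_id   INTEGER PRIMARY KEY,
               order_date TEXT NOT NULL,
               customer   TEXT NOT NULL,
               amount     REAL NOT NULL
           )"""
    )
    # Reporting "space": a view that dashboards query on top of the same table.
    conn.execute(
        """CREATE VIEW daily_revenue AS
           SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
           FROM orders
           GROUP BY order_date"""
    )
    return conn


def place_order(conn: sqlite3.Connection, order_date: str, customer: str, amount: float) -> None:
    # Each transactional write is committed when the `with` block exits.
    with conn:
        conn.execute(
            "INSERT INTO orders (order_date, customer, amount) VALUES (?, ?, ?)",
            (order_date, customer, amount),
        )


if __name__ == "__main__":
    db = build_type0_db()
    place_order(db, "2024-01-01", "alice", 120.0)
    place_order(db, "2024-01-01", "bob", 80.0)
    place_order(db, "2024-01-02", "alice", 50.0)
    # The analytical query runs on the same engine as the transactional traffic.
    for row in db.execute("SELECT * FROM daily_revenue ORDER BY order_date"):
        print(row)
```

Because the view is computed from the operational table at query time, every dashboard refresh competes with OLTP traffic for the same resources, which is where the performance and scalability problems of this pattern come from.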

Type 1 – Data Warehouse architecture became popular in the 1990s, 2000s, and 2010s, and it is still going strong. It has also been extended with data lake implementations through the Lambda and Kappa Architectures. Whether these belong to the data warehouse family is a controversial topic, but to keep it simple, I treat them as an integral part of modern Data Warehouse solutions.

In the ETL (Extract, Transform, Load) approach, data is extracted from source systems, transformed in transit according to business requirements, and loaded into tables that are ready to be consumed.

Benefits – Centralized, standardized, and user-friendly

Problems – Batch-oriented and slow, not scalable, and rigid

Fig. 2
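The ETL flow can be sketched in three small functions, assuming a CSV extract as the source and an in-memory SQLite database as the warehouse. The sample data, the `fact_orders` table, and the business rule (keep only completed orders) are hypothetical. Note that the transformation happens in Python before anything reaches the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from an operational system.
SOURCE_CSV = """order_id,customer,amount,status
1, Alice ,120.50,COMPLETE
2,bob,80,cancelled
3,Carol,45.25,complete
"""


def extract(raw: str) -> list[dict]:
    """Extract: read raw records from the source system."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform in flight: apply (hypothetical) business rules before loading."""
    cleaned = []
    for r in rows:
        if r["status"].strip().lower() != "complete":
            continue  # business rule: only completed orders reach the warehouse
        cleaned.append((
            int(r["order_id"]),
            r["customer"].strip().title(),  # standardize customer names
            round(float(r["amount"]), 2),
        ))
    return cleaned


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write only the already-transformed rows into a consumption-ready table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows)


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(SOURCE_CSV)))
    print(warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall())
```

Because only transformed rows are loaded, changing a business rule means re-extracting from the source, which is one reason ETL pipelines tend to be batch-oriented and rigid.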

In the ELT (Extract, Load, Transform) approach, data is extracted directly from the applications and loaded into the Data Warehouse (DWH) without transformation. The transformation logic is then executed within the DWH itself.

Benefits – Flexible in implementation and improved scalability

Problems – Data orchestration challenges

Fig. 3
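For contrast, here is a small set of hypothetical order records handled ELT-style, using SQLite as a stand-in for the DWH. The raw rows are loaded untouched into a `raw_orders` staging table, and the transformation is expressed as SQL executed by the warehouse engine. Table names, column names, and the filtering rule are illustrative assumptions.

```python
import sqlite3

# Hypothetical raw records, loaded exactly as the source application emitted them.
RAW_ORDERS = [
    ("1", " Alice ", "120.50", "COMPLETE"),
    ("2", "bob", "80", "cancelled"),
    ("3", "Carol", "45.25", "complete"),
]


def extract_and_load(conn: sqlite3.Connection) -> None:
    """Extract + Load: land the raw data untouched in a staging table (all columns TEXT)."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT, status TEXT)")
    with conn:
        conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ORDERS)


def transform_in_warehouse(conn: sqlite3.Connection) -> None:
    """Transform: the business logic runs as SQL inside the warehouse engine itself."""
    conn.execute(
        """CREATE TABLE fact_orders AS
           SELECT CAST(order_id AS INTEGER)      AS order_id,
                  UPPER(TRIM(customer))          AS customer,
                  ROUND(CAST(amount AS REAL), 2) AS amount
           FROM raw_orders
           WHERE LOWER(TRIM(status)) = 'complete'"""
    )


if __name__ == "__main__":
    dwh = sqlite3.connect(":memory:")
    extract_and_load(dwh)
    transform_in_warehouse(dwh)  # a separate step that must run after the load
    print(dwh.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall())
```

Keeping `raw_orders` in the warehouse means a changed business rule only requires re-running the SQL, not re-extracting from the source. The flip side is that loading and transforming become separate, dependent steps that have to be orchestrated.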

In the Lambda Architecture, we extend the data warehouse with modern platform capabilities that couple batch data processing with real-time stream processing. The two pipelines store data in parallel: the batch pipeline stores data in a data warehouse, while the real-time streams store data in a distributed data store or blob storage.

Benefits – Increased flexibility and scalability

Problems – High complexity, and architectural lock-in because data identification and migration become challenging

Fig. 4
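The interplay between the two Lambda paths can be illustrated with a toy model that counts events per product, assuming in-memory counters in place of the data warehouse (batch view) and the distributed store (realtime view). `LambdaSketch` and its methods are hypothetical names for this sketch.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LambdaSketch:
    """Toy Lambda Architecture: batch layer, speed layer, and a serving-layer merge."""
    master_log: list = field(default_factory=list)           # immutable, append-only raw events
    batch_view: Counter = field(default_factory=Counter)     # stands in for the data warehouse
    realtime_view: Counter = field(default_factory=Counter)  # stands in for the distributed store

    def ingest(self, product: str) -> None:
        # Each event feeds both paths in parallel.
        self.master_log.append(product)   # input for the next batch run
        self.realtime_view[product] += 1  # speed layer: incremental, low latency

    def run_batch(self) -> None:
        # Batch layer: recompute the view from scratch over the full master dataset.
        self.batch_view = Counter(self.master_log)
        # Everything ingested so far is now covered by the batch view, so the
        # speed layer can be reset (safe here only because this toy is single-threaded).
        self.realtime_view = Counter()

    def query(self, product: str) -> int:
        # Serving layer: complete-but-stale batch view + fresh realtime view.
        return self.batch_view[product] + self.realtime_view[product]


if __name__ == "__main__":
    lam = LambdaSketch()
    for p in ["milk", "cheese", "milk"]:
        lam.ingest(p)
    lam.run_batch()           # e.g. a nightly batch job
    lam.ingest("milk")        # arrives after the batch run
    print(lam.query("milk"))  # 2 from the batch view + 1 from the realtime view
```

The same metric is computed twice, once by `run_batch` and once incrementally in `ingest`, and the serving layer must merge the two correctly. Keeping two code paths consistent is a large part of the complexity this architecture brings.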

In the Kappa Architecture, which builds on the Lambda Architecture, there is only one data processing pipeline instead of two. Both batch and real-time data are treated as a single stream and processed as real-time data.

Benefits – Simple and streamlined pipelines, ease of data identification and migration, and tiered storage

Problems – High complexity and high total cost of ownership (TCO)

Fig. 5
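A Kappa-style pipeline can be sketched with a single replayable event log and one processing function, assuming an in-memory list as the log. `EventLog`, `run_stream_job`, and the two `revenue_*` functions are hypothetical. Historical data is not handled by a separate batch job; it is reprocessed by replaying the log through the same stream code.

```python
from typing import Callable, Iterator


class EventLog:
    """Append-only, replayable event log (in-memory stand-in for a durable log)."""

    def __init__(self) -> None:
        self._events: list[dict] = []

    def append(self, event: dict) -> None:
        self._events.append(event)

    def read_from(self, offset: int = 0) -> Iterator[dict]:
        # Consumers may start at any offset; offset 0 replays the full history.
        yield from self._events[offset:]


def run_stream_job(log: EventLog, process: Callable[[dict, dict], None], offset: int = 0) -> dict:
    """The single processing path: historical and live events go through the same code."""
    state: dict = {}
    for event in log.read_from(offset):
        process(state, event)
    return state


def revenue_v1(state: dict, event: dict) -> None:
    # Original logic: sum every amount per customer.
    state[event["customer"]] = state.get(event["customer"], 0) + event["amount"]


def revenue_v2(state: dict, event: dict) -> None:
    # Changed business logic (hypothetical): ignore refunds, i.e. negative amounts.
    if event["amount"] > 0:
        state[event["customer"]] = state.get(event["customer"], 0) + event["amount"]


if __name__ == "__main__":
    log = EventLog()
    for e in ({"customer": "alice", "amount": 100},
              {"customer": "alice", "amount": -30},
              {"customer": "bob", "amount": 50}):
        log.append(e)
    print(run_stream_job(log, revenue_v1))
    # Reprocessing in Kappa means replaying the same log through the new version.
    print(run_stream_job(log, revenue_v2))
```

Switching from `revenue_v1` to `revenue_v2` needs no batch pipeline: the new logic simply replays the log from offset 0, which keeps data identification and migration in one place.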

Type 2 – A Data Lakehouse is a next-generation architecture that results from the convergence of the data lake and the data warehouse. This open system combines data-warehouse-style data management features with the low-cost storage used for data lakes. The Medallion Architecture logically organizes data within a Data Lakehouse into multiple layers, namely – Bronze (raw landing area), Silver (cleaned and augmented data), and Gold (aggregated, business-ready data).

Benefits – Improved Data Governance, Data Lineage, High Scalability, and Flexibility

Problems – Still a new concept, increased complexity, and possible data duplication

Fig. 6
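The Bronze, Silver, and Gold layers can be sketched as three plain Python functions, assuming lists of dictionaries instead of lakehouse tables. The function names, the ingestion metadata columns (`_source`, `_ingested_at`), and the region lookup used for augmentation are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_bronze(raw_records: list[dict], source: str) -> list[dict]:
    """Bronze: land records as-is, only adding ingestion metadata."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [{**r, "_source": source, "_ingested_at": ingested_at} for r in raw_records]


def to_silver(bronze: list[dict], customer_regions: dict) -> list[dict]:
    """Silver: clean, type, de-duplicate, and augment with reference data."""
    seen: set = set()
    silver = []
    for r in bronze:
        order_id = int(r["order_id"])
        if order_id in seen or r.get("amount") in (None, ""):
            continue  # drop duplicates and records without an amount
        seen.add(order_id)
        customer = r["customer"].strip().lower()
        silver.append({
            "order_id": order_id,
            "customer": customer,
            "amount": float(r["amount"]),
            "region": customer_regions.get(customer, "unknown"),  # augmentation
        })
    return silver


def to_gold(silver: list[dict]) -> dict:
    """Gold: aggregate into a business-ready table (revenue per region)."""
    revenue: defaultdict = defaultdict(float)
    for r in silver:
        revenue[r["region"]] += r["amount"]
    return dict(sorted(revenue.items()))


if __name__ == "__main__":
    raw = [
        {"order_id": "1", "customer": " Alice", "amount": "100.0"},
        {"order_id": "1", "customer": " Alice", "amount": "100.0"},  # duplicate delivery
        {"order_id": "2", "customer": "bob", "amount": ""},          # missing amount
        {"order_id": "3", "customer": "Carol", "amount": "40.5"},
    ]
    bronze = to_bronze(raw, source="webshop")
    silver = to_silver(bronze, {"alice": "EU", "carol": "US"})
    print(to_gold(silver))
```

Each layer keeps its own copy of the data (Bronze still holds the duplicate and incomplete records). That is the data duplication trade-off of this pattern, and also what makes lineage traceable from Gold back to the raw landing area.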

Type 3 – Data Mesh is an architectural pattern for implementing enterprise data platforms in large and complex organizations. The benefits of a data mesh approach come from multi-disciplinary teams that publish and consume data products. The four main pillars of a data mesh are –

  1. Data Domain – data is owned by the team that understands it and needs it the most, for example a Customer Data Product
  2. Data as a Product – build high-quality, reusable data solutions, such as a Customer Data Product, that can be shared with and consumed by Marketing, Sales, Accounting, and other teams, but are managed by one team as a product
  3. Self-Service Data Platform – modern data platforms enable fast, frictionless consumption of data and provide the right information to the right user at the right time, which speeds up value realization
  4. Federated Governance – standards, policies, and controls can be implemented quickly and with less complexity, because the teams owning the data and data products operate in a federated manner, which reduces friction and saves time

Benefits – Fast Time to Market, Clear Ownership, High Data Quality, Better Data Governance

Problems – A new concept, change management, cultural change, and no single platform that implements a holistic Data Mesh

Fig. 7
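To make the four pillars more tangible, here is a minimal in-process sketch: a `DataProduct` owned by a domain team that enforces its own schema contract, a `Catalog` standing in for the self-service platform, and a small `GLOBAL_POLICIES` dictionary representing federated governance. All of these names and both example policies are hypothetical and not taken from any specific data mesh platform.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataProduct:
    """A domain-owned data product with an explicit contract (hypothetical structure)."""
    name: str
    domain: str   # owning domain team, e.g. "customer"
    owner: str    # accountable product owner
    schema: dict  # column name -> Python type: the published contract
    rows: list = field(default_factory=list)
    tags: set = field(default_factory=set)

    def publish(self, records: list) -> None:
        # The owning team guarantees quality: every record must match the contract.
        for rec in records:
            if set(rec) != set(self.schema):
                raise ValueError(f"{self.name}: columns {sorted(rec)} break the contract")
            for col, typ in self.schema.items():
                if not isinstance(rec[col], typ):
                    raise TypeError(f"{self.name}.{col} must be {typ.__name__}")
        self.rows.extend(records)


# Federated governance: a small set of global policies every domain must satisfy,
# while each domain decides everything else about its own product.
Policy = Callable[[DataProduct], bool]
GLOBAL_POLICIES: dict[str, Policy] = {
    "has_owner": lambda p: bool(p.owner),
    "pii_is_tagged": lambda p: "email" not in p.schema or "pii" in p.tags,
}


class Catalog:
    """Self-service platform piece: products are registered here and discovered by consumers."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> list[str]:
        # Returns the violated policies; a product is only registered if there are none.
        violations = [name for name, check in GLOBAL_POLICIES.items() if not check(product)]
        if not violations:
            self._products[product.name] = product
        return violations

    def find(self, domain: str) -> list[str]:
        return sorted(n for n, p in self._products.items() if p.domain == domain)


if __name__ == "__main__":
    customers = DataProduct(
        name="customer_profiles", domain="customer", owner="customer-team",
        schema={"customer_id": int, "email": str},
    )
    catalog = Catalog()
    print(catalog.register(customers))  # PII column without a "pii" tag is rejected
    customers.tags.add("pii")
    print(catalog.register(customers))  # now compliant and discoverable
    customers.publish([{"customer_id": 1, "email": "a@example.com"}])
    print(catalog.find("customer"))
```

The split of responsibilities is the point of the sketch: the domain team decides what the product contains and guarantees its contract, while the platform applies the same small set of global policies to every product at registration time.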

In conclusion, as the data landscape evolves, modern data platforms and architectures will evolve too. All these architecture patterns have one thing in common: providing value to the business.

As we mature in our data requirements, our use case definitions, and the value we generate for clients and end consumers, we will keep seeing architectures that are simple on the surface yet complex underneath. As of now, I recommend the Data Mesh architecture strategy to realize the right value from data, implement strong data governance, ensure high quality and reliable traceability of data, and, ultimately, achieve AI readiness.

Hopefully, with the heavy data usage needed to satiate the hunger of GenAI and AI applications, we will end up consuming more energy and one day achieve Type 1 Civilization status on the Kardashev Scale!
