Data Architecture Patterns
Rudraksh Bhawalkar
Partner @ EY Tech Consulting || Cloud & Data Modernization || Sustainability || Responsible AI
The Kardashev Scale categorizes civilizations by maturity and technological advancement, based on the energy they consume.
Similarly, I foresee different levels of data architecture at different levels of maturity and advancement for an organization, described in the sections that follow.
Now, let's dig deeper into what these architectures mean and how they work.
Type 0 – A small, simple RDBMS on a single machine that hosts both Online Transaction Processing (OLTP) and a partitioned space of views, stored procedures, and copied tables created to allow reporting and dashboarding on top.
Benefits – Good for small-scale data analytics with rudimentary analysis
Problems – Performance issues, manual data mapping and merging, data quality issues, and no scalability
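A minimal sketch of the Type 0 setup, using Python's built-in sqlite3 module as a stand-in for a small RDBMS. The `orders` table, the `v_sales_by_customer` view, and the `sales_report` helper are hypothetical names chosen for illustration; the point is only that reporting runs as a view on the same machine as the transactional table.

```python
import sqlite3

# One database on one machine serves both transactions and reporting.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    amount REAL NOT NULL
);
INSERT INTO orders (customer, amount) VALUES
    ('acme', 120.0), ('acme', 80.0), ('globex', 50.0);

-- Reporting space: a view layered directly on the transactional table.
CREATE VIEW v_sales_by_customer AS
SELECT customer, SUM(amount) AS total_amount, COUNT(*) AS order_count
FROM orders
GROUP BY customer;
""")


def sales_report(connection):
    """Read the reporting view that a dashboard would sit on."""
    query = (
        "SELECT customer, total_amount, order_count "
        "FROM v_sales_by_customer ORDER BY customer"
    )
    return connection.execute(query).fetchall()


report = sales_report(conn)
```

Because the view scans the live transactional table on every read, heavy reporting competes with transactions for the same resources, which is one way the performance issues listed above show up.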
Type 1 – The Data Warehouse architecture became popular in the 1990s, 2000s, and 2010s and is still going strong. Extended versions also incorporate data lake implementations through the Lambda and Kappa architectures. This is a controversial topic, but to keep it simple, I treat them as an integral part of modern Data Warehouse solutions.
In the ETL (Extract, Transform, Load) approach, data is extracted from source systems, transformed in flight according to business requirements, and loaded into tables ready to be consumed.
Benefits – Centralized approach, standardized, and user friendly
Problems – Batch oriented and slow, not scalable, and rigid
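The ETL flow can be sketched roughly as follows. This is an illustrative example, not a production pipeline: the `SOURCE_ROWS` data, the `fact_orders` table, and the cleaning rules are assumptions, and sqlite3 stands in for the warehouse. The transformation happens in the pipeline code before anything reaches the target table.

```python
import sqlite3

# Hypothetical source-system rows; values arrive as untyped strings.
SOURCE_ROWS = [
    {"order_id": 1, "customer": " Acme ", "amount": "120.50"},
    {"order_id": 2, "customer": "globex", "amount": "80"},
    {"order_id": 3, "customer": "", "amount": "15"},  # invalid: no customer
]


def extract():
    """Extract: pull rows from the source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: apply business rules before the data reaches the warehouse."""
    cleaned = []
    for row in rows:
        customer = row["customer"].strip().lower()
        if not customer:
            continue  # rows without a customer are rejected in flight
        cleaned.append((row["order_id"], customer, float(row["amount"])))
    return cleaned


def load(conn, rows):
    """Load: write consumption-ready rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
loaded = warehouse.execute(
    "SELECT order_id, customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
```

Note that the rejected row never lands anywhere in the warehouse, which illustrates the rigidity mentioned above: changing a business rule means re-extracting from the source.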
In the ELT (Extract, Load, Transform) approach, data is extracted directly from the applications and loaded into the Data Warehouse without transformation. The transformation logic is then executed within the Data Warehouse (DWH) itself.
Benefits – Flexible in implementation and improved scalability
Problems – Data orchestration challenges
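For contrast, a minimal ELT sketch under similar assumptions: sqlite3 stands in for the DWH, and the `stg_orders` and `fact_orders` tables are hypothetical. Raw rows are loaded untouched into a staging table, and the transformation runs afterwards as SQL inside the warehouse.

```python
import sqlite3

# Hypothetical raw rows, loaded exactly as they arrive from the application.
RAW_ROWS = [(1, " Acme ", "120.50"), (2, "globex", "80"), (3, "", "15")]

dwh = sqlite3.connect(":memory:")

# Extract + Load: no cleaning yet, everything lands in a staging table.
dwh.execute(
    "CREATE TABLE stg_orders (order_id INTEGER, customer TEXT, amount TEXT)"
)
dwh.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", RAW_ROWS)

# Transform: business logic executes inside the warehouse as SQL.
dwh.execute("""
CREATE TABLE fact_orders AS
SELECT order_id,
       LOWER(TRIM(customer)) AS customer,
       CAST(amount AS REAL) AS amount
FROM stg_orders
WHERE TRIM(customer) <> ''
""")

elt_result = dwh.execute(
    "SELECT order_id, customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
staged = dwh.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]
```

The raw data stays available in staging, so a changed rule only requires re-running the SQL step. The cost is that load and transform steps now have to be sequenced, which is where the orchestration challenges come from.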
In the Lambda Architecture, we extend the data warehouse with modern platform capabilities so that batch data processing is coupled with real-time streaming data processing. Both data pipelines store data in parallel: the batch pipeline stores its data in a data warehouse, and the real-time stream stores its data in a distributed data store or blob storage.
Benefits – increased Flexibility and Scalability
Problems – High Complexity and Architecture Lock-in, as Data Identification and Migration become a challenge
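The two parallel paths of a Lambda setup can be sketched in memory as below. This is a simplified illustration: the `LambdaPipeline` class, its method names, and the page-view events are assumptions, and plain Python counters stand in for the data warehouse and the distributed store.

```python
from collections import Counter


class LambdaPipeline:
    """Toy model: every event feeds both a batch path and a speed path."""

    def __init__(self):
        self.master_log = []         # append-only history used by the batch path
        self.batch_view = Counter()  # recomputed periodically from the full log
        self.speed_view = Counter()  # incremental counts since the last batch run

    def ingest(self, event):
        # The same event is written to both paths in parallel.
        self.master_log.append(event)
        self.speed_view[event["page"]] += 1

    def run_batch(self):
        # Batch path: full recomputation, after which the speed view resets.
        self.batch_view = Counter(e["page"] for e in self.master_log)
        self.speed_view.clear()

    def query(self, page):
        # Serving: merge the batch result with the recent real-time delta.
        return self.batch_view[page] + self.speed_view[page]


pipeline = LambdaPipeline()
pipeline.ingest({"page": "home"})
pipeline.ingest({"page": "home"})
pipeline.run_batch()
pipeline.ingest({"page": "home"})
pipeline.ingest({"page": "pricing"})

home_views = pipeline.query("home")
pricing_views = pipeline.query("pricing")
```

Even in this toy version the counting logic exists twice, once in `ingest` and once in `run_batch`, which hints at the complexity cost of maintaining two pipelines.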
In the Kappa Architecture, which evolved from the Lambda Architecture, we have only one data processing pipeline instead of two. Here, both batch and real-time data are treated as a single real-time stream.
Benefits – Simple and streamlined pipelines, ease of data identification and migration, tiered storage
Problems – High complexity and high total cost of ownership (TCO)
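A rough sketch of the single-pipeline idea: one append-only log is the source of truth, and "batch" processing is simply a replay of that log from the beginning through the same code that handles live events. The `EventLog` class and `build_view` function are hypothetical; in practice the log would typically be a durable streaming platform rather than a Python list.

```python
class EventLog:
    """Toy append-only log; each event gets a sequential offset."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset=0):
        return iter(self._events[offset:])


def build_view(log, offset=0, state=None):
    """The single processing path: fold events into per-customer totals."""
    totals = dict(state or {})
    for event in log.read_from(offset):
        customer = event["customer"]
        totals[customer] = totals.get(customer, 0.0) + event["amount"]
    return totals


log = EventLog()
log.append({"customer": "acme", "amount": 120.0})
log.append({"customer": "globex", "amount": 50.0})

# Live processing: consume what has arrived so far ...
live_view = build_view(log)
# ... then continue from offset 2 when a new event arrives.
log.append({"customer": "acme", "amount": 80.0})
live_view = build_view(log, offset=2, state=live_view)

# "Batch" reprocessing is a replay of the same log through the same code.
replayed_view = build_view(log, offset=0)
```

Since there is only one code path, a logic change is rolled out by replaying the log into a new view, rather than by keeping two implementations in sync.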
Type 2 – A Data Lakehouse is the next-generation architecture resulting from the convergence of the data lake and the data warehouse. This open system combines data-warehouse-style management features with the low-cost storage used for data lakes. The Medallion Architecture is implemented to logically organize data in multiple layers within a Data Lakehouse, namely Bronze (Raw Landing Area), Silver (Cleaned and Augmented Data), and Gold (Aggregated and Business-Ready Data).
Benefits – Improved Data Governance, Data Lineage, High Scalability, and Flexibility
Problems – Still a New Concept, Increased Complexity, and Possible Data Duplication
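The Bronze, Silver, and Gold layers can be illustrated with a small in-memory sketch. The sample records, the cleaning rules in `to_silver`, and the aggregation in `to_gold` are assumptions made for this example; a real lakehouse would persist each layer as tables on low-cost storage.

```python
from collections import defaultdict

# Bronze: raw landing area, stored exactly as received.
bronze = [
    {"order_id": "1", "customer": " Acme ", "amount": "120.0"},
    {"order_id": "1", "customer": " Acme ", "amount": "120.0"},  # duplicate
    {"order_id": "2", "customer": "acme", "amount": "80.0"},
    {"order_id": "3", "customer": "Globex", "amount": "n/a"},    # unparseable
]


def to_silver(rows):
    """Silver: typed, de-duplicated, cleaned records."""
    seen, silver = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # drop repeated deliveries of the same order
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "customer": row["customer"].strip().lower(),
            "amount": amount,
        })
    return silver


def to_gold(rows):
    """Gold: aggregated, business-ready totals per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Each layer keeps its own copy of the data in a different shape, which makes lineage easy to follow but is also how the data duplication listed above can arise.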
Type 3 – Data mesh is an architectural pattern for implementing enterprise data platforms in large and complex organizations. The benefits of a data mesh approach are achieved by implementing multi-disciplinary teams that publish and consume data products. The four main pillars of a data mesh, as commonly described, are domain-oriented ownership, data as a product, a self-serve data platform, and federated computational governance.
Benefits – Fast Time to Market, Clear Ownership, High Data Quality, Better Data Governance
Problems – A New Concept, Change Management, Cultural Change, no single platform to implement holistic Data Mesh
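To make "teams that publish and consume data products" a little more concrete, here is a minimal sketch of a data product descriptor and a shared catalog. Everything in it is hypothetical: the `DataProduct` fields, the `Catalog` class, and the two publishing rules are illustrative assumptions, not a standard data mesh API.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """Descriptor that a domain team publishes alongside its data."""
    name: str
    domain: str
    owner: str    # accountable team or person
    schema: dict  # column name -> type, the published contract
    description: str = ""


class Catalog:
    """Shared registry where products are published and discovered."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        # Minimal governance rules applied uniformly to every domain.
        if not product.owner:
            raise ValueError("a data product needs a named owner")
        if not product.schema:
            raise ValueError("a data product needs a declared schema")
        key = f"{product.domain}.{product.name}"
        self._products[key] = product
        return key

    def discover(self, domain=None):
        return sorted(
            key
            for key, product in self._products.items()
            if domain is None or product.domain == domain
        )


catalog = Catalog()
orders_key = catalog.publish(DataProduct(
    name="orders",
    domain="sales",
    owner="sales-data-team",
    schema={"order_id": "int", "customer": "str", "amount": "float"},
))
sales_products = catalog.discover("sales")
```

The checks in `publish` show how clear ownership and a declared contract can be enforced at the point of publication, while each domain team remains free to build its product however it likes.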
In conclusion, as the data landscape evolves, the modern data platform and architecture landscape will evolve too. All these architecture patterns have one thing in common: providing value to the business.
And, as we mature in terms of data requirements, use case definitions, and generating value for our clients and end consumers, we will keep seeing more and more simple yet complex architectures. As of now, I recommend the Data Mesh Architecture Strategy to realize the right Value of Data, implement strong Data Governance, ensure high Quality and reliable Traceability of Data and, in the end, achieve AI readiness.
Hopefully, with high data usage to satiate the hunger of GenAI and AI applications, we will end up consuming more energy and one day achieve Type 1 Civilization status on the Kardashev Scale!