Becoming a Data Master requires a distributed architecture: Data Mesh principles to the rescue...
Dan O'Riordan
VP AI & Data Engineering : Agentic Engineering is where we are going so buckle up...
Data Mesh – Producing world-class data products, locally
Data lakes used to be at the forefront of the data conversation, but their highly centralized nature has turned many of them into company dumping grounds. Data Mesh is a set of principles built around a distributed model of data ownership and a product-thinking approach to data (Data Products).
Striking the fine balance between federated computational governance and local ownership is a tricky transformation for any organization. This is the paradigm shift that Data Mesh wants to bring about. Discover more below.
Why is this important?
Becoming a major player in data ecosystems is strategically very important, as research from the Capgemini Research Institute, Gartner, and Forrester demonstrates.
Data Mesh plays its role at the foundation, helping to define federated data ownership. Capgemini I&D agrees that a distributed approach to data architecture is needed to tackle the sheer scale of modern and legacy data estates, and has invested in building a set of accelerators that will support and jumpstart this Data Mesh approach with Data Products at the core.
The easiest way to describe the accelerators is this: think of them as a Low-Code Application Platform (LCAP) approach to the data infrastructure your Data Product teams will need. We want to bring an LCAP approach to building Data Products that not only provisions the storage zones and Spark compute clusters, but also handles the pre-integration with your Data Trust services and infrastructure monitoring, along with the documentation and best practices a team needs to deliver a Data Product.
All Data Products will need to adhere to a pre-defined set of qualities. The datasets produced need Audit, Balance, Control & Reconciliation (ABCR) features. They need security and privacy by design. They need to be easy to find through a Google-style user experience. The accelerators we are providing will empower your Data Product teams to build, test, and deploy their Data Products into production using the same SDLC methodology and microservice-based way of working that has revolutionized how we build modern applications.
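To make the quality gate concrete, here is a minimal sketch of how a team might declare and check those qualities before a Data Product ships. The descriptor fields and function names are illustrative assumptions, not part of any accelerator API:

```python
from dataclasses import dataclass

# Hypothetical descriptor for a Data Product; field names are illustrative
# stand-ins for the pre-defined qualities discussed above.
@dataclass
class DataProductDescriptor:
    name: str
    owner_team: str
    # Audit, Balance, Control & Reconciliation (ABCR) capabilities
    audit_log_enabled: bool = False
    balance_checks_enabled: bool = False
    control_totals_enabled: bool = False
    reconciliation_enabled: bool = False
    # Security/privacy by design and discoverability
    pii_classified: bool = False
    encrypted_at_rest: bool = False
    catalog_registered: bool = False

def missing_qualities(dp: DataProductDescriptor) -> list:
    """Return the pre-defined qualities this Data Product does not yet meet."""
    required = {
        "audit_log_enabled": dp.audit_log_enabled,
        "balance_checks_enabled": dp.balance_checks_enabled,
        "control_totals_enabled": dp.control_totals_enabled,
        "reconciliation_enabled": dp.reconciliation_enabled,
        "pii_classified": dp.pii_classified,
        "encrypted_at_rest": dp.encrypted_at_rest,
        "catalog_registered": dp.catalog_registered,
    }
    return [quality for quality, met in required.items() if not met]

dp = DataProductDescriptor(name="customer-orders", owner_team="sales-domain",
                           audit_log_enabled=True, pii_classified=True)
print(missing_qualities(dp))
```

A check like this would typically run in the CI pipeline, so a Data Product that fails its quality contract never reaches production.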
WE ARE BRINGING IT TO DATA!!!
This story would be nothing without a bit of architecture (It is my day job:))
We have taken a layered approach: Data Products in production generate events or call Data Trust APIs to maintain addressability and trust, with a data semantic layer powered by a knowledge graph (Neo4j) delivering the Google-esque enterprise user experience. One reason there is now a huge appetite for a different approach to data architecture is that we have never found a good way to keep our metadata up to date.
Historically, we have had a bad relationship with metadata. We treat it as a problem to be solved "down the road". With the scale of modern data estates, this is no longer tenable. Your Data Trust layer (think of metadata as the architectural layer that tracks what is happening to your data) needs to be continuously updated. Our approach is an LCAP model that lets Data Product teams easily define the integration to the Data Trust services. This means your metadata is constantly current: CRUD operations flow through to the metadata.
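The core idea, that every change a Data Product makes also updates the metadata as a side effect, can be sketched as follows. The `DataTrustClient` here is a hypothetical in-memory stand-in; a real integration would call the Data Trust API or publish to an event bus:

```python
import datetime

# Hypothetical stand-in for the Data Trust integration: every write a Data
# Product performs also emits a metadata event, so metadata never goes stale.
class DataTrustClient:
    def __init__(self):
        self.events = []  # in-memory substitute for the metadata store

    def emit(self, product, action, dataset):
        self.events.append({
            "product": product,
            "action": action,  # mirrors the CRUD operation on the data itself
            "dataset": dataset,
            "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        })

def write_dataset(trust, product, dataset, rows):
    # ... persist rows to the product's storage zone (elided) ...
    trust.emit(product, "update", dataset)  # metadata updated as a side effect
    return len(rows)

trust = DataTrustClient()
write_dataset(trust, "customer-orders", "orders_daily", [{"id": 1}, {"id": 2}])
print(trust.events[0]["product"], trust.events[0]["action"])
```

Because the event emission lives inside the write path rather than in a separate batch job, the semantic layer (and its knowledge graph) always reflects what actually happened to the data.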
What have we done so far…?
Capgemini is working with a leading EU bank on its Data Mesh journey. The aim is to help the bank reduce the duplicated data in its systems, which often causes the bank's employees to question the validity of any data they find. This significantly impacts the bank's time-to-market.
To overcome this, Capgemini’s experts have implemented a number of new ways of working across a variety of the bank’s key processes. These are:
- Data ingestion: The bank's new cloud-based platform now ingests data from multiple sources via different protocols.
- Storage and operations: Different data types are now stored separately, ensuring data is always available and used correctly going forward.
- Data preparation: Usable data is now created in one place. This allows the bank to support multi-tenancy and self-service approaches to its data distribution, and also to support data source owners during the data preparation process.
- Governance: From ingesting data to provisioning defined data, the bank's new cloud platform tracks and catalogs everything correctly.
- Data provisioning: The bank can now deliver data on demand to the platform or requester of its choice, while also giving them a workable storage solution.
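The five working practices above form a single flow from source to requester, with governance recording every step. Here is a minimal sketch of that flow; the stage functions, the `catalog` list, and the `valid` flag are illustrative assumptions, not the bank's actual platform:

```python
# Illustrative end-to-end flow mirroring the five practices: ingest, store,
# prepare, govern (catalog everything), and provision on demand.

catalog = []  # governance: every step is tracked and cataloged here

def ingest(source, records):
    catalog.append(("ingested", source))
    return {"source": source, "raw": records}

def store(batch):
    # storage & operations: raw data lands in its own zone
    catalog.append(("stored", batch["source"]))
    return {**batch, "zone": "raw"}

def prepare(batch):
    # data preparation: usable data is created in one place
    usable = [r for r in batch["raw"] if r.get("valid", True)]
    catalog.append(("prepared", batch["source"]))
    return {**batch, "usable": usable}

def provision(batch, requester):
    # data provisioning: on-demand delivery to a named requester
    catalog.append(("provisioned", batch["source"] + "->" + requester))
    return batch["usable"]

records = [{"id": 1}, {"id": 2, "valid": False}, {"id": 3}]
delivered = provision(prepare(store(ingest("crm", records))), "risk-team")
print(delivered)      # only the valid records reach the requester
print(len(catalog))   # every stage left a governance record
```

The point of the sketch is the governance thread: because each stage appends to the catalog, the platform can answer "where did this data come from and who received it?" without a separate reconciliation effort.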
All of this will shorten the bank's time-to-market and allow its employees to focus on more business-critical tasks. In addition, everyone at the bank will have a clearer idea of their data-handling responsibilities, which will significantly decrease the chances of data duplication and increase efficiency across the organization in the long run.
What’s the impact?
As we have already seen, Data Mesh offers many benefits to local data teams and their clients. But be warned: this is not a journey without risks, and it requires major buy-in from all parts of the organization. There is a cost to building and operating Data Products. You will need to evaluate a Data Product (which is the architectural quantum of Data Mesh) the same way you evaluate building an application: will other parts of your organization benefit from a developed Data Product, and are they willing to pay for it (on the premise that the product delivers on its promise of quality and trustworthiness)?
There will be huge resistance to looking at data architectures in this new, distributed way. There remains a need for centralized governance as the integration point between the business, the technology choices, and the Data Product owners. Data Product owners will have backlogs of features that need to align with the strategic roadmap of your organization as it looks to become a major player in next-gen data ecosystems.
We typically see the journey to Data Mesh as follows:
What will the outcomes be?
By breaking down silos and increasing transparency organization-wide, Data Mesh will, over the next few years, encourage our clients to look beyond their own data sets for the first time.
Building data eco-systems that will encompass internal, partner, and trusted external data sets will enable the development of more comprehensive analytic and business insights. As an example, with increased communication between manufacturing and marketing teams, organizations will be able to capitalize on how sustainability plays a key role in their processes.
All of this will lead to increased customer satisfaction and brand loyalty from clients that are much more environmentally-minded than their predecessors.
AI Transformation Managing Advisor · 3y
Good to see Capgemini going seriously into Data Mesh! Making Data Mesh a reality has its challenges in terms of setting up federated data product ownership, hitting the optimum balance between centralised governance and local autonomy, and enabling all that with state-of-the-art data platforms for self-service with minimum overheads. But whoever is successful in that will reap significant benefits: responsiveness, speed, quality, innovation, and overall strategic agility. Data Mesh is not the sole marker for future winners, but one cornerstone it definitely is.
Data Engineering Program Management · 3y
Great article! Need of the hour!
Sustainability Lead Capgemini Germany | Getting sustainable now | Data-driven decarbonization · 3y
We also have to take the operational applications into account. With DevOps we all know the "You build it, you run it" claim. With Data Mesh we need a new mindset: "You create it, you provide it." The business units that create and maintain the operational applications have to provide the data products for the data created within the applications of their business domain. This will reduce the need for pure data teams and create true DDD for the organization. Obviously this will be even more demanding for existing organizations, but IMHO it is needed to remove the current organizational friction. And the success of DevOps shows that it is possible.
Dad, leader, developer, orchestra conductor and clown · 3y
Spot on! As a microservice developer, I have never truly understood why data lakes and data warehouses (however modern) wanted to fetch my micro data and assemble it in an oversized, overly complicated silo. What an obvious anti-pattern! Data Mesh to the rescue, along with a data platform that enables teams to build data and insight like never before. Huzzah!
Specialist in Luxury Property Sales | Luxury, Real Estate Sales · 3y
Brilliant, I agree on everything!