Becoming a Data Master requires a distributed architecture: Data Mesh principles to the rescue...
Dan O'Riordan
VP AI & Data Engineering : Agentic Engineering is where we are going so buckle up...
Data Mesh – Producing world-class data products, locally
Data lakes used to be at the forefront of the data conversation, but their highly centralized nature has turned many of them into company dumping grounds. Data Mesh is a set of principles built around a distributed model of data ownership and a product-thinking approach to data (Data Products).
Striking the fine balance between federated computational governance and local ownership is a tricky transformation for any organization. This is the paradigm shift that Data Mesh wants to bring about. Discover more below.
Why is this important?
Becoming a major player in data ecosystems is strategically very important, as research from the Capgemini Research Institute, Gartner, and Forrester demonstrates.
Data Mesh plays its role at the foundation, helping to define federated data ownership. Capgemini I&D agrees that a distributed approach to data architecture is needed to tackle the sheer scale of modern and legacy data estates, and has invested in building a set of accelerators that will support and jumpstart this Data Mesh approach with Data Products at the core.
The easiest way to describe the accelerators is this: think of them as a Low-Code Application Platform (LCAP) approach to the data infrastructure your Data Product teams will need. We want to bring an LCAP approach to building Data Products that not only provisions the storage zones and Spark compute clusters, but also handles the pre-integration with your Data Trust services and infrastructure monitoring, along with the documentation and best practices a team needs to deliver a Data Product.
All Data Products will need to adhere to a pre-defined set of qualities. The datasets produced need Audit, Balance, Control & Reconciliation (ABCR) features. They need security and privacy by design. They need to be easy to find through a Google-style user experience. The accelerators we are providing will empower your Data Product teams to build, test, and deploy their Data Products into production using the same SDLC methodology and microservice-based way of working that has revolutionized how we build modern applications.
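To make the quality gate concrete, here is a minimal sketch of how a team might declare and check those qualities before a Data Product ships. The descriptor fields and function names are illustrative assumptions, not part of any accelerator API:

```python
from dataclasses import dataclass

# Hypothetical descriptor for a Data Product; field names are illustrative
# stand-ins for the pre-defined qualities discussed above.
@dataclass
class DataProductDescriptor:
    name: str
    owner_team: str
    # Audit, Balance, Control & Reconciliation (ABCR) capabilities
    audit_log_enabled: bool = False
    balance_checks_enabled: bool = False
    control_totals_enabled: bool = False
    reconciliation_enabled: bool = False
    # Security/privacy by design and discoverability
    pii_classified: bool = False
    encrypted_at_rest: bool = False
    catalog_registered: bool = False

def missing_qualities(dp: DataProductDescriptor) -> list:
    """Return the pre-defined qualities this Data Product does not yet meet."""
    required = {
        "audit_log_enabled": dp.audit_log_enabled,
        "balance_checks_enabled": dp.balance_checks_enabled,
        "control_totals_enabled": dp.control_totals_enabled,
        "reconciliation_enabled": dp.reconciliation_enabled,
        "pii_classified": dp.pii_classified,
        "encrypted_at_rest": dp.encrypted_at_rest,
        "catalog_registered": dp.catalog_registered,
    }
    return [quality for quality, met in required.items() if not met]

dp = DataProductDescriptor(name="customer-orders", owner_team="sales-domain",
                           audit_log_enabled=True, pii_classified=True)
print(missing_qualities(dp))
```

A check like this would typically run in the CI pipeline, so a Data Product that fails its quality contract never reaches production.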
WE ARE BRINGING IT TO DATA!!!
This story would be nothing without a bit of architecture (It is my day job:))
We have taken a layered approach: Data Products in production generate events or call Data Trust APIs to maintain addressability and trust, with a data semantic layer powered by a knowledge graph (Neo4j) delivering the Google-esque enterprise user experience. One reason there is now a huge appetite for a different approach to data architecture is that we have never found a good way to keep our metadata up to date.
Historically, we have had a bad relationship with metadata. We treat it as a problem to be solved "down the road". With the scale of modern data estates, this is no longer tenable. Your Data Trust layer (think of metadata as the architectural layer that tracks what is happening to your data) needs to be continuously updated. Our approach is an LCAP model that lets Data Product teams easily define the integration to the Data Trust services. This means your metadata is constantly current: CRUD operations flow through to the metadata.
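The core idea, that every change a Data Product makes also updates the metadata as a side effect, can be sketched as follows. The `DataTrustClient` here is a hypothetical in-memory stand-in; a real integration would call the Data Trust API or publish to an event bus:

```python
import datetime

# Hypothetical stand-in for the Data Trust integration: every write a Data
# Product performs also emits a metadata event, so metadata never goes stale.
class DataTrustClient:
    def __init__(self):
        self.events = []  # in-memory substitute for the metadata store

    def emit(self, product, action, dataset):
        self.events.append({
            "product": product,
            "action": action,  # mirrors the CRUD operation on the data itself
            "dataset": dataset,
            "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        })

def write_dataset(trust, product, dataset, rows):
    # ... persist rows to the product's storage zone (elided) ...
    trust.emit(product, "update", dataset)  # metadata updated as a side effect
    return len(rows)

trust = DataTrustClient()
write_dataset(trust, "customer-orders", "orders_daily", [{"id": 1}, {"id": 2}])
print(trust.events[0]["product"], trust.events[0]["action"])
```

Because the event emission lives inside the write path rather than in a separate batch job, the semantic layer (and its knowledge graph) always reflects what actually happened to the data.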
What have we done so far…?
Capgemini is working with a leading EU bank on its Data Mesh journey. The aim is to help the bank reduce the duplicated data in its systems, which often causes the bank's employees to question the validity of any data they find. This significantly impacts the bank's time-to-market.
To overcome this, Capgemini’s experts have implemented a number of new ways of working across a variety of the bank’s key processes. These are:
- Data ingestion: The bank's new cloud-based platform now ingests data from multiple sources via different protocols.
- Storage and operations: Different data types are now stored separately, ensuring data is always available and used correctly going forward.
- Data preparation: Usable data is now created in one place. This allows the bank to support multi-tenancy and self-service approaches to its data distribution, and also to support data source owners during the data preparation process.
- Governance: From ingesting data to provisioning defined data, the bank's new cloud platform tracks and catalogs everything correctly.
- Data provisioning: The bank can now deliver data on demand to the platform or requester of its choice, while also giving them a workable storage solution.
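The five working practices above form a single flow from source to requester, with governance recording every step. Here is a minimal sketch of that flow; the stage functions, the `catalog` list, and the `valid` flag are illustrative assumptions, not the bank's actual platform:

```python
# Illustrative end-to-end flow mirroring the five practices: ingest, store,
# prepare, govern (catalog everything), and provision on demand.

catalog = []  # governance: every step is tracked and cataloged here

def ingest(source, records):
    catalog.append(("ingested", source))
    return {"source": source, "raw": records}

def store(batch):
    # storage & operations: raw data lands in its own zone
    catalog.append(("stored", batch["source"]))
    return {**batch, "zone": "raw"}

def prepare(batch):
    # data preparation: usable data is created in one place
    usable = [r for r in batch["raw"] if r.get("valid", True)]
    catalog.append(("prepared", batch["source"]))
    return {**batch, "usable": usable}

def provision(batch, requester):
    # data provisioning: on-demand delivery to a named requester
    catalog.append(("provisioned", batch["source"] + "->" + requester))
    return batch["usable"]

records = [{"id": 1}, {"id": 2, "valid": False}, {"id": 3}]
delivered = provision(prepare(store(ingest("crm", records))), "risk-team")
print(delivered)      # only the valid records reach the requester
print(len(catalog))   # every stage left a governance record
```

The point of the sketch is the governance thread: because each stage appends to the catalog, the platform can answer "where did this data come from and who received it?" without a separate reconciliation effort.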
All of this will shorten the bank's time-to-market and allow its employees to focus on more business-critical tasks. In addition, everyone at the bank will have a clearer idea of their data-handling responsibilities, which will significantly decrease the chances of data duplication and increase efficiency across the organization in the long run.
What’s the impact?
As we have already seen, Data Mesh offers many benefits to local data teams and their clients. But be warned: this is not a journey without risks, and it requires major buy-in from all parts of the organization. There is a cost to building and operating Data Products. You will need to evaluate a Data Product (which is the architectural quantum of Data Mesh) the same way you evaluate building an application: will other parts of your organization benefit from a developed Data Product, and are they willing to pay for it (on the premise that the product delivers on its promise of quality and trustworthiness)?
There will be huge resistance to looking at data architectures in this new, distributed way. There remains a need for centralized governance as the integration point between the business, the technology choices, and the Data Product owners. Data Product owners will have backlogs of features that need to align with the strategic roadmap of your organization as it looks to become a major player in next-gen data ecosystems.
We typically see the journey to Data Mesh as follows:
What will the outcomes be?
By breaking down silos and increasing transparency organization-wide, Data Mesh will, over the next few years, encourage our clients to look beyond their own data sets for the first time.
Building data eco-systems that will encompass internal, partner, and trusted external data sets will enable the development of more comprehensive analytic and business insights. As an example, with increased communication between manufacturing and marketing teams, organizations will be able to capitalize on how sustainability plays a key role in their processes.
All of this will lead to increased customer satisfaction and brand loyalty from clients that are much more environmentally-minded than their predecessors.
AI Transformation Managing Advisor · 3y
Good to see Capgemini going seriously into Data Mesh! Making Data Mesh a reality has its challenges in terms of setting up federated data product ownership, hitting the optimum balance between centralised governance and local autonomy, and enabling all that with state-of-the-art data platforms for self-service with minimum overheads. But whoever is successful in that will reap significant benefits: responsiveness, speed, quality, innovation, and overall strategic agility. Data Mesh is not the sole marker for future winners, but one cornerstone it definitely is.
Data Engineering Program Management · 3y
Great article! Need of the hour!
Sustainability Lead Capgemini Germany | Getting sustainable now | Data-driven decarbonization · 3y
We also have to take the operational applications into account. With DevOps we all know the "You build it, you run it" claim. With Data Mesh we need a new mindset: "You create it, you provide it." The business units that create and maintain the operational applications have to provide the data products for the data created within the applications of their business domain. This will reduce the need for pure data teams and create true DDD for the organization. Obviously this will be even more demanding for existing organizations, but IMHO it is needed to remove the current organizational friction. And the success of DevOps shows that it is possible.
Dad, leader, developer, orchestra conductor and clown · 3y
Spot on! As a microservice developer, I have never truly understood why data lakes and data warehouses (however modern) wanted to fetch my micro data and assemble it in an oversized, overly complicated silo. What an obvious anti-pattern! Data Mesh to the rescue, along with a data platform that enables teams to build data and insight like never before. Huzzah!
Specialist in Luxury Property Sales | Luxury, Real Estate Sales · 3y
Brilliant, I agree on everything!