How a "Data Fabric" can simplify the management of your data?
Yvan Cognasse
Oracle Senior Director | Expert Lecturer & Thought Leader in AI | Renowned Speaker
A recurring problem for many organisations is moving their data enhancement projects (driven by initiatives such as Big Data, Artificial Intelligence and Machine Learning) from experimentation into production.
If this difficulty sounds familiar to you, the concept of a "Data Fabric" may offer part of the answer. But what really lies behind this new buzzword?
80% of "Data" projects do not go into production
Let's start with an observation. According to a recent Gartner study (source: https://blogs.gartner.com/andrew_white/2019/01/03/our-top-data-and-analytics-predicts-for-2019/) based on interviews with IT managers in 2019, 80% of data enhancement projects never go into production.
Of course, Big Data and Artificial Intelligence have been spreading through your organisations for months, if not years. But there is also a strong likelihood that these prototypes have been slow to demonstrate their value and deliver on their promises. If you are one of the many companies that launched their "data-driven company" initiatives with great fanfare, chances are you have struggled to bring them to fruition, i.e. into production. Below are some possible explanations:
- You take an overly artisanal approach to your data-driven projects, rather than a more industrialised approach focused on getting to production quickly,
- To your credit, the technologies used in this type of project are constantly evolving, which makes them harder to understand and to integrate with the rest of your information system,
- Although your entire organisation is naturally affected by data enhancement initiatives, there is a lack of alignment between your IT, business and enterprise architecture teams,
- Your data enhancement use cases are limited to overly narrow issues, rather than capitalising on what has been achieved and extending it to other departments or geographies within your organisation.
As a result, your strategic data transformation projects rarely go beyond the prototype or demonstration stage. There are many reasons for this, but they can be summed up as a lack of alignment, of strategic vision and of support from your management.
The Data Fabric concept could be a key to unlock these challenges, and thus finally integrate data management into the heart of your business.
Data as renewable energy
One of the first steps towards successful data enhancement is to adopt a technological solution with its own dedicated architecture. We will define it as a multidisciplinary entity whose purpose is to optimise the use of all your data in order to restore its business value. We will call this platform the Data Fabric. It is dedicated to the "functional" qualification of data in production.
Much has been written in recent years about new technological capabilities and how they now shape corporate strategy. For example, it is common to read or hear that "data is the new oil". Another analogy, which I prefer to the first, characterises data as more valuable than oil because it is a "renewable" asset: data comes from multiple sources that your organisation is constantly renewing, unlike a non-renewable energy such as oil, whose reserves are being depleted.
Data valuation as the 4th industrial revolution
A paradigm shift has been taking place for several years. The old architectures of information systems have practically all evolved into what Klaus Schwab calls "data-centric cyber-physical systems" in his book "The Fourth Industrial Revolution". This new standard of architecture is the result of what is commonly referred to as the Fourth Industrial Revolution.
In his famous book "The Inevitable", Kevin Kelly argues that this paradigm shift was in fact inevitable. He explains how the survivors and winners in this new economy are the companies that learn from sharing (more efficient than mere accumulation), decentralising and copying (more powerful than storage) and connecting information assets across borders (data flow being the starting point for all forms of innovation and socialisation).
In concrete terms, this means it is no longer just a matter of producing dashboards from past data (as if you were driving your car using only the rear-view mirror) but of going further: exploiting your data to identify weak signals, trends, forecasts, fraud and other use cases.
The path of the connected customer...
For the implementation of a data enhancement platform to be successful, the first condition is obviously to define the performance objectives for your organisation. A classic example is having a better knowledge of your customers and prospects.
To highlight the characteristics of a Data Fabric, let's take a closer look at this example with a company that sells connected sports watches. The customer journey begins when a prospect searches for information on connected watches (technical details, videos, customer reviews, prices, availability, user manual, etc.). The prospect then inquires about product characteristics. Let's imagine that he buys one. He may then ask for help solving a configuration problem. Finally, he uses the watch for several months and, why not, recommends it to his sports friends.
We have just described an example of a "Customer journey". The concept has been around for a long time and, whether your company operates in B2B or B2C, you have certainly already done similar work to optimise all the points of interaction with your own customers along this journey. You have therefore already carried out some of the following actions:
- Optimising your website and mobile application,
- Automating your communication actions,
- Increasing your e-commerce capabilities with partner platforms,
- Improving the efficiency of your point-of-sale systems in shops or at your partners' premises,
- Setting up and improving your contact centres to better help your customers,
- Monitoring the use of the products you sell (e.g. by exploiting IoT capabilities),
- Creating loyalty programmes to get in touch with your most important customers.
Well done. You now think you know your identified customers very well at every step of your "Customer journey" (A3 posters covered with dozens of post-it notes still sit proudly on one of your office walls...). You have optimised all the processes around your customers with the help of your colleagues from other departments (Marketing, Customer services, Sales, Loyalty management, etc.).
Unpredictability is the new standard
The problem with the "Customer journey" is that the very idea of a signposted customer journey is obsolete... The reason is simple: your customers don't see their journey as a simple travel map, all drawn up in advance. They see it as an experience around your brand. Indeed, they are constantly on the move and expect you to deliver an increasingly personalised, contextual, real-time experience. So the way they interact with your brand is no longer predictable. More than ever, they expect an immediate and relevant experience every time they interact with you.
As a result, understanding your customers is all the more difficult, and this unpredictability has become the new standard. Another way of describing the difficulty is to consider that blind spots are everywhere: it only takes one bad experience to lose a valuable customer.
It is therefore not surprising that your colleagues in other departments (Marketing, Customer services, Sales, Loyalty management, etc.) but also in back-office functions (invoicing, planning, logistics, R&D, supplier management, etc.) have a hard time retaining and developing your most profitable customers. The main cause lies in the inability to provide these valuable customers with a timely, relevant and consistent experience in every interaction with your brand.
Data disconnect makes knowing your customers more complex
The problem lies in the valorisation of the data. Each interaction functions as a separate conversation. Each contact with your customer (through the Marketing, Sales, Service and other functions of your organisation) produces ever more siloed data. It is precisely these data silos that make a single view of your customer impossible, and without that single view it is difficult, if not impossible, to create a consistent experience throughout the customer journey. The result is an inability to surprise and delight your customers in the moments that matter to them.
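To make the silo problem concrete, here is a minimal sketch of assembling a single customer view from three siloed systems. It assumes every silo shares a clean customer identifier, which real silos rarely do (entity resolution is usually the hard part), and all names and values are invented for illustration.

```python
# Minimal sketch: merging siloed records into a single customer view.
# Assumes every silo shares a clean customer ID, which real silos rarely
# do; entity resolution is usually the hard part. All data is invented.

crm = {"C42": {"name": "Alice Martin", "segment": "premium"}}
support = {"C42": {"open_tickets": 1, "last_contact": "2020-11-02"}}
ecommerce = {"C42": {"orders": 7, "lifetime_value": 1249.90}}

def single_customer_view(customer_id, *silos):
    """Merge the fragments that each silo holds for one customer."""
    view = {"customer_id": customer_id}
    for silo in silos:
        view.update(silo.get(customer_id, {}))
    return view

view = single_customer_view("C42", crm, support, ecommerce)
print(view)
```

In practice, this merge step across far messier and larger sources is precisely what a Data Fabric is meant to industrialise.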
Why a Data Fabric?
Through these examples, we have defined some of the possible objectives of a Data Fabric:
- Customer segmentation,
- Sales forecasting,
- Automation of internal processes (e.g. invoice processing, supplier selection, employee training),
- More generally, the provision of consolidated performance indicators and recommendations for action for all the entities in your organisation.
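To make the first of these objectives concrete, here is a deliberately naive sketch of rule-based customer segmentation over consolidated records. The thresholds and field names are invented assumptions; a real Data Fabric would more likely feed a trained model.

```python
# Naive rule-based segmentation over consolidated customer records.
# Thresholds and field names are illustrative assumptions only.

def segment(customer):
    """Assign a segment label from two consolidated indicators."""
    if customer["orders_last_year"] == 0:
        return "dormant"
    if customer["lifetime_value"] > 1000:
        return "high_value"
    return "standard"

customers = [
    {"id": "C1", "orders_last_year": 0, "lifetime_value": 50.0},
    {"id": "C2", "orders_last_year": 7, "lifetime_value": 1249.9},
    {"id": "C3", "orders_last_year": 2, "lifetime_value": 180.0},
]

segments = {c["id"]: segment(c) for c in customers}
print(segments)
```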
How to feed a Data Fabric?
Now that the business objectives have been identified, you are ready to implement your data enhancement platform. The various data in your information system are obviously the first (inexhaustible) source to exploit: for example, the traces left by your customers and prospects when they use your services, your products or the mobile applications you offer them; your transactional data (notably handled by your ERP and CRM systems); and the messages left on your social networks. There is also external data (known as "3rd party data"), which you do not own but can rent or buy from data providers: for example, geographical locations, socio-professional segmentation, consumer habits, etc. All this data, internal and external, is by nature extremely "polyglot", i.e. heterogeneous in format and structure.
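Because this data is so "polyglot", a common first step is to normalise each source into a shared record shape at ingestion time. A minimal sketch, where the payloads and field names are invented for illustration:

```python
# Sketch: normalising heterogeneous ("polyglot") sources into one record
# shape at ingestion. Payloads and field names are invented assumptions.
import csv
import io
import json

def from_clickstream(json_line):
    """Normalise one JSON clickstream event left by a web visitor."""
    event = json.loads(json_line)
    return {"customer_id": event["uid"], "source": "web",
            "event": event["action"]}

def from_crm_export(csv_text):
    """Normalise rows from a CSV export of the CRM system."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [{"customer_id": r["id"], "source": "crm", "event": r["status"]}
            for r in rows]

records = [from_clickstream('{"uid": "C42", "action": "viewed_watch"}')]
records += from_crm_export("id,status\nC42,renewed\n")
print(records)
```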
You will also need to reference your data sources: Where does all this data come from? How should it be grouped? What regulatory and ethical constraints are linked to its use? Answering these questions requires data of sufficient quantity and quality. Once referenced, the data can be explored, while remaining vigilant about compliance with the regulations in force (such as the GDPR).
The principle of a Data Fabric platform is that all data (internal and external) is grouped and accessible in one place. The immediate advantage is that it is much simpler to cross-reference and enrich this data in order to increase its value.
Many challenges
If ideas sometimes seem simple, you know that the difficulty lies in their implementation. Unfortunately, the Data Fabric idea is no exception to the rule. Numerous factors disrupt the industrialisation of the concept. I share below some customer observations collected during my most recent experiences on this subject:
- Prototypes are great, but they appear and disappear at an alarming rate within our organisation! Which ones will survive and are really worth investing in?
- Data-driven innovation remains at the development stage in our IT department and never reaches the production stage!
- We encounter too many difficulties due to our data silos. It is very difficult to ingest and merge data in one place!
- How do we secure, size and govern the value of all the recommendations produced by this data?
- It is almost impossible for us to easily acquire or maintain data science skills...
- Data analysis is the first step in the whole process. And it is often the most difficult and costly...
- I can't share my views on data mining with the rest of the organisation because no one is interested...
- A lot of costs, but where is the value?
- What are the ethical implications of artificial intelligence?
The problem of data valorisation for the Lines of Business (LoBs)
From the LoBs' perspective, the main challenge is to bring together and merge heterogeneous ("polyglot") data silos. This is compounded by the large volumes of data to be processed and the lack of a clear data governance model.
As a result, it will probably take a long time to load and merge data from the different sources. Faced with the lack of rapid results, you will then have to deal with the frustration of your business teams. You will lose the support of your management or of the business lines, or worse, of both... Your teams will then be perceived as unresponsive, unable to react quickly enough to the challenges of your organisation.
The problem of data valorisation for IT
On the IT side, it's not much better... The old architecture paradigms, which have been in place in companies for 30 years, are often based on monolithic data management concepts. Yet these old paradigms still affect the way human processes and technologies are organised. And the implementation of a Data Fabric will often be hampered by an overly monolithic way of thinking. This model is not very effective for truly transforming (and not just modernising here and there...) your information system and thus accompanying the digital transformation of your organisation towards a better use of data.
The ambition of data enhancement therefore requires us to examine human processes and technologies, and to reconsider both.
What is the evolution towards the Data Fabric model?
A centralised, monolithic technical solution often leads to monolithic organisations and team processes built around those monolithic technologies. To reverse this trend, the market is currently undergoing a fundamental transformation in data management, moving away from traditional architectures such as "Hub & Spoke" and "Hub". These legacy architectures are like monolithic mainframes: they did their job, but they were too centralised, too fragile, and are now far too expensive to maintain or upgrade.
The future of data integration therefore lies in the Data Fabric, in the sense of a distributed architecture of microservices and data meshing (known as a "data mesh") that can operate on any Cloud or at its edge (i.e. in hybrid environments between on-premises and Cloud, or in multi-Cloud models spanning multiple vendors).
Architecture of a Data Fabric
The architecture of a Data Fabric is based on three essential components:
- The entry points;
- The connections;
- The data services.
The entry points are the places where data resides and through which it flows. An entry point could be, for example, your CRM or marketing campaign automation solution, your product or service repository, or your price list. These entry points can be hosted on a Cloud or with a third-party data provider. All these entry points are different in nature; the more of them you can activate, the more opportunities you give your organisation to add value.
Of course, these value-adding opportunities only exist if you have connections between all these data sources. There is no point in having many data sources (internal or external) if they cannot easily and securely exchange data with each other. Connections ensure that all data can communicate and flow seamlessly: the right data, in the right place, at the right time, with the right characteristics, so that you can do what you need to do with it and thus offer data services.
Data services are the real opportunity of a Data Fabric. To return to the earlier customer journey example, data services enable you to deliver a timely, relevant and consistent experience in real time whenever a customer interacts with your brand.
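These three components can be sketched as a tiny pipeline: entry points expose data, a connection moves it, and a data service turns it into a business outcome. All names here are invented assumptions for illustration, not a real Data Fabric API.

```python
# Sketch of the three components: entry points expose data, a connection
# moves it into one stream, and a data service consumes that stream.
# All names and rules are illustrative assumptions.

class EntryPoint:
    """Anywhere data resides: a CRM, a price list, a third-party feed."""
    def __init__(self, name, records):
        self.name, self.records = name, records
    def read(self):
        return [{"source": self.name, **r} for r in self.records]

def connect(*entry_points):
    """A connection: gathers data from every entry point into one stream."""
    return [rec for ep in entry_points for rec in ep.read()]

def next_best_offer(stream, customer_id):
    """A data service: a trivial recommendation built on the unified stream."""
    events = [r for r in stream if r["customer_id"] == customer_id]
    return "discount" if any(r.get("churn_risk") for r in events) else "upsell"

crm = EntryPoint("crm", [{"customer_id": "C42", "churn_risk": True}])
web = EntryPoint("web", [{"customer_id": "C42", "page": "cancel-subscription"}])
print(next_best_offer(connect(crm, web), "C42"))
```

The design point is that the data service never talks to a silo directly: it only sees the unified stream that the connections produce.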
Data Life Cycle in a Data Fabric
The lifecycle of data in a data fabric begins with the ingestion of the data. It can come from different sources and in different formats: this is the point of entry, as described above.
The "Discover" part of our diagram above is not usually part of the Data Fabric. Indeed, Data Fabric is not a Data Science platform. They are two separate tools.
To simplify, a Data Science platform is a tool for developing the algorithms that bring Artificial Intelligence projects to life, in particular Machine Learning or Deep Learning. It is not always suited to business profiles, who need those algorithms embedded in an application that generates recommendations (for example, within a CRM application that suggests a personalised offer to a customer about to cancel a subscription to one of your services).
The Data Fabric, on the other hand, is a complete ecosystem for data management, from data extraction through processing to consumption. Unlike a Data Science platform, its primary goal is to put Big Data and Artificial Intelligence projects into production.
In detail, the Data Fabric provides a set of tools for handling data:
- It transforms the data into forms that downstream solutions can use;
- It provides a governance layer to determine who may see the data, to secure it and, where necessary, to encrypt it;
- It offers an orchestration layer for the data transformation and transport processes;
- Finally, it has a consumption layer that allows your analysts to easily find and use data in daily analytical processes (these are examples of data services).
A Data Fabric architecture is therefore designed to assemble historical data from multiple data silos to produce a uniform and unified view of the data. It provides an elegant solution to the complex IT challenges of handling huge amounts of data from disparate sources without having to replicate them all in another repository. This capability is often achieved through a combination of data integration, data virtualisation and data management technologies to create a unified semantic data layer that facilitates many business processes (such as accelerating data preparation and facilitating data science).
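As a small illustration of the governance layer mentioned above, here is a sketch of role-based masking of sensitive fields before data reaches the consumption layer. The roles and rules are invented for illustration, not any product's actual policy model.

```python
# Sketch of a governance layer: role-based masking of sensitive fields
# before records reach the consumption layer. Roles, fields and rules
# are illustrative assumptions only.

SENSITIVE_FIELDS = {"email", "birth_date"}

def apply_governance(record, role):
    """Analysts see masked PII; data stewards see everything."""
    if role == "steward":
        return dict(record)
    return {k: ("***" if k in SENSITIVE_FIELDS else v)
            for k, v in record.items()}

record = {"customer_id": "C42", "email": "alice@example.com", "orders": 7}
print(apply_governance(record, "analyst"))
print(apply_governance(record, "steward"))
```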
The concept of Data Mesh within a Data Fabric
Increasingly, as the Data Fabric moves from a static to a dynamic infrastructure, it develops into what is known as a Data Mesh.
This is a distributed data architecture that follows a metadata-driven approach, supported by machine learning capabilities. It is a tailored, distributed ecosystem with reusable data services, a centralised governance policy and dynamic data pipelines.
The main idea of Data Mesh is that data ownership is distributed across the different units of your organisation, in a self-service and therefore easily consumable format. In other words, the data is owned by one organisational unit and is made available for efficient use by different teams.
Another important aspect of the Data Mesh is its centralised discovery system, available throughout your organisation (also known as a data catalogue). Using the data catalogue, the various teams looking for information can query the discovery system to learn what data is available, its points of origin, its owners, data samples and metadata. Data is indexed in this centralised registry for fast and secure discovery.
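The data catalogue described above can be pictured as a central registry that any team queries for ownership, origin and metadata. A minimal sketch, with invented entries:

```python
# Minimal sketch of a centralised data catalogue: a registry that teams
# query to find what data exists, who owns it, and where it came from.
# Dataset names, owners and metadata are illustrative assumptions.

catalog = {}

def register(dataset, owner, origin, metadata=None):
    """Add a dataset entry to the central registry."""
    catalog[dataset] = {"owner": owner, "origin": origin,
                        "metadata": metadata or {}}

def discover(keyword):
    """Return dataset names whose name or metadata mentions the keyword."""
    return sorted(name for name, entry in catalog.items()
                  if keyword in name or keyword in str(entry["metadata"]))

register("sales.orders", owner="sales-team", origin="erp",
         metadata={"pii": False, "refresh": "daily"})
register("crm.customers", owner="marketing", origin="crm",
         metadata={"pii": True, "refresh": "hourly"})
print(discover("customers"))
```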
Oracle architecture to support a Data Fabric
As you can see, the Data Fabric sits at the intersection of the Data Management, Data Science and Data Lake platforms. It represents a coherent set of software and application solutions, independent of architecture choices (on-premises or in the Cloud). It offers a more complete solution by enabling end-to-end management of the life cycle of your data: collection, storage, processing, modelling, deployment, monitoring and governance.
In this way, thanks to a set of technologies available within a Data Fabric, many business issues can be addressed. Another advantage of a Data Fabric is its ability to offer a different view of the company's data. This view can then be shared with all teams. For example, less expert profiles will be able to have access to it, and bring their own business vision.
In a true Data Fabric, all data processing and enhancement technologies are natively integrated. Thus, the business lines can easily access the data, while the more technical profiles benefit from its openness to work with any language (R, Python, etc.).
Oracle is recognised as the leader in this field, as attested by the latest report "The Forrester Wave™: Enterprise Data Fabric, Q2 2020", available at https://blogs.oracle.com/dataintegration/oracle_forresterwave_datafabric_2020.
Modernising your business with a Data Fabric strategy
Digital transformation means connecting, in a relevant way, all the data of your customers, partners, services, products and internal processes. The Data Fabric architecture removes many obstacles to the reconciliation and valorisation of this data:
- Simple valorisation of all data;
- Transformation of data on-premises or in the Cloud, while improving data quality;
- Use of powerful built-in analytics and machine learning for each user, application and use case.
If you, too, have experience in defining, implementing or using Data Fabric, please feel free to share your experience in the comments.
Thanks for sharing