IS THE DATA LAKE DRYING UP?

Digitalization challenges companies to collect ever larger amounts of data (about processes, about the behavior of their customers, ...). The value of this data has long been recognized, especially for making data- and fact-based product decisions and thus better meeting customer expectations. In some industries, how consistently and how quickly a company decides based on data instead of gut feeling, and acts accordingly, has already become a decisive competitive advantage.

To make data and technical expertise available to as many areas of the company as possible, to establish company-wide standards, and to avoid redundant data infrastructures, centrally maintained "data collection points" are usually created. These take the form of a data warehouse (data is transformed into a predefined structure on ingestion) and/or a data lake (data is stored raw and transformed only when it is used). Above all, reports are built on top of them. The highly competent and, of course, responsible "data team" assembled for this purpose now has the honorable task of "producing" the eagerly anticipated "added value" from the data, in the best case even with artificial intelligence and machine learning.
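To make the warehouse/lake distinction concrete, here is a minimal, hypothetical Python sketch: the warehouse transforms on write into a fixed structure (schema-on-write), while the lake stores data raw and transforms on read (schema-on-read). All names and the event format are invented for illustration.

```python
import json

# Invented example event; in reality this would come from an upstream system.
raw_event = '{"user": "42", "amount": "19.99", "ts": "2023-01-05T10:00:00"}'

# Data warehouse style (schema-on-write): transform into the agreed,
# preconceived structure *before* storing, so every consumer sees one shape.
def load_into_warehouse(raw: str) -> dict:
    event = json.loads(raw)
    return {
        "user_id": int(event["user"]),        # types enforced at load time
        "amount_eur": float(event["amount"]),
        "ts": event["ts"],
    }

# Data lake style (schema-on-read): store the raw payload untouched and
# defer interpretation until someone actually uses it.
def store_in_lake(raw: str) -> str:
    return raw

def read_from_lake(stored: str) -> dict:
    event = json.loads(stored)                # transformation happens here
    return {"user_id": int(event["user"]), "amount_eur": float(event["amount"])}

print(load_into_warehouse(raw_event))
print(read_from_lake(store_in_lake(raw_event)))
```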

This leads directly to an overloaded and demotivated central data team. It manages the aforementioned data infrastructure and is responsible for ensuring that project teams, data scientists, and decision makers receive reliable reports and data deliveries. Most of its time is spent patching the data quality of the collecting units in the organization, fixing errors in data structures, adapting to changes in upstream systems, and fulfilling new report requests.

Even the most competent team cannot keep up with this, because the problem lies in the organization. The "data generators" are the ones who hold the domain knowledge and determine data quality; they understand the data and its relationships like no one else. The "data consumers" see the potential and know the specific applications of the data. Both directly influence the success achieved through data. The data team sits between the two parties, carrying the responsibility but without direct influence. This inevitably leads to friction, frustration, and misunderstandings.

The goal must therefore be to bring data generators and data consumers as close together as possible, and to have the data-producing teams provide their data in such a way that consumers can extract value from it without detailed domain knowledge.

This is accomplished through a shared (centralized) self-service data infrastructure platform, on which each individual domain nevertheless remains in charge of its own data: the platform is centralized, the data responsibility is not! The central task then shifts to fostering product thinking in data provision together with the domains and to operating the self-service platform. This also yields a homogeneous data landscape and promotes standards and governance.
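What "domain in charge, platform centralized" can look like in practice: a hedged sketch of a descriptor a domain team might publish for its data product on the self-service platform. The fields (owner, schema contract, freshness SLO, output port) are assumptions drawn from common data mesh practice, not a standard.

```python
from dataclasses import dataclass

# Hypothetical descriptor for a domain-owned data product; the field names
# are illustrative assumptions, not an established specification.
@dataclass
class DataProduct:
    name: str                  # e.g. "checkout.orders"
    owning_domain: str         # the domain team remains responsible
    description: str           # consumers should not need a domain expert
    schema: dict[str, str]     # column name -> type: the published contract
    freshness_slo_hours: int   # how stale the data may get at most
    output_port: str           # where consumers retrieve the data

# A domain team publishes its product on the (hypothetical) platform:
orders = DataProduct(
    name="checkout.orders",
    owning_domain="checkout",
    description="All completed orders, one row per order.",
    schema={"order_id": "string", "user_id": "int", "amount_eur": "decimal"},
    freshness_slo_hours=24,
    output_port="s3://analytics/checkout/orders/",
)
print(orders.name, "owned by", orders.owning_domain)
```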

This overall setup is called a data mesh. Its core innovation is to apply domain-driven design and product thinking to the challenges of the data and analytics field. It dissolves the data silos between the parties involved and allows value extraction from enterprise data to scale sustainably: operational teams perform analytics on their own and offer selected analytical data products. By linking data products from different domains, informed decisions can be made about the further development of operational systems, and innovative new services emerge.
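A small, hedged illustration of such cross-domain linking, using pandas with invented data: joining a checkout domain's orders product with a marketing domain's signups product to see which acquisition channel actually generates revenue.

```python
import pandas as pd

# Invented sample data standing in for two domain data products.
orders = pd.DataFrame(                      # from the checkout domain
    {"user_id": [1, 2, 2], "amount_eur": [20.0, 5.0, 7.5]}
)
signups = pd.DataFrame(                     # from the marketing domain
    {"user_id": [1, 2], "channel": ["ads", "organic"]}
)

# Join across domain products and aggregate: which acquisition channel
# brings in the revenue?
revenue_by_channel = (
    orders.merge(signups, on="user_id")
          .groupby("channel")["amount_eur"]
          .sum()
)
print(revenue_by_channel)
```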
