Unlocking the Power of Data: The Role of Knowledge Graphs and Data Fabric in Modern Enterprise Landscape

In today's data-driven enterprise landscape, the increasing ingestion of transactional data across multiple system-of-record solutions has highlighted the critical need for effective data management. A fundamental aspect of managing data, and of ensuring its proper utilization across business lines, is understanding the context in which the data exists. Unfortunately, traditional approaches often fail to preserve the contextual knowledge associated with enterprise data. Essential elements such as data lineage, metadata management, data dictionaries, and taxonomies attempt to capture the "context" of a specific entity or of data stored in relational databases. However, it is not just the operational transactional data generated by applications that suffers from context loss; the vast number of documents stored in shared repositories across business lines, and the multitude of Excel sheets scattered across the enterprise network, are equally at risk of losing their contextual significance if left unmanaged. These diverse assets represent data in various forms and contain valuable information about entities, but they are often trapped in isolated silos, leading to a loss of context and hindered data understanding.

The scale of this problem, i.e. the "loss of context", is often unknown, but the symptoms are everywhere: massive amounts of incoming-personnel and SME time lost in understanding enterprise domains and discovering good data, loss of trust in data leading to questionable data quality, document management problems, and many more.

In a way, "context" is closely related to the idea of connecting siloed entities or data assets and giving semantic meaning to those connections. Introducing connections or relationships between entities at the time of creation, or as they appear, is a flexible way of joining all the disjointed data silos into an underlying, loosely defined contextual network of ever-evolving entities.

Where enterprises face challenges in preserving the contextual knowledge of their data due to disparate sources and isolated silos, one promising solution lies in adopting a Data Fabric design philosophy.

Understanding the Enterprise Data Fabric

A Data Fabric is a design paradigm that elegantly hides the mechanics of data integration from the user and abstracts away the technological complexities involved in data movement, transformation, and integration, making all data available across the enterprise. In the process, it offers a seamless and simplified semantic and logical view of data assets.

Utilizing powerful technologies such as semantic knowledge graphs, metadata management, and machine learning, the Data Fabric unifies diverse data types and endpoints. It enables clustering of related datasets, seamless integration of new data sources, improved data workload management, elimination of data silos, centralized governance, and enhanced overall data quality. Using intelligent integration techniques to join the organization's data silos is the key objective of the virtual data layer enabling an Enterprise Data Fabric. While traditional data management concepts such as DataOps focus on the operationalization of large and distributed data assets, the Enterprise Data Fabric focuses on capabilities that unify diverse and distributed data assets (see Figure 1).

[Figure 1]

Some benefits of an Enterprise Data Fabric include:

  • Semantic layers of description that enable users to discover and access relevant data
  • Access to a vast pool of data assets in the form of an internal marketplace that supports discovery
  • Continuous analytics over growing data assets
  • Use of advanced AI systems to infer business relationships between data across disparate applications
  • End-to-end data management visibility to measure the various attributes and risks associated with data

Building an Enterprise Knowledge Graph

Knowledge graphs are typically the mechanism that silently introduces "context" into an Enterprise Data Fabric. A knowledge graph is essentially a large network of entities and their properties, connected by semantic relationships and conforming to ontologies. Entities can be defined as business objects of importance, such as a "Customer". If one "Customer" is a family member of another "Customer", a semantic relationship, "is-a-family-member-of", exists between the two entities. Within a large-scale entity graph with numerous entity types and contextual relationships lie opportunities to infer many forms of complex knowledge, which are often lost when "context" is not captured within traditional databases.
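As a minimal sketch, the "Customer" example above can be expressed as RDF triples using the open-source Python library rdflib; the namespace, identifiers, and names below are purely illustrative.

```python
# Minimal sketch of the "Customer" example as RDF triples with rdflib;
# the namespace, identifiers, and names are illustrative.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, FOAF

EX = Namespace("http://example.org/enterprise/")
g = Graph()

# Two Customer entities with a descriptive property each.
g.add((EX.cust_1, RDF.type, EX.Customer))
g.add((EX.cust_1, FOAF.name, Literal("Asha Rao")))
g.add((EX.cust_2, RDF.type, EX.Customer))
g.add((EX.cust_2, FOAF.name, Literal("Dev Rao")))

# The semantic relationship connecting the two entities.
g.add((EX.cust_1, EX.isFamilyMemberOf, EX.cust_2))

print(g.serialize(format="turtle"))
```

Serializing the graph as Turtle makes the "is-a-family-member-of" connection an explicit, queryable fact alongside anything else recorded about the two customers.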

Knowledge graphs play a crucial role in enabling a virtual data layer due to their native data architecture, which emphasizes linked data objects and uses the structure of entities and their relationships to represent enterprise knowledge. It is essential to note that graph virtualization solutions do not require physically storing the data. Instead, their capabilities revolve around pre-defining the graph schema, including entity types, relationship types, and the underlying data mapping operations (see Figure 2). This approach provides a unified view of data assets.

[Figure 2]
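To make the virtualization idea concrete, here is a hedged toy sketch in Python: triples for a "Customer" entity type are produced on demand from a mapped relational table instead of being physically stored. The table, columns, and predicates are hypothetical stand-ins for a real mapping layer.

```python
# Toy sketch of graph virtualization: triples for a "Customer" entity
# type are generated on demand from a mapped relational table rather
# than physically stored. Table, columns, and predicates are hypothetical.
import sqlite3
from rdflib import Namespace, Literal
from rdflib.namespace import RDF

EX = Namespace("http://example.org/enterprise/")

def virtual_customer_triples(conn):
    """Apply the pre-defined mapping: one table row -> one Customer entity."""
    for row_id, name in conn.execute("SELECT id, name FROM customers"):
        subject = EX[f"cust_{row_id}"]
        yield (subject, RDF.type, EX.Customer)
        yield (subject, EX.hasName, Literal(name))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Asha Rao')")
for triple in virtual_customer_triples(conn):
    print(triple)
```

A production virtualization layer would generalize this pattern with declarative mappings (e.g., R2RML-style) over many sources, but the principle is the same: the graph is a view, not a copy.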

To construct an Enterprise Knowledge Graph (EKG), the data pipeline needs to account for diverse data sources covering a variety of domains in heterogeneous formats. Useful information is extracted from the upstream data sources using a variety of techniques, such as Entity Extraction and Relation Extraction; this information is then integrated with existing structured data (e.g., via Entity Linking techniques) to obtain relatively comprehensive descriptions of the entities. Modeling the data as a graph enables easy data management and the embedding of rich semantics in the data. Finally, to facilitate querying of this mined and integrated data, i.e., the knowledge graph, a data querying service and a graph visualization interface can be provided.
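As one illustration of the extraction step, the snippet below uses spaCy's pretrained named entity recognizer to pull candidate entities from free text; it assumes the en_core_web_sm model has been installed, and a full pipeline would layer relation extraction and entity linking on top.

```python
# Illustrative extraction step using spaCy's pretrained NER model;
# assumes en_core_web_sm is installed
# (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Acme Corp reported record operating profit in fiscal 2015.")

# Candidate entities; a full pipeline would add relation extraction and
# entity linking against the existing structured data.
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. "Acme Corp ORG", "fiscal 2015 DATE"
```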

Let's consider an instance where a business user requires access to graph capabilities. Suppose the user is interested in identifying the companies with the highest operating profit in the year 2015 that are currently involved in Intellectual Property (IP) lawsuits. To address this query, one must extract company entities from unstructured text documents, such as financial reports and court records, and subsequently consolidate the extracted information related to these companies.
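Assuming the extracted facts have been consolidated into an RDF graph, the business question might then be posed as a SPARQL query. The vocabulary and sample data below (ex:operatingProfit2015, ex:involvedIn, ex:IPLawsuit) are invented for illustration.

```python
# Sketch of the business question as a SPARQL query over rdflib; the
# vocabulary and sample data are invented for illustration.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF

EX = Namespace("http://example.org/enterprise/")
g = Graph()
g.add((EX.acme, RDF.type, EX.Company))
g.add((EX.acme, EX.operatingProfit2015, Literal(1200000)))
g.add((EX.acme, EX.involvedIn, EX.suit_42))
g.add((EX.suit_42, RDF.type, EX.IPLawsuit))

query = """
PREFIX ex: <http://example.org/enterprise/>
SELECT ?company ?profit WHERE {
    ?company a ex:Company ;
             ex:operatingProfit2015 ?profit ;
             ex:involvedIn ?suit .
    ?suit a ex:IPLawsuit .
}
ORDER BY DESC(?profit)
LIMIT 10
"""
for row in g.query(query):
    print(row.company, row.profit)
```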

With heterogeneous, hybrid data platforms in the modern enterprise, there are three main challenges in providing information to knowledge workers such as analysts, lawyers, accountants, and traders:

  1. How to process and extract useful information from large amounts of structured and unstructured data.
  2. How to integrate extracted information for the same entity across disconnected data sources and store it in a manner that allows easy and efficient access.
  3. How to quickly find the entities in order to satisfy the information needs of today’s knowledge workers.

Often enterprise data covers a variety of domains, such as marketing, finance, legal, supply chain, logistics, and specific business units related to enterprise products. In terms of format, the data may be structured (e.g., database records) or unstructured (e.g., PDFs, images, notes, articles, tickets, dockets, financial reports, sensor feeds, etc.). To consume and ingest such data in a scalable manner, the data pipeline needs to be robust enough to process all types of data (e.g., relational databases, tabular files, free-text documents, and PDF files) acquired from various data sources. A knowledge graph pipeline should involve extraction and linking operations to integrate the separate data "silos".

Further, the data modeling mechanism should be flexible enough to allow scalable data storage, easy data updates, and schema flexibility. Representing data as tuples of three elements with no fixed schema requirement (the RDF model) allows for more expressive semantics of the modeled data, which can be used for knowledge inference as needed by analysts.
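The following hedged sketch illustrates both properties: new facts are added without any schema migration, and an RDFS reasoner (here the owlrl package, assumed installed alongside rdflib) materializes inferred knowledge from a lightweight ontology statement.

```python
# Sketch of RDF's schema flexibility and inference, using rdflib with the
# owlrl reasoner (pip install owlrl); the class hierarchy is illustrative.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS
import owlrl

EX = Namespace("http://example.org/enterprise/")
g = Graph()

# A lightweight ontology statement: every PremiumCustomer is a Customer.
g.add((EX.PremiumCustomer, RDFS.subClassOf, EX.Customer))

# New facts can be added at any time; no schema migration is required.
g.add((EX.cust_9, RDF.type, EX.PremiumCustomer))

# RDFS reasoning materializes the implied fact: cust_9 is also a Customer.
owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)
print((EX.cust_9, RDF.type, EX.Customer) in g)  # True
```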

Applications of a knowledge graph are not limited to analytics and knowledge inference; it can also serve as a means to further enrich the underlying data with contextual information and enhance enterprise data quality. For example, given linked customer entities from two separate databases (or domains), an intelligent graph walk or reasoning step can fill in missing facts or enrich these entities with additional facts, achieving completeness and consistency across data silos (see Figure 3).

[Figure 3]
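A minimal sketch of this enrichment pattern: two records for the same customer, held in separate domains, are linked with owl:sameAs, and a simple graph walk copies facts across the link so that each record is filled toward completeness. The data and predicates are illustrative.

```python
# Hedged sketch of cross-silo enrichment: two records for the same
# customer, from separate domains, are linked with owl:sameAs, and a
# simple graph walk copies facts across the link. Data are illustrative.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import OWL

EX = Namespace("http://example.org/enterprise/")
g = Graph()

# The same real-world customer, recorded in two separate systems.
g.add((EX.crm_cust_7, EX.hasEmail, Literal("asha@example.com")))
g.add((EX.billing_cust_7, EX.hasPostalAddress, Literal("12 High St")))
g.add((EX.crm_cust_7, OWL.sameAs, EX.billing_cust_7))

# Graph walk: propagate facts in both directions across sameAs links,
# filling each record toward completeness and consistency.
for a, b in list(g.subject_objects(OWL.sameAs)):
    for src, dst in ((a, b), (b, a)):
        for p, o in list(g.predicate_objects(src)):
            if p != OWL.sameAs:
                g.add((dst, p, o))

print(g.serialize(format="turtle"))
```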

In conclusion, a knowledge graph can represent a global view of an enterprise's data assets without being designed for any specific product. Individual products can then pull relevant data from it to satisfy their specific needs while enjoying the rich information that connections in the graph provide. In this way, the graph serves as a central source of data, with linkages among its pieces, and can be used to enhance existing products and develop new ones that better serve customers and users.




