Tailoring Generative AI Models with Enterprise Knowledge Graphs for Domain Personalization
Extracting Information & Context from Enterprise Knowledge Graphs


Customer-facing applications frequently rely on transactional databases to manage and store their data. Often, this transactional layer is implemented with relational databases, which are well suited to structured data with predefined schemas and clear relationships between entities. They excel in scenarios where data requirements are stable and the focus is primarily on transactional processing.

However, in domains such as machine learning, natural language processing, and generative AI, where data is unstructured, heterogeneous, and constantly evolving, traditional relational databases may not be the most suitable choice. These applications often deal with large volumes of complex data, including text, images, and other multimedia formats, which are difficult to represent in the rigid schema of a relational database. Such applications demand new data models that capture the semantics of real-world data in order to understand users better and personalize interactions with them.

As organizations deploy machine learning, AI, and LLM applications to derive insights and make decisions from vast amounts of data, more unstructured data is processed under the hood. This necessitates governance and management of unstructured data stores and data lakes (which otherwise degrade into data swamps) holding diverse data types, including text, images, videos, and sensor data.

In particular, intelligent applications powered by AI models, including large language models (LLMs), require a hybrid approach that combines multiple types of data stores and their associated operations, pairing transaction-based application functionality with intelligent, AI-based contextual capabilities to perform the task at hand or personalize the interaction with the user. There is therefore a need to orchestrate data operations (in batch and in real time), shifting seamlessly between transactional processing and intelligent data analysis as needed.

For example, conversational bots tailored to specific domains often need to leverage an enterprise knowledge base to engage effectively with customers:

  1. A digital insurance bot can streamline the insurance purchase and underwriting process by tapping into a comprehensive business knowledge base to recommend options to a customer.
  2. A medical chatbot functions by gathering chief complaints from users, contextualizing them, posing relevant questions, and providing triage assistance based on the information gathered.
  3. An automated retail agent can support a customer by recommending new products based on the customer's preferences.

Such generative AI use-cases exist in numerous domains as shown in Figure 1.

Figure 1: Popular Generative AI use-cases for businesses


Traditionally, before AI/ML/LLM operations, real-time chat applications between two users were built with a messaging layer that handled the incoming and outgoing unstructured text messages, capturing the message text in a transactional database. When an automated agent simulated a user, a business-logic layer would follow a set of rules and states to respond with automated messages. This base architecture has largely remained the same, as shown in Figure 2.


Figure 2: Traditional real-time chat architecture


With the advent of machine learning models, conversational AI systems typically comprise three main components, as shown in Figure 3:

Natural Language Understanding (NLU): consumes the user input, uses a machine learning model to detect the intent of the text, and then extracts the entities occurring in the input using an entity recognition model. The intent states what the user is trying to achieve with the conversation, while relevant context is typically provided by entities in the input text, such as people, locations, organizations, or concepts.

Dialog Management (DM): manages the actions of the conversational bot and keeps track of the current state of the ongoing conversation.

Natural Language Generation (NLG): generates human-understandable responses based on the results of the NLU and DM components, using either simple predefined templates or large language models (LLMs).

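The three components above can be sketched as a minimal pipeline. This is a toy illustration, not a production design: the intent keywords, the entity regex, and the response templates are hypothetical stand-ins for trained ML models.

```python
import re

def nlu(text):
    """Detect intent and extract entities (keyword/regex stand-ins for ML models)."""
    intent = "get_quote" if "quote" in text.lower() else "unknown"
    entities = ({"product": m.group(1)}
                if (m := re.search(r"for (\w+) insurance", text.lower())) else {})
    return intent, entities

def dialog_manager(state, intent, entities):
    """Track conversation state and choose the bot's next action."""
    state.update(entities)
    if intent == "get_quote" and "product" not in state:
        return state, "ask_product"
    if intent == "get_quote":
        return state, "give_quote"
    return state, "fallback"

def nlg(action, state):
    """Render a response from predefined templates (an LLM could replace this)."""
    templates = {
        "ask_product": "Which insurance product are you interested in?",
        "give_quote": f"Here is a quote for {state.get('product', '')} insurance.",
        "fallback": "Sorry, could you rephrase that?",
    }
    return templates[action]

state = {}
intent, entities = nlu("Can I get a quote for auto insurance?")
state, action = dialog_manager(state, intent, entities)
print(nlg(action, state))  # -> Here is a quote for auto insurance.
```

In a real system, the keyword check would be an intent classifier and the regex an entity recognition model, but the handoff between the three components stays the same.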


Figure 3: Common chatbot system architecture based on natural language processing and ML models

These task-specific cognitive agents powered by foundational LLMs (and their NLG predecessors) such as GPT, Cohere, Claude, and Gemini, with natural language interfaces, have gained significant attention and traction across many application domains. Impressive as their generative linguistic abilities may be, their value can be limited by their tendency to "hallucinate", predicting the next sequence of text from probabilistic combinations of words. A foundational statistical model generates answer text based on the probability of word sequences co-occurring in its training data. This means producing an answer that is relevant to a specific domain or personalization context can be difficult for a foundational LLM that has not been trained or fine-tuned on domain-specific data.


Figure 4: An Enterprise Knowledge Graph (EKG) managing context and prior information for the machine learning layer driving customer-facing generative AI models and their transactions


As shown in Figure 4, an Enterprise Knowledge Graph can be constructed to orchestrate and manage the quality and domain specificity of LLM-generated responses. An EKG defines a schema of knowledge that represents enterprise textual data from domain-specific documents as conceptual graphs, and it can be integrated with enterprise data objects. To enable automated agents to explain enterprise knowledge, textual knowledge is converted into a graph expression based on a conceptual-graph representation of nodes and edges: domain terminologies are classified and serve as the nodes (entities) of the graph, while sentences that describe relationships between domain-specific terms or entities are abstracted as the edges. In addition, descriptive knowledge that explains the entities and the context around known relationships is captured as attributes of the entities and edges.
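This node/edge/attribute representation can be sketched with a small in-memory structure. The class, entity names, and the `source` attribute are illustrative assumptions; a real EKG would use a graph database or a library such as networkx.

```python
# Toy conceptual-graph store: domain terms become nodes, relationship
# sentences become edges, and descriptive knowledge becomes attributes.

class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}   # entity name -> attribute dict
        self.edges = {}   # (subject, object) -> {"relation": ..., plus attrs}

    def add_entity(self, name, **attrs):
        self.nodes.setdefault(name, {}).update(attrs)

    def add_relation(self, subj, relation, obj, **attrs):
        self.add_entity(subj)
        self.add_entity(obj)
        self.edges[(subj, obj)] = {"relation": relation, **attrs}

    def neighbors(self, entity):
        """Return (object, relation) pairs reachable from an entity."""
        return [(o, e["relation"]) for (s, o), e in self.edges.items() if s == entity]

kg = KnowledgeGraph()
kg.add_entity("TermLifePolicy", category="insurance_product")
kg.add_relation("TermLifePolicy", "covers", "DeathBenefit",
                source="underwriting_manual.pdf")
print(kg.neighbors("TermLifePolicy"))  # -> [('DeathBenefit', 'covers')]
```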

Note that this hybrid combination of a document data representation and a structured transactional data representation makes it possible to return responses to the user that are more personalized and in context.
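A minimal sketch of that hybrid lookup, assuming a hypothetical customer table and a plain dict standing in for EKG context; the schema, segment names, and recommendations are invented for illustration.

```python
import sqlite3

# Transactional side: a relational record for the customer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, segment TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'frequent_traveler')")

# Graph side: context keyed by customer segment (stand-in for EKG traversal).
ekg_context = {"frequent_traveler": ["travel_insurance", "lounge_access"]}

# Combine both representations to personalize the response.
name, segment = conn.execute(
    "SELECT name, segment FROM customers WHERE id = ?", (1,)).fetchone()
recommendations = ekg_context.get(segment, [])
print(f"Hi {name}, you may like: {', '.join(recommendations)}")
# -> Hi Ada, you may like: travel_insurance, lounge_access
```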


Figure 5: Continuous feedback using a human-in-the-loop generative AI and EKG architecture

The governance of enterprise knowledge, and the integration of continuous feedback from end users and internal business teams into this process, is essential to ensure high-quality results. A useful AI-driven cognitive agent should be able to acquire new knowledge from this feedback and continuously optimize itself.

In Figure 5, the data pipeline is divided into three layers depending on processing status: the bronze layer stores raw data of low quality, the silver layer stores preprocessed and cleaned data, and the gold layer stores high-quality data used in downstream applications. Often, data in the gold layer has been manually validated.

A common LLM-based conversational architecture should draw on the high-quality knowledge stored in the gold layer, while knowledge of intermediate quality remains in the silver layer. Instead of manual quality validation, automated quality assessment based on data processing and usage statistics can be applied in the silver layer: high-quality objects are automatically promoted to the gold layer, while inferior objects with specific weaknesses stay in the silver layer. To identify weaknesses in the EKG, data quality dimensions such as accuracy, consistency, completeness, timeliness, and redundancy must be considered in the silver layer.
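The automated silver-to-gold promotion could look roughly like this; the quality dimensions checked, the field names, and the 90-day freshness threshold are illustrative assumptions, not prescribed values.

```python
def quality_score(record):
    """Score a silver-layer record on simple completeness/timeliness/consistency checks."""
    dims = {
        "completeness": all(record.get(f) for f in ("entity", "relation", "source")),
        "timeliness": record.get("age_days", 999) <= 90,
        "consistency": record.get("conflicts", 0) == 0,
    }
    return sum(dims.values()) / len(dims)

def promote(silver, threshold=1.0):
    """Move records meeting the threshold to gold; keep the rest in silver."""
    gold = [r for r in silver if quality_score(r) >= threshold]
    remaining = [r for r in silver if quality_score(r) < threshold]
    return gold, remaining

silver = [
    {"entity": "PolicyA", "relation": "covers", "source": "doc1",
     "age_days": 10, "conflicts": 0},
    {"entity": "PolicyB", "relation": "", "source": "doc2",
     "age_days": 200, "conflicts": 1},
]
gold, silver = promote(silver)
print(len(gold), len(silver))  # -> 1 1
```

In practice the score would also fold in usage statistics (how often downstream applications retrieve the object) rather than record fields alone.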


Figure 6: Fine-tuning and orchestrating LLM responses using an EKG

As seen in Figure 6, for generative AI applications, refining prompts for large language models (LLMs) using knowledge graphs (KGs) can be a critical component of ensuring the accuracy and relevance of generated responses. The process begins by identifying the necessary components of a prompt, including entities, relationships, and key features; contextual information is then extracted from the EKG, tapping into its rich semantic network. Once the results are collected, a detailed, context-rich prompt guides the foundational LLM to generate a more accurate response.
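The prompt-refinement step can be sketched as follows; the stored facts, entity names, and question are hypothetical, and a real system would retrieve facts by traversing the EKG rather than from a hard-coded dict.

```python
# Facts the EKG would return for entities detected in the user's question.
kg_facts = {
    "TermLifePolicy": [
        "TermLifePolicy covers DeathBenefit",
        "TermLifePolicy requires MedicalUnderwriting for applicants over 50",
    ],
}

def build_prompt(question, entities):
    """Inject KG facts for the question's entities into a grounded prompt."""
    context = [fact for e in entities for fact in kg_facts.get(e, [])]
    return (
        "Answer using only the context below.\n"
        "Context:\n- " + "\n- ".join(context) + "\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "Does a term life policy need medical underwriting?",
    entities=["TermLifePolicy"],
)
print(prompt)
```

The resulting string would then be sent to the foundational LLM, constraining its generation to the domain facts supplied in the context section.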

Large language models and Enterprise Knowledge Graphs are therefore inherently complementary components in an end-to-end, human-in-the-loop, feedback-based architecture. This unified framework mainly comprises:

  • Data layer: In this layer, domain-specific knowledge graphs and transactional databases process textual and structural data at different levels of assurance.
  • Synergized Model layer: In this layer, the model and graph collaborate to enhance their capabilities, leveraging their strengths and knowledge representations.

The unification of large language models and knowledge graphs can unlock the full potential of these techniques, enabling them to address complex challenges and drive advancements in various application domains.


More articles by Shameek Ghosh, Ph.D

社区洞察

其他会员也浏览了