Process Knowledge-Infused AI: Towards User-level Explainability, Interpretability, and Safety

This is a preprint version of (cite as): A. Sheth, M. Gaur, K. Roy, R. Venkataraman, V. Khandelwal, "Process Knowledge-Infused AI: Towards User-level Explainability, Interpretability, and Safety," IEEE Internet Computing, 26 (4), July/Aug 2022. [arXiv]

AI systems have been widely adopted across various domains in the real world. However, in high-value, sensitive, or safety-critical applications such as self-management for personalized health or food recommendation with a specific purpose (e.g., allergy-aware recipe recommendations), their adoption is unlikely. First, the AI system needs to follow guidelines or well-defined processes set by experts; data alone will not be adequate. For example, to diagnose the severity of depression, mental healthcare providers use the Patient Health Questionnaire (PHQ-9). So if an AI system were to be used for diagnosis, the medical guideline implied by the PHQ-9 needs to be used. Likewise, a nutritionist's knowledge and steps would need to be used by an AI system that guides a diabetic patient in developing a food plan. Second, the black-box nature typical of many current AI systems will not work; the AI system will need to give its user understandable explanations, constructed using concepts that humans can understand and are familiar with. This is the key to eliciting confidence and trust in the AI system. For such applications, in addition to data and domain knowledge, the AI system needs access to and use of process knowledge: an ordered set of steps that the AI system needs to use or adhere to. The use of process knowledge also aids in developing an AI system with other features these demanding applications require, such as safety, which is critical to ensure the AI system remains within bounds and does not cause harm to a user, something that current AI systems built on Large Language Models have failed to do. This article uses two demanding applications, mental health triaging and cooking recipes for diabetic patients, to show what process knowledge they need and how process knowledge can be modeled and then infused into AI algorithms to achieve the outlined features. We also discuss user-level explainability and support for safety, and present metrics for performance evaluation.

Keywords: process knowledge, knowledge-infused learning, safety, model interpretability, user-level explainability

BENEFITS AND IMPORTANCE OF PROCESS KNOWLEDGE

Benchmarking datasets that assess the natural language understanding capabilities of large language models fall short of pushing models toward user-level explainability, safety, and the handling of uncertainty and risk [1]. These challenges stem from AI restricting its learning tasks to classification and generation, which are single-shot. In comparison, real-world applications demand an orchestrated response that goes through a multi-step process of learning the high-level needs of the user, then drilling down to specific needs, and subsequently yielding a structured response with a conceptual flow. For example, triaging patients in mental health requires clinical process knowledge manifested in a clinical questionnaire. Figure 1 illustrates a scenario where the agent maps user input to a sequence of yes or no questions to compile suicide risk severity. Through these ordered sets of questions, the agent can keep track of user-provided cues and ask appropriate follow-up questions. Upon receiving the information required to derive an appropriate severity label, the agent's outcome can be explained to mental healthcare providers for appropriate intervention. Similar but more complex applications include using ADOS (Autism Diagnostic Observation Schedule) to evaluate children with autism or using the MoCA (Montreal Cognitive Assessment) score to measure cognitive decline in post-stroke aphasia patients [2]. Training conversational agents for such functionality requires specialized datasets grounded in knowledge, enabling AI systems to exploit the duality of data and knowledge for human-like decision-making [3]. Further, to develop agents that learn from such process knowledge-integrated datasets, we require interpretable and explainable learning mechanisms. These learning mechanisms have been characterized under the umbrella of Knowledge-infused Learning.

Knowledge-infused Learning (KiL) is a class of Neuro-Symbolic AI techniques that utilize a variety of knowledge (lexical, linguistic, domain-specific, commonsense, process, and constraint-based) in different forms and abstractions within deep neural networks. It improves upon data-centric statistical learning by reducing training and computing needs and broadening coverage, resulting in improved performance, safety, and model interpretation, and in user-level explanations.
[Figure 1: A conversational agent maps user input to an ordered set of yes/no questions to assess suicide risk severity.]
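To make this concrete, below is a minimal sketch (our illustration, not code from any cited system) of how an ordered questionnaire such as C-SSRS can be encoded as process knowledge that an agent walks through. The question wording, step order, and severity labels are simplified stand-ins for the actual instrument.

    from dataclasses import dataclass

    @dataclass
    class Step:
        question: str            # what the agent asks at this point in the flow
        severity_if_denied: str  # label assigned if the user answers "no" here

    # Illustrative, C-SSRS-inspired ordered steps; the real instrument differs.
    STEPS = [
        Step("Have you wished you were dead or wished you could go to sleep "
             "and not wake up?", "no risk indicated"),
        Step("Have you had any thoughts of killing yourself?", "suicidal wish"),
        Step("Have you thought about how you might do this?", "suicidal ideation"),
        Step("Have you worked out the details of a plan?", "ideation with method"),
    ]
    FINAL_LABEL = "ideation with plan"  # reached only if every step is affirmed

    def triage(affirms):
        """Walk the ordered steps; the trace of (question, answer) pairs is the
        user-level explanation behind the severity label."""
        trace = []
        for step in STEPS:
            ans = affirms(step.question)  # e.g., a yes/no classifier over user text
            trace.append((step.question, ans))
            if not ans:
                return step.severity_if_denied, trace
        return FINAL_LABEL, trace

    # A user whose messages affirm only the first two questions:
    label, _ = triage(lambda q: q in {STEPS[0].question, STEPS[1].question})
    print(label)  # -> suicidal ideation

Because the label is produced by walking an expert-defined order rather than by an opaque score, the trace itself doubles as the explanation a provider can audit.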

SHADES OF PROCESS KNOWLEDGE INFUSED LEARNING

[Figure 2: User-level explanation produced by highlighting concept phrases in the input text and mapping them to concepts in SNOMED-CT.]

KiL aligns with the third wave of AI described by DARPA, which promotes contextual adaptation in AI systems for user-level explanations. An AI system trained with knowledge-infusion techniques provides forms of explanation by querying, traversing, and mapping high-importance features to concepts in knowledge graphs (KGs). Figure 2 illustrates the user-level explanations provided by a knowledge-infused AI system that highlights concept phrases in the input text. These concept phrases are used to traverse a KG, which in Figure 2 is SNOMED-CT. While Figure 1 shows process knowledge infusion contributing a reasonable path toward classification, Figure 2 provides the additional user-level explanation. Within the three forms of knowledge infusion under KiL (i.e., shallow, semi-deep, and deep [3]), process knowledge infusion develops a new and complementary set of methods, datasets, and evaluation techniques under semi-deep and deep knowledge infusion.
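As a toy illustration of this querying-and-mapping step, the snippet below links high-importance phrases to concepts in a two-entry stand-in for SNOMED-CT; a real system would query the actual terminology through a dedicated service, and the phrases, scores, and concept paths here are invented for illustration.

    # Tiny stand-in KG: phrase -> path of increasingly general concepts.
    TOY_KG = {
        "hopelessness": ["emotional state finding", "clinical finding"],
        "insomnia": ["sleep disorder", "clinical finding"],
    }

    def explain(phrases_with_scores, top_k=2):
        """Turn model feature importances into concept-level explanations."""
        ranked = sorted(phrases_with_scores, key=lambda p: -p[1])[:top_k]
        lines = []
        for phrase, score in ranked:
            path = TOY_KG.get(phrase.lower())
            if path:
                lines.append(f"'{phrase}' (importance {score:.2f}) -> "
                             + " -> ".join(path))
        return "\n".join(lines)

    print(explain([("hopelessness", 0.91), ("weekend", 0.12), ("insomnia", 0.64)]))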

Furthermore, AI systems trained with such methods and over such datasets can also handle uncertainty and risk. They can establish the connection between the input and output, answering, "why such an outcome, given this input?". Such AI systems are context-sensitive rather than opinionated based only on the input data, i.e., a partial representation of the world. The structure and order provided by process knowledge give end-users control over the AI system. Moreover, given process knowledge in a particular domain and for a particular task (classification or generation), a method that makes the model adaptable to process knowledge can make the system transferable across tasks. The subsequent sections provide a concrete definition of process knowledge and its use in understanding and controlling AI models. With a focus on natural language generation (NLG), we conceptually describe methods for infusing process knowledge into statistical AI systems. Thereafter, we provide use cases in the domains of mental health (continuing with Figure 1 and Figure 2) and cooking.

PROCESS KNOWLEDGE AND ITS INFUSION INTO STATISTICAL AI

Process knowledge is an ordered set of information that maps to evidence-based guidelines or to categories of conceptual understanding held by experts in a domain. For instance, the American Academy of Family Physicians (AAFP) develops clinical practice guidelines (CPGs) that serve as a framework for clinical decisions and support best practices. CPGs allow systematic assessment to optimize patient care. Similarly, the U.S. Departments of Agriculture (USDA) and Health and Human Services (HHS) develop the Dietary Guidelines for Americans, which serve as recommendations for meeting nutrient needs, promoting health, and preventing disease. An AI system adapted to process knowledge can handle uncertainty in prediction, and its predicted outcomes are safe and explainable at the user level. Further, an AI system can treat process knowledge as meta-information to capture the sequential context necessary for carrying out a structured conversation. It also allows the developer of the AI system to probe the system's internal decision-making using application-specific guidelines or specifications, which reveals the synchrony between the end-user's thought process and the model's functioning.

[Figure 3: Process knowledge takes different shapes: C-SSRS resembles a flow chart, whereas GAD-7-based process knowledge has a flattened structure.]

This unique form of knowledge differs from other forms of knowledge in the following ways: (a) KGs: structured but not ordered; KGs can support context capture but cannot enforce conceptual flow. (b) Semantic lexicons: a flattened form of KG that makes deep language models context-sensitive and adds constraints, but cannot enforce conceptual flow [4]. (c) Ontologies: curated schematic forms of knowledge graphs with classes, instances, and constraints; ontologies can thus provide stricter control over context and constraints, and, if defined appropriately, an ontology can enforce order in question generation using deep language models [5]. Process knowledge is represented differently for different applications. For instance, to assess the severity of suicide risk, the process knowledge used is C-SSRS, which is similar to a flow chart. On the other hand, GAD-7-based process knowledge, used to assess anxiety severity, has a flattened structure (Figure 3). DASH Diet-based process knowledge can be used to assess the dietary intake of hypertension patients and also to recommend meals. These characteristic properties of process knowledge and its infusion into statistical AI yield a new class of neuro-symbolic algorithms driven by the question:

What if we could use the annotators' labels, along with the process or guidelines used to produce them, and explicitly (instead of implicitly) control the learning of a model so that it recovers the guideline or process?

Such an algorithm would, by design, be explainable and would emulate the human model of similarity between data points. For classification, a process knowledge-infused AI system would favor interpretable machine learning algorithms (e.g., decision trees, random forests) that can enforce structure in decision making over traditional deep language model-based classification.
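A minimal sketch of this point, under the assumption that inputs have already been encoded as guideline-aligned features (here, hypothetical per-question scores rather than raw text): an interpretable model such as a decision tree then exposes an auditable rule structure.

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Toy data: rows are per-question scores (0-3), labels are severity bands.
    X = [[0, 1, 0], [1, 1, 1], [2, 2, 1], [3, 3, 2], [0, 0, 1], [3, 2, 3]]
    y = ["minimal", "mild", "moderate", "severe", "minimal", "severe"]

    clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    # The printed rules can be checked against the guideline that produced
    # the features, unlike the weights of a deep text classifier.
    print(export_text(clf, feature_names=["q1_interest", "q2_mood", "q3_sleep"]))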

In NLG, the biggest concern with deep generative language models is that they hallucinate, whether asking questions or providing responses in a conversational setting. Alongside hallucination, the inappropriate and unsafe behaviors of language models have been studied extensively. Efforts to pair these language models with passage retrievers and rankers have been proposed to control incoherent, irrelevant, and factually incorrect responses and questions; however, order, like that defined in process knowledge, is far from being realized [6]. Process knowledge-based NLG is even more crucial in healthcare NLP, where each response from the agent can have severe consequences. These concerns are discussed further with the help of use cases in two domains: mental health and food/nutrition.

MENTAL HEALTH USE CASE

AI has contributed to drug research, customized medicine, and patient care monitoring, and it has the potential to aid physicians in making better diagnoses. However, when AI is used in health care, dangers and problems can arise at the individual level (e.g., awareness, education, trust), the macro level (e.g., legislation and rules, risk of accidents due to AI faults), and the technological level (e.g., usability, performance, data privacy, and security). In the context of mental healthcare, conversational agents are prone to unsafe generations that can harm the user or escalate the severity of the user's condition over the course of a conversation.

[Figure 4: (A) A deep language model generates open-domain questions; (B) the safety of generated questions is measured against C-SSRS; (C) question generation is controlled by detailed process knowledge.]

Figure 4 illustrates a pipeline wherein (A) a deep statistical language model pre-trained on an open-domain corpus, when tasked to converse with a user in a mental healthcare setting, generates the kinds of questions it has seen online. (B) Such questions are not what a mental healthcare provider would ask. If we utilize a clinical guideline, in this case C-SSRS, the model can measure the safety of a generated question before asking it. (C) Figure 4 also shows the detailed process knowledge the AI agent follows to control its question generation and ask medically correct questions. A recent study by Roy et al. details this approach using C-SSRS, and Gupta et al. detail it using GAD-7 and PHQ-9, clinical guidelines for checking whether the user suffers from an anxiety disorder (GAD-7) or clinical depression (PHQ-9) [7][8].
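A hedged sketch of step (B), not the method of [7][8]: score each candidate question against the guideline's question bank and ask only those close enough to a clinically sanctioned question. The similarity model, threshold, and guideline snippets are illustrative assumptions.

    from sentence_transformers import SentenceTransformer, util

    GUIDELINE = [  # abbreviated, C-SSRS-style question bank
        "Have you wished you were dead or wished you could go to sleep "
        "and not wake up?",
        "Have you had any thoughts of killing yourself?",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    guide_emb = model.encode(GUIDELINE, convert_to_tensor=True)

    def is_safe_to_ask(candidate, threshold=0.6):
        """Gate a generated question by similarity to the guideline bank."""
        cand_emb = model.encode(candidate, convert_to_tensor=True)
        return util.cos_sim(cand_emb, guide_emb).max().item() >= threshold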

PROCESS KNOWLEDGE AS CONSTRAINTS

More ways in which process knowledge can be infused to add constraints and improve the NLG of current AI methods include:

[Figure 5: Textual entailment constraints modeled in machine-understandable form as Rules containing Tags and Rank.]

  • Textual Entailment Constraints (TEC): a directional relationship between sentences in a response or between questions. If two sentences share semantic relations and logically agree, they are entailed. If they are synonymous based on the entities they contain, they are neutral. If the second sentence refutes the information in the first, they are contradictory. Such constraints are manifestations of process knowledge in clinical practice. In machine-understandable form, we can model them as Rules containing Tags and Rank (see Figure 5 and the sketch after this list).

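The bullet above can be operationalized with an off-the-shelf NLI model, as in the minimal sketch below; the specific model and the "reject contradictions" rule are illustrative stand-ins for the Rules, Tags, and Rank of Figure 5.

    from transformers import pipeline

    nli = pipeline("text-classification", model="roberta-large-mnli")

    def tec_label(previous, candidate):
        """Premise = earlier question/response, hypothesis = new generation."""
        out = nli([{"text": previous, "text_pair": candidate}])[0]
        return out["label"]  # ENTAILMENT / NEUTRAL / CONTRADICTION

    def admissible(previous, candidate):
        # One possible constraint: never ask a question that contradicts
        # what was already established in the conversation.
        return tec_label(previous, candidate) != "CONTRADICTION"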

FOOD AND NUTRITION USE CASE

A conversational system to manage diet can help patients with conditions such as hypertension and diabetes [9]. In most scenarios, the interactions between user and system involve factual queries (e.g., "Can you order a falafel for me?", "What are the sides offered with falafel?"). The challenge lies in how recommendations can be adapted to user preferences and context, which is still an open question [10]. A further challenge is how recommendations can be provided when users do not ask factual questions (e.g., "Can you suggest some food that helps me control my calorie intake? I want to lose weight. What should I eat for lunch?"). In the case of hypertension, patients need a nudge to switch toward healthy food habits, and a nutrition management system can aid and assist them in this process.

In such a scenario, when a user asks an agent, "Can you recommend dishes that are calorie efficient?", an internet-augmented agent would accurately respond to the following related (people-also-ask) questions: (a) "Are restaurants required to put calories on menus?", (b) "Are calorie recommendations accurate?", (c) "Should I eat less than my recommended calories?", and (d) "What food can you recommend?". Moreover, the top two searches on Google are (a) "Cut lots of calories" and (b) "How to lose weight eating more food", which are not relevant to the user query. There are two fundamental problems here: (a) the AI system behind these recommendations is confused about whether "calorie efficiency" is positive or negative, and (b) the AI system fails to bridge the gap between dishes and calorie efficiency. Furthermore, a response to such a question depends on the time of day: breakfast, lunch, or dinner. A process knowledge-based conversational agent would instead generate the following information-seeking questions: (a) "Do you have any preference in cuisine?" (b) "Do you want to know about low-calorie food in this cuisine for breakfast/lunch/dinner?" (c) "Do you want me to book reservations for restaurants that have this cuisine?" (d) "Do you want me to save your preferences?" If the answer to question (b) is no, an alternate path in the process knowledge is triggered (a minimal sketch of this flow follows below). Here, the process knowledge is the procedure for recommending and ordering food. Moreover, the agent can benefit from the 2015-2020 Dietary Guidelines for Americans, which emphasize overall healthy eating patterns supported by five food groups: fruits, vegetables, grains, protein foods, and dairy.
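The branching just described, with the four information-seeking questions encoded as process knowledge and an alternate path when the answer to (b) is no; the node ids and the fallback question are illustrative.

    FLOW = {
        "a": {"ask": "Do you have any preference in cuisine?",
              "yes": "b", "no": "b"},
        "b": {"ask": "Do you want to know about low-calorie food in this "
                     "cuisine for breakfast/lunch/dinner?",
              "yes": "c", "no": "alt"},
        "c": {"ask": "Do you want me to book reservations for restaurants "
                     "that have this cuisine?", "yes": "d", "no": "d"},
        "d": {"ask": "Do you want me to save your preferences?",
              "yes": None, "no": None},
        "alt": {"ask": "Would you like general low-calorie suggestions from "
                       "the five dietary-guideline food groups instead?",
                "yes": None, "no": None},
    }

    def run(answers, node="a"):
        """Walk the flow; answers maps node id -> the user's yes/no reply."""
        asked = []
        while node is not None:
            step = FLOW[node]
            asked.append(step["ask"])
            node = step["yes"] if answers.get(node, False) else step["no"]
        return asked

    print(run({"a": True, "b": False}))  # "no" at (b) triggers the alternate path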

Similarly, for type 1 diabetes, patients need to monitor carbohydrate (CHO) intake for their insulin dosage; hence the source of CHO determines whether a given food item is advisable. CHO from the fibers present in vegetables and fruits is considered healthy, whereas CHO from added sugars, white rice, and pasta is considered unhealthy. Existing models advise meals based on an individual's daily CHO limit. In this case, a recipe whose CHO count derives from added sugar will still be recommended by the agent as long as it is within the individual's daily CHO limit, which can have severe effects on the individual's health. By infusing the process knowledge of diabetic dietary guidelines into the learning process, the agent can learn to advise appropriate meals and generate explanations that enhance interpretability and safety (see Figure 6 and the sketch that follows it).

[Figure 6: Meal advice with explanations generated by infusing diabetic dietary guidelines as process knowledge.]
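A hedged sketch of the CHO-source rule above: instead of checking only a recipe's total carbohydrates against the daily limit, also check where the carbohydrates come from. The ingredient categories and gram values are toy data, not actual guideline content.

    HEALTHY_CHO = {"vegetables", "fruits", "whole grains", "legumes"}
    UNHEALTHY_CHO = {"added sugar", "white rice", "pasta"}

    def advise(recipe_cho, daily_limit_g):
        """recipe_cho maps a CHO source -> grams; returns (ok, explanation)."""
        total = sum(recipe_cho.values())
        bad = [src for src in recipe_cho if src in UNHEALTHY_CHO]
        if total > daily_limit_g:
            return False, f"total CHO {total:.0f} g exceeds the {daily_limit_g:.0f} g limit"
        if bad:
            return False, ("within the daily limit, but CHO from "
                           + ", ".join(bad) + " is discouraged")
        return True, f"within the limit; all {total:.0f} g CHO from advisable sources"

    # A naive daily-limit check would approve this recipe; the source rule flags it.
    print(advise({"added sugar": 30.0, "fruits": 15.0}, daily_limit_g=60.0))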

Along with the above two scenarios, the nutrient content of food changes based on the adverse effects of cooking actions on the final cooked item, such as nutrient loss or the introduction of harmful elements. In addition, the dietary restrictions for each chronic disease have their respective guidelines. Hence, in this scenario, two kinds of process knowledge, the adverse effects of cooking actions combined with ingredients and the dietary guidelines for chronic conditions, are involved in generating explanations and improving the interpretability and safety of food recommendation agents.

PROCESS KNOWLEDGE AS CONSTRAINTS

In addition to specific dietary guidelines for chronic conditions, cooking actions can produce adverse effects. For example, the ingredients for potato fries include potatoes, oil, salt, pepper, and other seasonings, all advisable ingredients per dietary guidelines for diabetes. However, the cooking action is deep-frying, which produces trans-unsaturated fatty acids, and trans-unsaturated fatty acids are not advisable for patients with chronic diseases or for the general population.

Similarly, grilling a slice of meat can introduce carcinogenic agents [11] due to animal fat dripping onto direct heat, whereas grilling vegetables and fruits does not produce carcinogenic agents. The process knowledge of cooking actions can aid the agent in learning the general adverse effects of specific combinations of cooking actions and ingredients. This process knowledge can help the agent learn to nudge any user toward healthy eating habits, irrespective of the dietary guidelines for various chronic diseases. An agent trained by infusing these two kinds of process knowledge will be able to generate explanations and be interpretable, thereby improving the safety of its meal advice.
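A minimal sketch of combining these two kinds of process knowledge: per-condition ingredient guidelines plus (cooking action, ingredient class) adverse-effect rules. The rule entries below are just the two examples from the text, not a complete guideline.

    ADVERSE_RULES = {
        ("deep-frying", "any"): "produces trans-unsaturated fatty acids",
        ("grilling", "meat"): "animal fat on direct heat can introduce carcinogens",
    }

    def check_recipe(action, ingredient_class):
        """Return guideline-independent warnings for an action/ingredient pair."""
        warnings = []
        for (rule_action, rule_cls), effect in ADVERSE_RULES.items():
            if rule_action == action and rule_cls in ("any", ingredient_class):
                warnings.append(f"{action} + {ingredient_class}: {effect}")
        return warnings

    print(check_recipe("grilling", "meat"))        # warns, per [11]
    print(check_recipe("grilling", "vegetables"))  # no warning, as in the text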

NEED FOR NEW EVALUATION METRICS

The precision of an AI model is not always a good indicator of clinical effectiveness. The area under the receiver operating characteristic curve (AUROC), another frequent metric, is likewise not always the ideal indicator of clinical applicability. Such measures may be hard for physicians to interpret or may not be clinically relevant. Furthermore, AI models have been assessed using a range of indices, including the F1 score, accuracy, and false-positive rate, which capture distinct elements of a model's analytical ability. Understanding how a complicated AI model works requires a level of technical knowledge not common among physicians.

AI models with process knowledge infusion require specialized metrics for evaluating performance with respect to safety, uncertainty, and risk handling. For instance, the Stanford Natural Language Inference and Multi-genre Natural Language Inference datasets, among others, can be used to create a learned evaluation metric that assesses safety in generation by comparing the generated hypothesis with a premise. Although safety, uncertainty, and risk handling ultimately mandate human evaluation, the following metrics are equally important, as they involve (a) annotators' agreements and disagreements, (b) knowledge sources, and (c) deep language models trained on datasets whose samples are ordered by some relationship [12][13].

  1. Average Number of Unsafe Matches: the average number of matches across all model-generated questions against a set of utterances, lexical content, or ontology concepts used to describe harmful communication. Such a measure provides a range of ways to impose safety checks, drawing on unstructured, semi-structured, and structured sources as well as domain experts. For example, named entities in the generated content can be matched against harmful concepts in a knowledge base or against a lexicon of harmful phrases (unigrams, bigrams, and trigrams). A sketch of this metric appears after this list.
  2. Perceived Risk Measure: an annotator-in-the-loop metric that judges the model's stability in light of agreement and disagreement between annotators, capturing a notion of uncertainty and safety. It has two components: (a) Penalty: the ratio of the count of misclassified samples to the count of samples on which the annotators disagree with each other. (b) Benefit: the ratio of the count of samples where the model's predicted label agrees with some annotators (ignoring the disagreement between them) to the total number of annotators. Such a metric is effective for controlling unsafe predictions, as opposed to statistical loss functions that quantify uncertainty in predictions and overwhelm the experts in the loop with re-annotations [14].
  3. Semantic Relations and Logical Agreement Measures: trained metrics constructed using RoBERTa, a deep language model trained independently on the sentence-similarity and natural language inference GLUE tasks. These metrics were introduced in a recent study by Gaur et al. that unites meta-information-guided passage retrievers and TEC to induce logical ordering in generations and prevent retrieval-augmented language models from hallucinating [15]. Semantic Relation counts the number of generations semantically similar to a user query over the total number of generations. The Logical Agreement score counts the cases where the currently generated question entails a previously generated question; the sum of such counts is divided by the number of generations. Sketches of both appear below.
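Below is a hedged sketch of metrics 1 and 2; the exact formulations in [14] may differ, and the lexicon and annotation data are toy examples.

    def avg_unsafe_matches(generated, harmful_lexicon):
        """Metric 1: average count of harmful-lexicon matches (unigrams,
        bigrams, trigrams) per model-generated question."""
        def matches(text):
            tokens = text.lower().split()
            ngrams = [" ".join(tokens[i:i + n]) for n in (1, 2, 3)
                      for i in range(len(tokens) - n + 1)]
            return sum(g in harmful_lexicon for g in ngrams)
        return sum(matches(q) for q in generated) / max(len(generated), 1)

    def perceived_risk(preds, gold, votes, n_annotators):
        """Metric 2: (penalty, benefit); votes[i] lists the labels the
        annotators assigned to sample i."""
        disagreed = [i for i, v in enumerate(votes) if len(set(v)) > 1]
        penalty = sum(preds[i] != gold[i] for i in disagreed) / max(len(disagreed), 1)
        # Benefit, following the description above: samples where the prediction
        # matches at least one annotator, normalized by the annotator count.
        benefit = sum(p in v for p, v in zip(preds, votes)) / n_annotators
        return penalty, benefit

    print(avg_unsafe_matches(["how do i end my life"], {"end my life"}))  # 1.0
    print(perceived_risk(["mild", "severe"], ["mild", "moderate"],
                         [["mild", "mild"], ["moderate", "severe"]], 2))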
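And a sketch of metric 3, substituting off-the-shelf similarity and NLI models for the trained RoBERTa scorers of [15]; the model names and the 0.5 similarity threshold are assumptions.

    from sentence_transformers import SentenceTransformer, util
    from transformers import pipeline

    sim_model = SentenceTransformer("all-MiniLM-L6-v2")
    nli = pipeline("text-classification", model="roberta-large-mnli")

    def semantic_relation(query, generations, threshold=0.5):
        """Fraction of generations semantically similar to the user query."""
        q = sim_model.encode(query, convert_to_tensor=True)
        g = sim_model.encode(generations, convert_to_tensor=True)
        sims = util.cos_sim(q, g)[0]
        return (sims >= threshold).sum().item() / len(generations)

    def logical_agreement(generations):
        """Fraction of steps where the current question entails the previous
        one (premise = current generation, hypothesis = earlier one)."""
        pairs = [{"text": curr, "text_pair": prev}
                 for prev, curr in zip(generations, generations[1:])]
        hits = sum(o["label"] == "ENTAILMENT" for o in nli(pairs)) if pairs else 0
        return hits / len(generations)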

SUMMARY AND FUTURE DIRECTIONS

Real-world interactions between users and AI systems are not a single-shot activity but rather a chain of exchanges involving procedural questions and responses; at the macroscopic level it is a task-oriented conversation, and at the microscopic level it comprises entities and actions that keep changing during that conversation. This phenomenon can be well understood and controlled through process knowledge that represents a human's mental model of the conversation. In this article, through example use cases in mental healthcare and food, we explained the notion of process knowledge, which naturally concerns consistency, explainability, and interpretability in AI decision-making. To the best of our knowledge, this article is the first to project its role in pushing statistical AI toward being safe, less uncertain, and less risky in its classification and natural language generation tasks. With process knowledge, an AI model can support reasoning, which is essential for developing trust among stakeholders using the application in various downstream tasks. We showed various existing forms of process knowledge and the methods by which data-driven AI models can use them. As a future direction, we envision the utility of process knowledge in personalization, which is essential for developing interventional plans for patients with other mental health disorders (e.g., autism, aphasia) and for developing food plans for patients with specific dietary needs.

REFERENCES

[1] A. Sheth, M. Gaur, K. Roy, and K. Faldu, “Knowledge-intensive language understanding for explainable AI,” IEEE Internet Computing, 25 (5), pp. 19–24, 2021.

[2] R. D. Newman-Norlund, S. E. Newman-Norlund, S. Sayers, S. Nemati, N. Riccardi, C. Rorden, and J. Fridriksson, “The aging brain cohort (abc) repository: The University of South Carolina’s multimodal lifespan database for studying the relationship between the brain, cognition, genetics and behavior in healthy aging,” Neuroimage: Reports, vol. 1, no. 1, p. 100008, 2021.

[3] A. Sheth, M. Gaur, U. Kursuncu, and R. Wickramarachchi, “Shades of knowledge-infused learning for enhancing deep learning,” IEEE Internet Computing, vol. 23, no. 6, pp. 54–63, 2019.

[4] G. Libben, “From lexicon to flexicon: The principles of morphological transcendence and lexical superstates in the characterization of words in the mind,” Frontiers in Artificial Intelligence, vol. 4, 2021.

[5] K. Stasaski and M. A. Hearst, “Multiple choice question generation utilizing an ontology,” in Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 2017, pp. 303–312.

[6] M. Glass, G. Rossiello, M. F. M. Chowdhury, A. R. Naik, P. Cai, and A. Gliozzo, "Re2G: Retrieve, rerank, generate," in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022.

[7] K. Roy, M. Gaur, Q. Zhang, and A. Sheth, “Process knowledge-infused learning for suicidality assessment on social media,” arXiv preprint arXiv:2204.12560, 2022.

[8] S. Gupta, A. Agarwal, M. Gaur, K. Roy, V. Narayanan, P. Kumaraguru, and A. Sheth, “Learning to automate follow-up question generation using process knowledge for depression triage on Reddit posts,” Computational Linguistics and Clinical Psychology Workshop (CLPsych), July 2022.

[9] S. Yagcioglu, A. Erdem, E. Erdem, and N. Ikizler-Cinbis, “Recipeqa: A challenge dataset for multimodal comprehension of cooking recipes,” arXiv preprint arXiv:1809.00812, 2018.

[10] F. Pecune, L. Callebert, and S. Marsella, "Designing persuasive food conversational recommender systems with nudging and socially aware conversational strategies," Frontiers in Robotics and AI, vol. 8, art. 733835, 2022.

[11] A. J. Cross and R. Sinha, “Meat-related mutagens/carcinogens in the etiology of colorectal cancer,” Environmental and molecular mutagenesis, 44(1), pp. 44–55, 2004.

[12] A. Williams, N. Nangia, and S. R. Bowman, “A broad-coverage challenge corpus for sentence understanding through inference,” arXiv preprint arXiv:1704.05426, 2017.

[13] O.-M. Camburu, T. Rocktäschel, T. Lukasiewicz, and P. Blunsom, "e-SNLI: Natural language inference with natural language explanations," Advances in Neural Information Processing Systems, vol. 31, 2018.

[14] R. Sawhney, A. T. Neerkaje, and M. Gaur, "A risk-averse mechanism for suicidality assessment on social media," in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), 2022.

[15] M. Gaur, K. Gunaratna, V. Srinivasan, and H. Jin, "ISEEQ: Information seeking question generation using dynamic meta-information retrieval and knowledge graphs," in Proceedings of the 36th AAAI Conference on Artificial Intelligence, 2022.

Acknowledgment: This work was supported in part by National Science Foundation (NSF) Award 2133842, "EAGER: Advancing Neuro-symbolic AI with Deep Knowledge-infused Learning."


Relevant prior publications (some of these are part of the IEEE IC department on knowledge graphs):

A. Sheth, M. Gaur, K. Roy, K. Faldu. "Knowledge-intensive Language Understanding for Explainable AI," IEEE Internet Computing, September/October 2021. DOI: 10.1109/MIC.2021.3101919

A. Sheth, K. Thirunarayan. The duality of data and knowledge across the three waves of AI. IT Professional, 23(3), 2021, 35–45. https://doi.org/10.1109/MITP.2021.3070985

K. Faldu, A. Sheth, P. Kikani, H. Akabari. KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding, arXiv:2104.08145, 9 Apr 2021.

M. Gaur, K. Faldu and A. Sheth, "Semantics of the Black-Box: Can Knowledge Graphs Help Make Deep Learning Systems More Interpretable and Explainable?" IEEE Internet Computing, 25 (1), pp. 51-59, 2021. doi: 10.1109/MIC.2020.3031769

M. Gaur, K. Faldu, A. Desai, A. Sheth. Explainable AI using knowledge graphs. ACM CoDS-COMAD Conference, Jan 2-4, 2021.

H. Purohit, V. Shalin, A. Sheth. Knowledge graphs to empower humanity-inspired AI systems. IEEE Internet Computing, 24(4), 48–54, 2020. https://doi.org/10.1109/MIC.2020.3013683

S. Bhatt, A. Sheth, V. Shalin and J. Zhao, "Knowledge Graph Semantic Enhancement of Input Data for Improving AI," IEEE Internet Computing, vol. 24, no. 2, pp. 66-72, March-April 2020. doi: 10.1109/MIC.2020.2979620

U. Kursuncu, M. Gaur, A. Sheth. Knowledge infused learning (K-IL): Towards deep incorporation of knowledge in deep learning. Proceedings of the AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE), March 2020.

A. Sheth, S. Padhee and A. Gyrard, "Knowledge Graphs and Knowledge Networks: The Story in Brief," IEEE Internet Computing, vol. 23, no. 4, pp. 67-75, July-Aug. 2019. doi: 10.1109/MIC.2019.2928449

