Knowledge-intensive Language Understanding for Explainable AI

This is a preprint of (cite as): Amit Sheth, Manas Gaur, Kaushik Roy, Keyur Faldu. "Knowledge-intensive Language Understanding for Explainable AI," IEEE Internet Computing, September/October 2021. DOI: 10.1109/MIC.2021.3101919

AI systems have seen significant adoption in various domains. At the same time, further adoption in some domains is hindered by the inability to fully trust that an AI system will not harm a human. Beyond safety, fairness, privacy, transparency, and explainability are vital to developing trust in AI systems. As stated by IBM in describing trustworthy AI, "Trust comes through understanding. How AI-led decisions are made and what determining factors were included are crucial to understand." The subarea of explaining AI systems has come to be known as XAI. Multiple aspects of an AI system can be explained; these include biases the data might have, a lack of data points in a particular region of the example space, the fairness of data gathering, feature importance, and so on. However, it is also critical to have human-centered explanations directly related to decision-making, similar to how a domain expert makes decisions based on "domain knowledge," including well-established, peer-validated explicit guidelines. To understand and validate an AI system's outcomes (such as classifications, recommendations, and predictions) in a way that leads to trust in the AI system, it is necessary to involve explicit domain knowledge that humans understand and use. Contemporary XAI methods have yet to provide explanations that enable decision-making the way an expert does. Figure 1 shows the stages of adoption of an AI system into the real world.

Can the inclusion of explicit knowledge help XAI provide human-understandable explanations and enable decision-making?

Methods for Explainable AI: Opening the Black Box

The availability of vast amounts of data and the advent of deep neural network models have accelerated the adoption of AI systems in the real world, owing to their significant success in natural language processing, computer vision, and other data-intensive tasks. However, despite the advances in performance across these tasks, deep learning models remain a black box, i.e., it is extremely hard to understand how the inputs map to the outputs. Recent research in XAI has attempted to address several aspects of "opening this black box" to help humans, both the system users and domain experts, understand such models' functioning and decision-making process [1].


Figure 1: Adoption of AI systems occurs in two stages - the model-building phase and the model-explanation phase. Explicit knowledge in the form of abstract concepts, processes, policies/guidelines, and regulations is essential to infuse into the AI system so that it can provide sensible explanations comprehensible to humans.

The box below describes, with key references, four main approaches in state-of-the-art XAI for natural language processing that generate explanations from low-level model features [2]:

__________________________________________________________________________

Approaches and key references for XAI in natural language understanding

First-derivative saliency-based methods explain the decision of an algorithm by assigning values that reflect the importance of input features in their contribution to that decision, presented as a gradient map (heat map); a minimal code sketch follows these references:

1. Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "'Why should I trust you?' Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.

2. Zafar, Muhammad Rehman, and Naimul Mefraz Khan. "DLIME: A deterministic local interpretable model-agnostic explanations approach for computer-aided diagnosis systems." arXiv preprint arXiv:1906.10263 (2019).

3. Lundberg, Scott M., and Su-In Lee. "A unified approach to interpreting model predictions." Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017.
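
As a concrete illustration of the gradient-saliency idea described above (a minimal sketch, not an implementation of the specific methods cited), the following PyTorch snippet scores each token of a toy classifier's input by the norm of the gradient of the predicted class score with respect to that token's embedding. The model, tokens, and random embeddings are hypothetical stand-ins for a real pipeline.

```python
import torch
import torch.nn as nn

# Hypothetical tiny classifier over token embeddings (mean-pooled bag of embeddings).
class TinyTextClassifier(nn.Module):
    def __init__(self, dim=50, n_classes=2):
        super().__init__()
        self.fc = nn.Linear(dim, n_classes)

    def forward(self, token_embeddings):              # (n_tokens, dim)
        return self.fc(token_embeddings.mean(dim=0))  # class logits, shape (n_classes,)

model = TinyTextClassifier()
tokens = ["the", "movie", "was", "wonderful"]
emb = torch.randn(len(tokens), 50, requires_grad=True)  # stand-in for learned embeddings

logits = model(emb)
logits[logits.argmax()].backward()   # d(predicted class score) / d(embeddings)

saliency = emb.grad.norm(dim=1)      # one importance value per input token
for tok, score in zip(tokens, saliency.tolist()):
    print(f"{tok:10s} {score:.3f}")
```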

Layer-wise relevance propagation decomposes the prediction of a deep neural network for a specific example into individual contributions from sub-parts of the text; a simplified sketch follows these references:

1. Montavon, Grégoire, et al. "Layer-wise relevance propagation: An overview." Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (2019): 193-209.

2. Yang, Yinchong, et al. "Explaining therapy predictions with layer-wise relevance propagation in neural networks." 2018 IEEE International Conference on Healthcare Informatics (ICHI). IEEE, 2018.

3. Samek, Wojciech, et al. "Interpreting the predictions of complex ML models by layer-wise relevance propagation." arXiv preprint arXiv:1611.08191 (2016).
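
A simplified epsilon-rule sketch of relevance propagation (not the full method of the papers above): the NumPy snippet below redistributes the output relevance of a small two-layer network back to its inputs. The weights, input, and layer sizes are arbitrary illustrations.

```python
import numpy as np

def lrp_epsilon(weights, activations, relevance, eps=1e-6):
    """Epsilon-rule LRP: redistribute relevance from a layer's outputs to its inputs."""
    z = activations @ weights            # contribution flowing into each upper neuron
    z = z + eps * np.sign(z)             # stabilizer to avoid division by ~0
    s = relevance / z                    # per-upper-neuron relevance "rate"
    return activations * (weights @ s)   # R_i = a_i * sum_j w_ij * R_j / z_j

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(4, 2))
x = rng.normal(size=8)                   # toy input features (e.g., token-level features)
a1 = np.maximum(0, x @ W1)               # hidden activations (ReLU)
scores = a1 @ W2                         # class scores

R_out = np.where(np.arange(2) == scores.argmax(), scores, 0.0)  # start at predicted class
R_hidden = lrp_epsilon(W2, a1, R_out)
R_input = lrp_epsilon(W1, x, R_hidden)
print(R_input.round(3))                  # per-input relevance; sums roughly to the class score
```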

Input perturbations measure how input changes affect activations and features; a minimal sketch follows these references:

1. Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "'Why should I trust you?' Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.

2. Lundberg, Scott M., and Su-In Lee. "A unified approach to interpreting model predictions." Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017.
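
A minimal, model-agnostic sketch of the perturbation idea: mask one token at a time and record how much the predicted probability of the target class drops. Here `predict_proba` is a hypothetical callable standing in for any trained classifier that maps a token list to class probabilities.

```python
def occlusion_importance(tokens, predict_proba, target_class, mask_token="[UNK]"):
    """Perturbation-based importance: a larger probability drop means a more important token."""
    base = predict_proba(tokens)[target_class]
    importances = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [mask_token] + tokens[i + 1:]
        importances.append((tokens[i], base - predict_proba(perturbed)[target_class]))
    return importances

# Usage with any classifier wrapper, e.g.:
# scores = occlusion_importance(["not", "a", "good", "film"], clf_fn, target_class=0)
```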

Attention models compute focus areas in the text during model decision-making; a small sketch follows these references:

1. Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).

2. Yang, Zichao, et al. "Hierarchical attention networks for document classification." Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.

3. Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
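
The quantity usually visualized as an attention-based explanation is the softmax weight matrix of scaled dot-product attention. The small NumPy sketch below computes it for a toy self-attention case with random vectors; all values are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return the attention output and the weight matrix; row i of the weights
    shows how strongly query token i 'attends to' each key token."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(5, 16))      # 5 tokens, 16-dimensional vectors (self-attention)
_, attn = scaled_dot_product_attention(Q, K, V)
print(attn.round(2))                      # a 5x5 "focus map" over the input tokens
```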

_______________________________________________________________________

Evaluation of Explanations

Prior research on assessing the quality of explanations generated by XAI systems has utilized methods like majority voting over crowdsourced judgments, visual inspection, annotator agreement, etc. These evaluation methods are intuitive, but they relegate domain experts to mere annotators of the AI system. Developing a good-quality XAI system requires domain experts in the annotation, supervision, and evaluation phases [13,14]. For this purpose, domain experts need explanations in the form that an expert working in that domain or application would give, using the language and concepts customarily employed by a person working in that field. For example, in the medical domain, the outcome of a model needs to be explained by positioning it against conceptual knowledge contained in clinical guidelines. Analysis of word-level and token-level features is of little to no use to a domain expert during evaluation [6].

Evaluating the quality of explanations produced by the methods in the box above requires in-depth knowledge of mathematical operations such as derivatives, layer-wise feature mapping, perturbations, and attention mechanisms. Gilpin et al. present a survey of "explaining" explanations and show that human evaluators are needed to evaluate explanations produced by a model [7]. Due to the nature of these explanations, current evaluation is limited to analysis of word- and token-level feature importance once a suitable visualization mechanism, such as a saliency map, is utilized. Notably, the mathematical expertise required to "open the black box" has been a critical bottleneck in adopting AI systems with explanations. Domain experts require explanations in a language they can easily comprehend in order to evaluate the system.

Domain-related concepts and the clinical guidelines that use these concepts to arrive at outcomes and decisions are stored as explicit knowledge in knowledge graphs (KGs). Thus, methods that incorporate KGs to provide a concept-level explanation of the model outcome could improve explanations and ease the evaluation of AI systems. Furthermore, popular language understanding metrics such as BLEU [3], ROUGE-L, QBLEU4 [4], and BLEURT [5] need to be augmented to allow evaluation of the system's explicit-knowledge-guided decision-making capabilities. This will lead to trust in such systems by end-users and speed their adoption in the real world.
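
To make concept-level explanation concrete, the toy sketch below rolls token- or phrase-level importance scores (as produced by any method in the box above) up into KG concepts. The `concept_map` dictionary is a hypothetical stand-in for lookups against a real KG such as UMLS or ConceptNet; the phrases and scores are illustrative.

```python
# Hypothetical phrase-to-concept mapping standing in for a knowledge graph lookup.
concept_map = {
    "can't sleep": "Insomnia", "no sleep": "Insomnia",
    "sad": "Depressed mood", "hopeless": "Depressed mood",
    "sertraline": "SSRI medication",
}

def conceptualize(token_scores, concept_map):
    """Aggregate token/phrase importances into concept-level importances."""
    concept_scores = {}
    for phrase, score in token_scores.items():
        concept = concept_map.get(phrase.lower())
        if concept:
            concept_scores[concept] = concept_scores.get(concept, 0.0) + score
    return sorted(concept_scores.items(), key=lambda kv: -kv[1])

token_scores = {"can't sleep": 0.42, "sad": 0.31, "sertraline": 0.18, "weather": 0.02}
print(conceptualize(token_scores, concept_map))
# [('Insomnia', 0.42), ('Depressed mood', 0.31), ('SSRI medication', 0.18)]
```

A concept-level summary like this is something a clinician can check against guidelines, whereas raw token scores are not.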

Explicit knowledge-based XAI methods enable trust by explaining the AI system's decision-making to the domain expert or end-user in a language and form they can easily comprehend.

Techniques that use explicit knowledge to provide explanations for outcomes

Recent efforts in the deep learning and NLP community have focused on developing benchmark datasets that require explicit knowledge [8]. For instance, consider the task of goal-oriented question generation, where the goal is to meet the end-user's information-seeking behavior. In such a task, the user provides a query: "Tell me about the tourism and transportation in Hyderabad." Leveraging a pre-trained T5 model would generate questions such as "What is tourism?" or "What is transportation in Hyderabad?", which are neither interesting nor relevant to the end-user, who is seeking information on tourism and transportation entities within Hyderabad. Therefore, the system must pair a good passage retriever with a question generator module to obtain contextual questions, because the answer lies in a separate but semantically connected passage. Moreover, such a task might require retrieval of a large number of relevant passages, as the query is open domain and not factoid. For example, a subsequent question, "Tell me about tourism in the Charminar in Hyderabad," traverses the links from the article on Hyderabad, through Tourist attractions in Hyderabad, to the article on Charminar (concept flow). This is known as multihop open-domain question answering (ODQA). A well-designed retriever/generator pipeline helps end-users reason over the questions generated by the model with the support of the retrieved passages [11]; a minimal sketch of such a pipeline appears below. Likewise, the same model could utilize clinical questionnaires with definitions or a clinical manual (such as the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) or the Structured Clinical Interview for DSM-5) to generate relevant questions [12]. Figure 2 shows another example where concept flow occurs in the healthcare domain. The example shows the need for a clear, domain-expert-comprehensible explanation of how the question answering takes place; this allows the domain expert to evaluate the generated questions, which in turn builds trust in the system.
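
A minimal retrieve-then-generate sketch of this idea (not the authors' system): a TF-IDF retriever stands in for a dense passage retriever, and "qg-t5-checkpoint" is a hypothetical name for a T5 model fine-tuned for question generation, since a vanilla pre-trained T5 would not produce useful questions without such fine-tuning. The passages are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

passages = [
    "Charminar is a monument and a major tourist attraction in Hyderabad.",
    "Hyderabad's metro and bus network form the backbone of its public transport.",
    "The biryani of Hyderabad is famous across India.",
]

def retrieve(query, passages, k=2):
    """Sparse retriever: rank passages by TF-IDF cosine similarity to the query."""
    vec = TfidfVectorizer().fit(passages + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(passages))[0]
    return [passages[i] for i in sims.argsort()[::-1][:k]]

query = "Tell me about the tourism and transportation in Hyderabad"
context = " ".join(retrieve(query, passages))

tok = AutoTokenizer.from_pretrained("qg-t5-checkpoint")            # hypothetical checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("qg-t5-checkpoint")  # hypothetical checkpoint
inputs = tok(f"generate question: {context}", return_tensors="pt")
question = tok.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True)
print(question)
```

The retrieved passages double as the evidence a domain expert can inspect when judging whether a generated question follows the intended concept flow.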


Figure 2: Concept flow-based question generation. The left side is generated by a pre-trained T5 fine-tuned for question generation; the right side is generated by a T5 fine-tuned on relational context (questions and answers) under the supervision of ConceptNet. The difference between the two multi-turn question-answer sequences is that the right side has context-specific questions that drill down from high-level questions to problem-focused questions.

From General Language to Knowledge-Intensive Language Understanding

The NLP community has set up a set of tasks across various benchmark datasets called General Language Understanding Evaluation (GLUE) tasks. They test a variety of natural language capabilities such as textual entailment, textual similarity, and duplicate detection. However, recent research has shown that such tasks do not require external knowledge, as most of them are closed domain. In contrast, tasks such as open-domain question answering require external knowledge to narrow down the scope of passages where the answer lies. Such tasks are known as Knowledge-Intensive Language Understanding (KILU) tasks; KILU is a new unified benchmark that helps AI researchers build models that can better leverage real-world knowledge to accomplish a broad range of tasks, and models that leverage such knowledge do well on these tasks [9][15]. Traditional explanation methods focus on GLUE tasks. However, since GLUE tasks do not test whether the model can leverage knowledge, the explanations generated are of limited utility to humans. As explained below, addressing KILU tasks requires abstraction, contextualization, personalization, and a variety of knowledge sources to capture information similar to how a human does.

To provide explanations for KILU tasks, a model should leverage explicit knowledge, perform abstraction, contextualization, and personalization, and utilize a variety of knowledge sources.

The use of explicit knowledge in providing explanations achieves the following key capabilities:

Abstraction - The task of mapping low-level features to higher-level, human-understandable abstract concepts is known as abstraction. Humans often speak in terms of higher-level abstract concepts when explaining their decisions. AI systems likewise need to explain decisions to end-users using abstract, domain-relevant concepts constructed from low-level features and external knowledge in a KG.

Contextualization - Contextualization is interpreting a concept with reference to its relevant use or application. Domain experts contextualize a problem within the domain of a particular disease, for example, depression with its common symptoms and medications. This enables better decision-making, such as more accurate treatment.

Personalization - Identifying data point-specific information and integrating it with external knowledge to construct a personalized knowledge source is known as personalization. For example, a person's depressive disorder can be due to family issues, relationship issues, or clinical factors. These factors shape the context specific to the individual and consequently affect their symptoms and medications differently than those of another person.


Figure 3: Different sources of knowledge. GLUE tasks are evaluated using BERT-score, GLUE score, ROUGE-L, BLEU, BLEURT, and NUBIA metrics [9]. However, evaluation for knowledge-intensive language understanding requires domain knowledge-guided explanations.

Variety of knowledge capture that humans utilize - Humans conceptualize by processing information through different kinds of knowledge at varying levels of abstraction. Figure 3 shows the different types of knowledge that humans use, including but not limited to syntactic, structural, linguistic, common-sense, general, and domain-specific knowledge. Figure 4 shows an example of how humans process information by performing personalization through stored historical interactions with the system, contextualization via various sources of knowledge, and abstraction through a target source understandable to the end-user. In addition, attempts have been made to infuse knowledge from multiple knowledge graphs to improve domain understanding [15].


Figure 4: Background knowledge in the form of discharge summaries, transcripts of clinical diagnostic interviews, and filled clinical questionnaires (e.g., PHQ-9) creates a personalized profile of a user suffering from Major Depressive Disorder (MDD) (Step 1). Next, domain knowledge abstracts the user's attributes to reveal insomnia (Step 2). Finally, clinical guidelines (process knowledge) use the information about insomnia to make a recommendation (Step 3).
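
A toy rendering of the three steps in Figure 4, with hand-coded stand-ins for the personalized profile, the domain knowledge, and the clinical process knowledge; the PHQ-9 items, threshold, and guideline text are illustrative, not clinical advice.

```python
profile = {                       # Step 1: personalization from background records
    "patient": "P-017",
    "diagnosis": "Major Depressive Disorder",
    "phq9": {"trouble_sleeping": 3, "feeling_down": 2, "appetite": 1},
}

domain_kg = {                     # Step 2: abstraction via domain knowledge
    "trouble_sleeping": "Insomnia",
    "feeling_down": "Depressed mood",
    "appetite": "Appetite change",
}

guidelines = {                    # Step 3: process knowledge (illustrative guideline text)
    "Insomnia": "Assess sleep hygiene; consider CBT-I before sedative medication.",
}

def explainable_recommendation(profile, kg, guidelines, threshold=3):
    """Chain the three steps and return both the recommendation and the concepts behind it."""
    salient = [item for item, score in profile["phq9"].items() if score >= threshold]
    concepts = [kg[item] for item in salient if item in kg]
    recommendations = [(c, guidelines[c]) for c in concepts if c in guidelines]
    return {"abstracted_concepts": concepts, "recommendations": recommendations}

print(explainable_recommendation(profile, domain_kg, guidelines))
# {'abstracted_concepts': ['Insomnia'],
#  'recommendations': [('Insomnia', 'Assess sleep hygiene; consider CBT-I before ...')]}
```

Because every returned field traces back to an explicit knowledge source, a clinician can verify the recommendation against the guideline rather than against opaque model internals.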

Conclusion

Recent progress in XAI to explain black-box models largely focuses on explanations that map low-level model features to model decisions or that describe the computational paths, such as deep network activations, that lead to model outcomes. These "system-oriented explanations" do little for a domain expert or an end-user who needs to be able to trust the AI system's decision-making process and its adherence to real-world processes, rules, and guidelines. For this, XAI needs to offer explanations that the end-user or domain expert can easily comprehend. A user does not think in terms of low-level features, nor understand the inner workings of an AI system; instead, the user thinks in terms of abstract, conceptual, process-oriented, and task-oriented knowledge external to the AI system. Such external knowledge also needs to be explicit (e.g., as modeled by a knowledge graph), not implicit (i.e., implied by statistics or a vector representation). Recent efforts in knowledge-infused learning [10], a form of neuro-symbolic AI that utilizes explicit, external, and usually human-curated knowledge, can generate reasonable explanations for users who want to trust an AI system. Thus, explicit external knowledge must be infused into a black-box AI model to generate explanations from low-level features that the domain expert or end-user can understand (see Figure 4). This article also shows the need to develop natural language understanding benchmarks beyond GLUE that can effectively test the ability of an AI system to explain decisions in a human-understandable manner.

Acknowledgments: This research is supported in part by National Science Foundation (NSF) Award #2133842, "EAGER: Advancing Neuro-symbolic AI with Deep Knowledge-infused Learning." Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

References:

  1. Manas Gaur, Keyur Faldu, and Amit Sheth. "Semantics of the Black-Box: Can knowledge graphs help make deep learning systems more interpretable and explainable?." IEEE Internet Computing 25.1 (2021): 51-59.
  2. Enrico Tjoa and Cuntai Guan. "A survey on explainable artificial intelligence (XAI): Toward medical XAI." IEEE Transactions on Neural Networks and Learning Systems (2020).
  3. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. "Bleu: a method for automatic evaluation of machine translation." Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 2002.
  4. Dan Su et al. "Multi-hop Question Generation with Graph Convolutional Network." arXiv preprint arXiv:2010.09240 (2020).
  5. Thibault Sellam, Dipanjan Das, and Ankur P. Parikh. "BLEURT: Learning robust metrics for text generation." arXiv preprint arXiv:2004.04696 (2020).
  6. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, et al. "Retrieval-augmented generation for knowledge-intensive NLP tasks." arXiv preprint arXiv:2005.11401 (2020).
  7. Leilani Gilpin, David Bau, Ben Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. "Explaining explanations: An overview of interpretability of machine learning." In 2018 IEEE 5th International Conference on data science and advanced analytics (DSAA), 2018.
  8. Manas Gaur, Vamsi Aribandi, Ugur Kursuncu, Amanuel Alambo, Valerie L. Shalin, Krishnaprasad Thirunarayan, Jonathan Beich, Meera Narasimhan, and Amit Sheth. "Knowledge-Infused Abstractive Summarization of Clinical Diagnostic Interviews: Framework Development Study." JMIR Mental Health 8, no. 5 (2021): e20865.
  9. Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Guihong Cao, Daxin Jiang, and Ming Zhou. "K-adapter: Infusing knowledge into pre-trained models with adapters." arXiv preprint arXiv:2002.01808 (2020).
  10. Amit Sheth, Manas Gaur, Ugur Kursuncu, Ruwan Wickramarachchi. Shades of knowledge-infused learning for enhancing deep learning. IEEE Internet Computing. 2019 Nov 1;23(6):54-63.
  11. Kevin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. "Realm: Retrieval-augmented language model pre-training." arXiv preprint arXiv:2002.08909 (2020).
  12. Manas Gaur, Ugur Kursuncu, Amanuel Alambo, Amit Sheth, Raminta Daniulaityte, Krishnaprasad Thirunarayan, and Jyotishman Pathak. ""Let Me Tell You About Your Mental Health!" Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention." In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 753-762. 2018.
  13. Kaushik Roy, Qi Zhang, Manas Gaur, and Amit Sheth. "Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits." To Appear in the Proceedings of the 19th European Conference on Machine learning and Principles and Practice of Knowledge Discovery in Databases (2021).
  14. Manas Gaur, Kaushik Roy, Aditya Sharma, Biplav Srivastava, and Amit Sheth. ""Who can help me?": Knowledge Infused Matching of Support Seekers and Support Providers during COVID-19 on Reddit." Proceedings of the 6th IEEE International Conference on Healthcare Informatics, 2021.
  15. Keyur Faldu, Amit Sheth, Prashant Kikani, and Hemang Akabari. "KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding." arXiv preprint arXiv:2104.08145 (2021).

Authors:

Amit Sheth is the founding director of the Artificial Intelligence Institute at the University of South Carolina (aiisc.ai, #AIISC). He is a fellow of IEEE, AAAI, AAAS, and ACM.

Manas Gaur is a Ph.D. student advised by Prof. Sheth at AIISC, focusing on knowledge-infused learning.

Kaushik Roy is a Ph.D. student advised by Prof. Sheth at AIISC, focusing on AI algorithms used in health, social media analysis, and recommendation systems.

Keyur Faldu is the Chief Data Scientist at Embibe Inc., India.


Relevant prior publications:

A. Sheth, K. Thirunarayan. The duality of data and knowledge across the three waves of AI. IT Professional, 23(3), 2021, 35-45. https://doi.org/10.1109/MITP.2021.3070985

K. Faldu, A. Sheth, P. Kikani, H. Akabari. KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding, arXiv:2104.08145, 09 Apr 2021.

M. Gaur, K. Faldu and A. Sheth, "Semantics of the Black-Box: Can Knowledge Graphs Help Make Deep Learning Systems More Interpretable and Explainable?" IEEE Internet Computing, 25 (1), pp. 51-59, 2021. doi: 10.1109/MIC.2020.3031769

M. Gaur, K. Faldu, A. Desai, A. Sheth. Explainable AI using knowledge graphs. ACM CoDS-COMAD Conference, Jan 2-4, 2021.

H. Purohit, V. Shalin, A. Sheth, Knowledge graphs to empower humanity-inspired AI systems. IEEE Internet Computing, 24(4), 48-54, 2020. https://doi.org/10.1109/MIC.2020.3013683

S. Bhatt, A. Sheth, V. Shalin and J. Zhao, "Knowledge Graph Semantic Enhancement of Input Data for Improving AI," in?IEEE Internet Computing, vol. 24, no. 2, pp. 66-72, 1 March-April 2020, doi: 10.1109/MIC.2020.2979620.

U. Kursuncu, M. Gaur, A. Sheth, Knowledge infused learning (K-IL): Towards deep incorporation of knowledge in deep learning, Proceedings of the AAAI 2020 Spring Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE), March 2020.

A. Sheth, S. Padhee and A. Gyrard, "Knowledge Graphs and Knowledge Networks: The Story in Brief," in IEEE Internet Computing, vol. 23, no. 4, pp. 67-75, 1 July-Aug. 2019. DOI: 10.1109/MIC.2019.2928449

A. Sheth, M. Gaur, U. Kursuncu and R. Wickramarachchi, "Shades of Knowledge-Infused Learning for Enhancing Deep Learning" in IEEE Internet Computing, vol. 23, no. 06, pp. 54-63, 2019. doi: 10.1109/MIC.2019.2960071


