Shades of Knowledge-Infused Learning for Enhancing Deep Learning
Amit Sheth
NCR Chair & Prof; Founding Director, AI Institute at University of South Carolina
This is a version of the article (cite as): Amit Sheth, Manas Gaur, Ugur Kursuncu, Ruwan Wickramarachchi, Shades of Knowledge-Infused Learning for Enhancing Deep Learning, IEEE Internet Computing, 23 (6), Nov/Dec 2019, pp. 54-63. DOI: 10.1109/MIC.2019.2960071
ABSTRACT
Deep learning has already proven to be the primary technique for addressing a number of problems. It holds further promise for solving more challenging problems if we can overcome obstacles such as the lack of quality training data and poor interpretability. The exploitation of domain knowledge and application semantics can enhance existing deep learning methods by infusing relevant conceptual information into a statistical, data-driven computational approach. This requires resolving the impedance mismatch arising from the different representational forms and abstractions of symbolic and statistical AI techniques. In this article, we describe a continuum comprising three stages for infusing knowledge into machine/deep learning architectures. The continuum begins with shallow infusion in the form of embeddings, progresses to semi-deep infusion through attention mechanisms and knowledge-based constraints, and culminates in deep infusion, in which knowledge at different levels of abstraction is incorporated into the latent layers of neural networks. While shallow infusion is well studied and semi-deep infusion is in progress, we consider deep infusion of knowledge to be a new paradigm that will significantly advance the capabilities and promise of deep learning.
Introduction
For many, the purpose of artificial intelligence (AI) has been to achieve human-level intelligence. In that direction, recent years have seen data-driven machine learning (ML) models, specifically neural networks, achieve remarkable success in an increasing number of tasks such as object detection in images and speech recognition. On the other hand, these approaches have proven limited in their ability to perform tasks with the generality, adaptability, and explainability required in the pursuit of "machine intelligence". Because the dependence on large datasets is critical, the challenge becomes more acute when adequate, high-quality labeled data is lacking. Moreover, such a dataset may not cover all possibilities concerning the task in question, including those likely to arise in the future. In natural language understanding (NLU), for example, algorithms have not yet progressed to capture the implicit contextual meaning of content. One approach to addressing such limitations and building intrinsically more intelligent systems is to combine bottom-up, data-dependent processing with top-down processing, as observed by cognitive scientists and, to a lesser extent, by computer scientists (Sheth et al. 2017; Yang et al. 2017). The blending of deep/machine learning with structured knowledge (e.g., knowledge graphs), which we call "Knowledge-Infused Learning" (Kursuncu et al. 2019, Kursuncu et al. 2020), is an approach to addressing challenges such as: (1) decreasing the dependence on large datasets, (2) reducing bias in the dataset, (3) providing the ability to trace information, allowing explainability of a model, (4) improving the search space for domain-specific information, including anomalies, irregularities, and edge cases for which there may not be a large dataset to learn from, (5) reducing the complexity of model architectures, and (6) reducing false alarms in a model's performance. There have been early attempts at using external knowledge in machine learning to address these challenges; however, there is a long way to go to achieve its true potential.
While symbolic or logical approaches to AI garnered substantial research attention in earlier decades, significant advances in the past decade have come from statistical learning approaches. Although these approaches have been seen as complementary to each other (Sheth et al. 2005), their integration within one computational framework will be pivotal for pursuing machine intelligence with increased generality, adaptability, and explainability. This also has the potential to better support the integrated top-down and bottom-up processing that the human brain appears to do so seamlessly. Building upon prior observations on the importance of knowledge in learning (e.g., data alone is not enough (Domingos 2012); knowledge will propel machine understanding (Sheth et al. 2017)), we posit that knowledge, nowadays represented as knowledge graphs (KGs), will be the key enabler.
Learning the underlying patterns in the data requires going beyond instance-based generalization to exploiting external knowledge represented in structured graphs or networks. Deep learning (DL) has shown significant advances in improving natural language processing (NLP) by probabilistically learning latent patterns in the data using a multi-layered network of computational nodes (i.e., neurons/hidden units). However, the tremendous amount of training data required, the uncertainty in generalization to domain-specific tasks, and the minuscule improvements gained with increases in model complexity raise concerns about the features learned by such models. The utilization of relevant knowledge can aid in supervising the learning of features and facilitate explainability. The next opportunity could be to complement this implicit knowledge with KGs, which already provide explicit representations of entities, along with their synonyms and variants, and a variety of typed relationships. Many challenges remain, such as how to represent knowledge propagation between nodes as complex real-world relationships in a graph. Pioneers in AI are hence leveraging structured KGs for DL with relational inductive biases (zd.net/2Jblg2A), transfer learning (inter-domain knowledge sharing), and other new methods of infusing KGs into ML.
KGs will play an increasing role in developing hybrid neuro-symbolic systems (that is, bottom-up deep learning with top-down symbolic computing) as well as in building explainable AI systems, for which KGs will provide scaffolding for punctuating neural computing.
Consider the challenging task of NLU, which requires deciphering unique linguistic, semantic, and contextual characteristics. By incorporating domain-specific knowledge resources, a context-aware and knowledge-enhanced computational approach can break down the content into contextual building blocks that acknowledge inherent ambiguity and sparsity. To show the efficiency of such an approach, we utilized social media data (e.g., Reddit) on mental health to classify users into one of the DSM-5 categories. The system demonstrated the capability of matching patients to mental health professionals. Our approach utilizes zero-shot learning and a publicly available medical knowledge graph to learn a weight matrix for modulating word vectors. Evaluations show that this approach reduces false alarms in the classification of mental health disorders by 91% (https://bit.ly/2qU8MY1). Anantharam et al. utilized transportation-related ontologies to annotate streaming events on traffic, public safety, and weather reported as observations from citizens. The approach showed the benefits of ontologies in improving the learning performance of probabilistic graphical models (Anantharam et al. 2015).
As the infusion of knowledge into ML/DL algorithms can occur at different levels of depth, we provide an overall taxonomy for knowledge infusion, categorized as shallow, semi-deep, and deep infusion. We discuss each of these categories in the subsequent sections with examples.
Shallow Infusion of Knowledge
We define the first category of knowledge infusion, i.e., shallow infusion, as any attempt that either completely disregards structured knowledge or transforms it into flattened intermediate forms when used with DL models. The two popular choices for capturing background information are (1) training a shallow neural architecture or a statistical model on a large corpus and feeding the learned statistical signature as input to a task-specific model (Baroni et al. 2014), or (2) making the task-specific model's objective directly aware of such background information (Xu et al. 2018). Specifically, shallow infusion does not require the learning model to be significantly changed to ingest the external information. Rather, the external knowledge is introduced as a pre-trained model or weight vectors that can be directly fed to or coupled with existing neural architectures. Hence, in shallow infusion, both the information fed to a model and the method of feeding that information are shallow. We highlight three alternatives from the NLP domain, as shown in Figure 1, followed by discussion.
Figure 1: This figure shows a chronological arrangement of existing work from the NLP domain into three paradigms based on the degree of information captured by each model: (1) word embeddings, (2) enriched word embeddings using additional information, and (3) deep neural language models. Given the rapid progress in this area, we likely have not included all possible examples for 2019.
Word Embeddings: This is the simplest form of shallow infusion. Here, the objective is to provide the model with "background" that the training data alone could not provide. The background information is available as large text corpora (for example, GloVe is trained on 6B tokens), and a shallow neural network or a statistical model is trained in an unsupervised setting to capture the domain-specific meanings of words. Popular examples include, but are not restricted to, Word2Vec (the skip-gram and CBOW algorithms) and GloVe. The representation of words as n-dimensional vectors (e.g., n=300) makes them easily transferable and task-agnostic within a particular domain. As a result, numerous pre-trained word embeddings are available for many languages (https://bit.do/multi-lang) and domains (https://bit.do/bionlp).
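As a concrete illustration, the snippet below sketches how such background embeddings are typically obtained and consumed; it assumes the open-source gensim library (version 4.x) and its downloadable pre-trained models, which are not part of the work described here.

```python
# A minimal sketch of shallow infusion via pre-trained word embeddings (gensim assumed).
import gensim.downloader as api
from gensim.models import Word2Vec

# Option 1: load task-agnostic background embeddings trained on a large corpus.
glove = api.load("glove-wiki-gigaword-300")              # 300-d GloVe vectors
print(glove.most_similar("depression", topn=5))           # nearest words in the embedding space

# Option 2: train a shallow skip-gram (Word2Vec) model on a domain-specific corpus.
corpus = [["patient", "reports", "insomnia"], ["user", "mentions", "anxiety"]]
domain_model = Word2Vec(sentences=corpus, vector_size=300, sg=1,
                        window=5, min_count=1, epochs=10)
vector = domain_model.wv["insomnia"]                       # vector fed to a task-specific model
```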
In shallow infusion, both the external information and the method of knowledge infusion are shallow.
Enriched word embeddings: In this class of algorithms, pre-trained word embeddings are enriched using additional information such as domain-specific lexicons/taxonomies and the morphology of words. As a post-processing technique, "retrofitting" leverages semantic lexicons such as WordNet to modify the embeddings. For example, retrofitting enforces the embedding of the word "incorrect" to lie in the vicinity of related words such as "wrong", "flawed", and "false" in the embedding space. "Counter-fitting", an approach similar to retrofitting, introduces synonymy and antonymy constraints on word relatedness when refining word embeddings. As a result, it prevents the word "inexpensive" from being close to words such as "pricey" and "costly", to which it is related via an antonymy relation. FastText leverages information within the text itself to improve the learned embeddings. It considers the morphology of words, particularly sub-word information, and represents a word as a bag of character n-grams when learning the embeddings. This allows misspelled words, rare words, and abbreviations to have meanings similar to their canonical forms. Moreover, it further enables deriving embeddings for words that did not appear in the training data.
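The retrofitting idea can be illustrated with a short sketch of its iterative update (following the formulation of Faruqui et al. 2015); the dictionary-based data structures and uniform neighbour weights below are simplifying assumptions.

```python
# A minimal sketch of the retrofitting update: pull each word vector towards its
# semantic-lexicon neighbours while staying close to the original (pre-trained) vector.
import numpy as np

def retrofit(embeddings, lexicon, iterations=10, alpha=1.0):
    """embeddings: {word: np.ndarray}; lexicon: {word: [neighbour words]}."""
    new_vecs = {w: v.copy() for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            neighbours = [n for n in neighbours if n in new_vecs]
            if word not in new_vecs or not neighbours:
                continue
            beta = 1.0 / len(neighbours)                   # uniform neighbour weight
            numerator = alpha * embeddings[word] + beta * sum(new_vecs[n] for n in neighbours)
            new_vecs[word] = numerator / (alpha + beta * len(neighbours))
    return new_vecs

# e.g., retrofit(vectors, {"incorrect": ["wrong", "flawed", "false"]})
```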
Deep Neural Language Models: The primary difference in this class of models is the use of deep neural architectures with a language modeling objective, i.e., learning to predict the next word conditioned on the given context by probabilistically modeling words in a language. ELMo marks a significant step in this direction by capturing the "context" in which a word is used in a sentence. By training a bidirectional LSTM network to model the language in both forward and backward directions, ELMo represents a particular word as a (task-specific) combination of the corresponding hidden layers. The current state of the art in neural language modeling was inspired by the advent of Transformers, a simple, solely attention-based mechanism that removes the need for recurrent and convolutional neural networks. The Transformer-based BERT, a model that broke records on several NLP tasks, learns to capture long-term dependencies and context by training on large amounts of text. It is then fine-tuned on a supervised learning task to adapt the knowledge gained. The last year has seen ground-breaking work, with several Transformer-based successors of BERT (e.g., RoBERTa, XLNet, and Transformer-XL) coming to light and steering modern NLP in new directions.
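For illustration, the following sketch shows how contextual representations are typically extracted from such a pre-trained language model; it assumes the Hugging Face transformers library and PyTorch, which are not tied to any specific work cited above.

```python
# A minimal sketch of obtaining contextual token representations from a pre-trained
# language model (Hugging Face "transformers" and PyTorch assumed).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# The same surface form ("bank") receives different vectors in different contexts.
sentences = ["She sat by the river bank.", "He deposited cash at the bank."]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
contextual_vectors = outputs.last_hidden_state   # shape: (2, sequence_length, 768)
```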
Semi-Deep Infusion of Knowledge
We define the second category of knowledge infusion, i.e., semi-deep infusion, as a paradigm that gauges the learning of a deep network and resolves the impedance mismatch by adding structural (e.g., dependency relations between words in a sentence) or symbolic (e.g., attention probabilities or constraint satisfaction) knowledge. Such an approach has been effective in task-specific problems where the model is unable to learn complex representative features from the text (Ramakrishnan et al. 2008). Further, amalgamating two deep learning networks is another alternative for bringing together structural and sequential learning to improve prediction (Yin et al. 2016). We categorize different perspectives of semi-deep infusion of knowledge in deep neural networks as outlined for various NLP/NLU tasks (e.g., event detection, user classification, relationship extraction, and reading comprehension).
Figure 2: This figure shows an ordering of existing work that relates to our definition of Semi-Deep Infusion of Knowledge. We categorize the process of semi-deep infusion into three paradigms: (1) Forcing Methods, (2) Neural Attention Models, and (3) Knowledge-based models. Given the rapid progress in this area, we likely have not included all possible examples for 2019.
Teacher/Professor Forcing: In a deep learning framework built around an encoder-decoder (autoencoder-style) architecture, the capability of the decoder is enhanced through teacher forcing. In this procedure, the target labels (structured sentences rather than binary labels) are fed word by word while training the decoder. The vectorized representation of the input on which the decoder learns is provided by the encoder. The procedure was first discussed by (Williams et al. 1989) and has shown improvements in machine translation, entity extraction, and negation detection tasks (Lamb et al. 2016). In examining the procedure of teacher forcing, we identified two critical issues: (1) the representation provided by the encoder is not gauged in the teacher forcing method, and (2) the model memorizes the patterns of the input, making it difficult to perform transfer learning with the trained model. For example, consider training such a model on a harassment dataset from social media through teacher forcing; it is uncertain whether the model will perform well on the closely related problem of radicalization on social media, because of the poor contextualization and adaptability of the model. Kursuncu et al. leveraged domain-specific perspective models to enrich the representation of extremist communications on social media (Kursuncu et al. 2019). The approach provided the knowledge a model needs to minimize false alarms. In the context of harassment on social media, a potential improvement in a machine learning model has been made through the infusion of cyberbullying vocabulary knowledge.
A teacher-forced model is able to learn the correct representation of the input through the methods below (see the sketch after this list):
● Redundancy: In this learning process, the model is monitored for information loss through backpropagation, and the loss is replenished by replicating the input to subsequent layers. Methods such as skip connections and highway connections follow this approach.
● Curriculum Learning: A variation of forced learning is to introduce outputs generated from prior time steps during training to encourage the model to learn how to correct its own mistakes.
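To make the above concrete, the sketch below shows a single decoding step trained with teacher forcing, with a mixing ratio that recovers the curriculum-learning variant when the model's own predictions are fed back. PyTorch is assumed, and names such as encoder_state and vocab_size are illustrative placeholders rather than part of any cited system.

```python
# A minimal sketch of teacher forcing (and its curriculum-learning relaxation)
# in a sequence decoder; PyTorch assumed, names illustrative.
import random
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 5000, 128, 256
embedding = nn.Embedding(vocab_size, embed_dim)
decoder_cell = nn.GRUCell(embed_dim, hidden_dim)
output_layer = nn.Linear(hidden_dim, vocab_size)
loss_fn = nn.CrossEntropyLoss()

def decode_step(target_tokens, encoder_state, teacher_forcing_ratio=1.0):
    """target_tokens: (batch, seq_len) gold word ids; encoder_state: (batch, hidden_dim)."""
    batch, seq_len = target_tokens.shape
    hidden, loss = encoder_state, 0.0
    prev_token = target_tokens[:, 0]                      # <start> token
    for t in range(1, seq_len):
        hidden = decoder_cell(embedding(prev_token), hidden)
        logits = output_layer(hidden)
        loss = loss + loss_fn(logits, target_tokens[:, t])
        if random.random() < teacher_forcing_ratio:
            prev_token = target_tokens[:, t]              # teacher forcing: feed the gold token
        else:
            prev_token = logits.argmax(dim=-1)            # curriculum: feed the model's own prediction
    return loss / (seq_len - 1)
```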
In the teacher forcing paradigm, the conditioning context at inference time may diverge from the context seen during training, when ground-truth labels were given as input. Since the encoder acts as a generator and the decoder behaves like a discriminator, their independent functioning affects model performance. Further, knowledge is incorporated on the decoder side, independent of the encoder; hence, it is challenging to quantify the information loss incurred on the encoder side. Our proposed approach to deep infusion addresses (1) where in a model the latent weights are wrongly enforced and (2) how to adjust those weights by leveraging an external, human-curated graphical knowledge source.
"In semi-deep infusion, external knowledge is involved through attention mechanism or learnable knowledge constraints acting as a sentinel to guide model learning.”
Neural Attention Models (NAM): Attention models highlight particular features that are important for pattern recognition/classification based on a hierarchical architecture over the content. The manipulation of attentional focus is effective in solving real-world problems involving massive amounts of data (Sun et al. 2017). On the other hand, some applications demonstrate the limitation of attentional manipulation in a set of problems such as sentiment (mis)classification and suicide risk (Gaur et al. 2019), where feature presence is inherently ambiguous, just as in the radicalization problem. For example, in the suicide risk prediction task, references to suicide-related terminology appear in the social media posts of both victims and supportive listeners, and existing NAMs fail to capture the semantic relations between terms that would help differentiate a suicidal user from a supportive user. To overcome such limitations in a sentiment classification task, (Vo et al. 2017) augmented sentiment scores in the feature set to enhance the learned representation and modified the loss function to respond to the values of the sentiment score during learning. However, (Sheth et al. 2017) pointed out the importance of using domain-specific knowledge, especially where the problem is complex. In an empirical study, Bian et al. showed the effectiveness of combining richer semantics from domain knowledge with morphological and syntactic knowledge in the text, by modeling knowledge assistance as an auxiliary task that regularizes learning of the main objective in a deep neural network (Bian et al. 2014).
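A minimal sketch of how an attention layer can be biased by an external, knowledge-derived relevance signal (e.g., domain-lexicon or sentiment scores per token) is shown below; it is an illustrative simplification (PyTorch assumed), not the architecture of any cited work.

```python
# A minimal sketch of attention whose scores are biased by knowledge-derived token scores.
import torch
import torch.nn as nn

class KnowledgeBiasedAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states, knowledge_scores=None):
        # hidden_states: (batch, seq_len, hidden_dim); knowledge_scores: (batch, seq_len)
        logits = self.score(hidden_states).squeeze(-1)       # data-driven attention logits
        if knowledge_scores is not None:
            logits = logits + knowledge_scores               # bias attention with domain knowledge
        weights = torch.softmax(logits, dim=-1)              # (batch, seq_len)
        context = torch.bmm(weights.unsqueeze(1), hidden_states).squeeze(1)
        return context, weights                              # context vector for classification
```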
Learnable Knowledge Constraints: Professor forcing forms an architecture in which the encoder (generator) competes with the decoder (discriminator) to improve the outcome, thus forming an adversarial network. The improvement in learning occurs by acting as a posterior regularizer, allowing the possibility of including rich structured domain knowledge. However, in professor forcing, if knowledge constraints are to be infused, they need to be specified a priori rather than iteratively during learning. A recent study by Hu et al. focuses on infusing knowledge as constraints in such an adversarial network by optimizing the Kullback-Leibler (KL) divergence (Hu et al. 2018). However, the knowledge gathered for infusion is part of the dataset and does not exploit a human-curated knowledge graph. The study relates to our objective through its monitoring of KL divergence, but it does not provide a methodology for adding the relevant knowledge quantified by the KL score. In our deep infusion paradigm (Figure 3), we aim to define the quantification and inclusion of relevant knowledge in deep models to minimize learning time and false alarm rates.
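The following sketch conveys the general flavour of a knowledge constraint expressed as a KL-divergence regularizer on the model's output distribution; it is a deliberate simplification inspired by posterior regularization and is not the formulation of Hu et al. (2018). The constraint distribution is assumed to be supplied externally (e.g., KG-derived class priors for each input).

```python
# A minimal sketch of a KL-based knowledge constraint added to a classification loss.
import torch
import torch.nn.functional as F

def constrained_loss(logits, labels, constraint_dist, lam=0.1):
    """logits: (batch, classes); labels: (batch,);
    constraint_dist: (batch, classes) knowledge-derived distribution over classes."""
    task_loss = F.cross_entropy(logits, labels)
    log_probs = F.log_softmax(logits, dim=-1)
    # KL(constraint || model): penalizes predictions that disagree with the knowledge prior.
    kl = F.kl_div(log_probs, constraint_dist, reduction="batchmean")
    return task_loss + lam * kl
```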
Graph Neural Network (GNN): A graph neural network is a type of neural network that operates directly on graph structure (Scarselli et al. 2008). A typical application of GNNs is node classification: every node in the graph is associated with a label, and we want to predict the labels of nodes for which no ground truth is available. In this process, the model generates an importance score for each node, and the connection weights form the weights of the relationships between nodes. In this and a similar study (Wang et al. 2019), the GNN framework can be seen as leveraging the structural properties of the KG and quantifying them using the input data. However, the framework is restricted to the labels in the input dataset and their inter-relationships. Further, the GNN does not exploit the structural properties and taxonomic relationships of the KG to identify the relevant knowledge that can be applied to the learning of the neural network. Moreover, the hidden nodes in a GNN are unaware of abstractions corresponding to stratified knowledge in a KG; thus, the relationships between the labels are not well contextualized.
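For reference, the sketch below shows one message-passing (graph convolution) layer for node classification, using only PyTorch and a dense adjacency matrix; edge types and KG taxonomy, which we argue are under-exploited, are deliberately not modelled here.

```python
# A minimal sketch of one graph-convolution layer: each node averages its neighbours'
# features (including its own via a self-loop) and applies a learned transformation.
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_features, adjacency):
        # node_features: (num_nodes, in_dim); adjacency: (num_nodes, num_nodes), 0/1 entries
        adj_hat = adjacency + torch.eye(adjacency.size(0))   # add self-loops
        degree = adj_hat.sum(dim=1, keepdim=True)            # node degrees
        messages = adj_hat @ node_features / degree          # mean over neighbours
        return torch.relu(self.linear(messages))             # updated node representations

# Stacking such layers and adding a softmax classifier yields the node-labelling setup above.
```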
Tree LSTMs: LSTMs are sequential models, whereas the sentences in the input corpus follow a grammatical tree structure (dependency or constituency). Hence, it is important to learn the contextual representation of the input following the same tree structure. Tree LSTMs (Tai et al. 2015) replace the nodes in the parse tree with LSTM cells, and vector representations of the words/phrases are given as input. This model considers the structural (syntactic) properties of the input, but domain knowledge is ignored.
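The composition performed at each tree node can be sketched as a child-sum Tree-LSTM cell (after Tai et al. 2015); the PyTorch code below is an illustrative, unbatched simplification.

```python
# A minimal sketch of a Child-Sum Tree-LSTM cell: a node's state is composed from its
# word vector and the summed hidden states of its children (one forget gate per child).
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.iou = nn.Linear(input_dim + hidden_dim, 3 * hidden_dim)  # input/output/update gates
        self.f = nn.Linear(input_dim + hidden_dim, hidden_dim)        # forget gate per child

    def forward(self, x, child_h, child_c):
        # x: (input_dim,) word vector; child_h, child_c: (num_children, hidden_dim)
        h_sum = child_h.sum(dim=0)
        i, o, u = torch.chunk(self.iou(torch.cat([x, h_sum])), 3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f(torch.cat([x.expand(child_h.size(0), -1), child_h], dim=1)))
        c = i * u + (f * child_c).sum(dim=0)                          # combine children's memory
        h = o * torch.tanh(c)
        return h, c
```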
A recent study by (Yang et al. 2017) utilizes external knowledge bases (e.g., WordNet, NELL) to improve the performance of BiLSTMs while minimizing task-specific feature engineering. Particularly, the study focused on improving entity and event extraction. The knowledge-based LSTM proposed in the study comprises an attention mechanism that acts as a sentinel to guide the model in deciding whether to use external knowledge and to adaptively decide the level of abstraction of that information. Though the proposed architecture uses an external knowledge base as a separate component for each LSTM cell, it is uncertain how much of the external knowledge needs to be incorporated, and to what level of abstraction the knowledge base needs to be traversed, to compensate for the information loss in the learning process.
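The sentinel idea can be sketched as attention over candidate KB concept embeddings retrieved for a token, with an extra "use no knowledge" option; the module below is a hypothetical simplification in the spirit of KBLSTM, not the authors' implementation, and all names are illustrative.

```python
# A minimal sketch of sentinel-gated attention over candidate KB concept embeddings
# for a single LSTM time step (PyTorch assumed).
import torch
import torch.nn as nn

class KnowledgeSentinelAttention(nn.Module):
    def __init__(self, hidden_dim, concept_dim):
        super().__init__()
        self.attn = nn.Bilinear(hidden_dim, concept_dim, 1)   # score each candidate concept
        self.sentinel = nn.Linear(hidden_dim, 1)              # score the "no external knowledge" option
        self.project = nn.Linear(concept_dim, hidden_dim)

    def forward(self, hidden_state, concept_embeddings):
        # hidden_state: (hidden_dim,); concept_embeddings: (num_candidates, concept_dim)
        n = concept_embeddings.size(0)
        concept_scores = self.attn(hidden_state.expand(n, -1), concept_embeddings).squeeze(-1)
        sentinel_score = self.sentinel(hidden_state)
        weights = torch.softmax(torch.cat([concept_scores, sentinel_score]), dim=0)
        knowledge_vector = weights[:n] @ self.project(concept_embeddings)  # weighted concept mix
        return hidden_state + knowledge_vector                             # knowledge-enriched state
```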
-----------------------------
Papers on DL techniques cited:
● (Word2Vec): Mikolov, T., et al. Distributed representations of words and phrases and their compositionality. NIPS 2013.
● (GloVe): Pennington, J., et al. Glove: Global vectors for word representation. In Proc. EMNLP 2014.
● (Retrofitting): Faruqui, M., et al. Retrofitting Word Vectors to Semantic Lexicons. In Proc. NAACL-HLT 2015.
● (Counter-fitting): Mrkšić, N., et al. Counter-fitting Word Vectors to Linguistic Constraints. In Proc. NAACL-HLT 2016.
● (FastText): Bojanowski, P., et al. Enriching word vectors with subword information. TACL 2017.
● (ELMo): Peters, M. E., et al., Deep contextualized word representations. In Proc. NAACL-HLT 2018.
● (Transformers): Vaswani, A., et al. Attention is all you need. NIPS 2017.
● (BERT): Devlin, J., et al. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 2018.
● (GPT-2): Radford, A., et al., Language models are unsupervised multitask learners. OpenAI Blog 2019.
● (XLNet): Yang, Z., et al., XLNet: Generalized Autoregressive Pre-training for Language Understanding. arXiv preprint arXiv:1906.08237 2019.
● (RoBERTa): Liu, Y., et al. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 2019.
● (Teacher Forcing): Williams, Ronald J., et al. "A learning algorithm for continually running fully recurrent neural networks." Neural computation 1989.
● (Professor Forcing): Lamb, Alex M., et al. "Professor forcing: A new algorithm for training recurrent networks." NIPS 2016.
● (NAM): Rush, Alexander M., et al. "A neural attention model for abstractive sentence summarization." arXiv preprint arXiv:1509.00685 2015.
● (Tree-LSTM): Tai, Kai Sheng et al. "Improved semantic representations from tree-structured long short-term memory networks." arXiv preprint arXiv:1503.00075 2015.
● (GNN): Scarselli, Franco, et al. "The graph neural network model." IEEE Transactions on Neural Networks 2008.
● (KB-RANN): Yi, Kai, et al. "Knowledge-based Recurrent Attentive Neural Network for Small Object Detection." arXiv preprint arXiv:1803.05263 2018.
● (TransH): Wang, Zhen, et al. "Knowledge graph embedding by translating on hyperplanes." AAAI 2014.
● (KBLSTM): Yang, Bishan, et al. "Leveraging Knowledge Bases in LSTMs for Improving Machine Reading." In Proc. ACL 2017.
---------------------------------
Deep Infusion of Knowledge
We define the third category of knowledge infusion, i.e., deep infusion of knowledge, as a paradigm that couples the latent representations learned by deep neural networks with KGs, exploiting the semantic relationships between entities. We aim to: (1) quantify the information loss, (2) identify the relevant knowledge at an appropriate level of abstraction, and (3) appropriately combine the representations of the identified KG concepts with the latent representation of the data. While existing research shows the contribution of incorporating external knowledge in machine learning, this incorporation mostly takes place before or after the actual learning process. We argue that deep infusion within the latent layers of neural networks will boost the performance of neural networks as an integral component of AI models deployed in applications. Infusing such structured knowledge deeply will reveal patterns that are missed by shallow and semi-deep infusion because of sparse feature occurrence, ambiguity, and noise. This approach will allow accomplishing the infusion of declarative domain knowledge in the latent layers of neural networks.
Among current state-of-the-art works, (Kai Yi et al. 2018) introduced a knowledge-based recurrent attention neural network (KB-RANN) that modifies the attention mechanism by incorporating domain knowledge to make the model generalize better. However, their domain knowledge is statistically derivable from the existing data, without capturing exceptions, anomalies, and irregularities, which are sparse but important knowledge that helps to characterize semantic cues and nuances. Studies on incorporating knowledge in the deep learning process have generally not involved structured knowledge in the form of KGs. On the other hand, (Arguello Casteleiro et al. 2018) recently showed how the Cardiovascular Disease Ontology provided context and reduced ambiguity, improving performance on a synonym detection task. Researchers have also employed embeddings of entities in a KG, derived through Bi-LSTMs, to enhance the efficacy of neural attention models. Looking ahead, given that KGs use a rich graphical representation, we believe that graph neural networks will provide richer ways to align knowledge with the learning process and support infusion while maintaining the richness of knowledge representation, such as link (relationship) semantics. These existing studies utilized external knowledge after the representation had been generated by neural language models, rather than within the deep neural network. We argue that a learning framework that incorporates domain knowledge within the latent layers of neural networks will improve performance in a holistic manner.
"It would be useful to use a stratified representation of knowledge representing different levels of abstractions. As we understand the level of abstraction represented by different layers in a deep learning model, we can look to transfer knowledge that aligns with the corresponding layer in the layered learning process. "
In healthcare, for example, infusing knowledge would mean incorporating rich domain knowledge captured in manually curated medical KGs (e.g., UMLS, ICD-10, and DataMed) while not losing the abstractions and context (e.g., a term used in "family history" has a different meaning than the same term used in "impression and plan" in an EMR), taxonomic and named relationships, and complex and compound entities (e.g., "adenomatous hyperplasia of endometrium" is a single entity, and any system that treats it as merely related to hyperplasia or endometrium would be using incorrect semantics). In DL for NLP, knowledge corresponding to linguistic aspects or components (words, entities and relationships, modifiers, phrases recognized by parse trees, etc.) would be incorporated at different layers in the learning process. In a task such as deep learning for image processing, the knowledge of texture is best incorporated at an intermediate layer that corresponds to the abstraction of texture and can best utilize it. As each layer in a neural network architecture produces a latent representation that is transmitted between hidden layers, the infusion of knowledge during this learning process raises the relevant research questions: (i) How do we decide whether or not to infuse knowledge at a particular stage in learning between layers, and how do we measure the incorporation of knowledge? (ii) How do we merge latent representations with knowledge representations, and how do we propagate the knowledge through the learned representation? While these research questions require further investigation, we believe that developing functions within a neural network architecture that operate on representations of external knowledge is a promising direction. As the goal is to infuse knowledge within the neural network, the infusion can be designed to occur (i) before the output layer (see Figure 3) or (ii) between hidden layers.
Figure 3: Representations of data are generated, and domain knowledge amplifies the significance of specific important concepts that would otherwise be missed by the learning model. The classification error and the KG determine the need for infusing knowledge. The knowledge infusion layer incorporates the knowledge into the latent representation before the output layer.
While it is essential to have an appropriate design for the neural network architecture, the creation of an appropriate representation of the knowledge to be infused into the neural network is also crucial. Although the knowledge in a KG can typically be represented as embedding vectors, such embeddings still do not truly capture the semantics, and further investigation is required to reflect the power of knowledge in a KG with its relationships (Kursuncu et al. 2019). Specific contextual models and/or more generic models can be utilized to create an embedding of each concept and its relations in a KG based on proximity, using appropriate distance measures (e.g., Least Common Subsumer). Further, existing knowledge embedding models such as TransE, TransH, and HolE can be utilized to create embeddings from KGs.
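As an illustration of such knowledge embedding models, the sketch below shows the TransE scoring idea, in which a triple (h, r, t) is plausible when h + r lies close to t; the embedding tables and sizes are illustrative assumptions.

```python
# A minimal sketch of TransE scoring for KG triples (PyTorch assumed).
import torch
import torch.nn as nn

num_entities, num_relations, dim = 10000, 200, 100
entity_emb = nn.Embedding(num_entities, dim)
relation_emb = nn.Embedding(num_relations, dim)

def transe_score(head_ids, relation_ids, tail_ids):
    h = entity_emb(head_ids)
    r = relation_emb(relation_ids)
    t = entity_emb(tail_ids)
    return torch.norm(h + r - t, p=2, dim=-1)   # lower distance = more plausible triple

# Training typically minimizes a margin ranking loss between true and corrupted triples:
# loss = relu(margin + score(true) - score(corrupted)).mean()
```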
As argued above, knowledge infusion can occur between hidden layers or just before the output layer. (Kursuncu et al. 2019) details an initial approach to the knowledge infusion layer for the scenario in which infusion takes place just before the output layer. In neural language models, the output layer (e.g., SoftMax) estimates the error to be back-propagated. Each epoch generates an error that is incrementally reduced, and it is back-propagated until the model reaches a saddle point or a local minimum. The error represents the difference between actual and predicted labels. Two specific functions were introduced as an initial approach: one optimizes the loss function with respect to the KL divergence, and the other merges the latent vectors from the hidden layers with the knowledge embedding. This approach estimates the divergence between the latent representations and the knowledge representation to determine the differential knowledge to be infused. Further, the modulation of the knowledge-infused learned weight matrix and the latent representation will be critical and will need further investigation.
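To convey the idea, the sketch below shows a hypothetical knowledge infusion layer that measures the divergence between a latent representation and an aligned KG representation and merges them in proportion to that divergence. It is an illustrative simplification of the scenario described above, not the implementation of (Kursuncu et al. 2019), and all module and variable names are assumptions.

```python
# A minimal, illustrative sketch of a knowledge infusion layer placed before the output layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeInfusionLayer(nn.Module):
    def __init__(self, latent_dim, knowledge_dim):
        super().__init__()
        self.align = nn.Linear(knowledge_dim, latent_dim)   # map KG embedding into latent space
        self.merge = nn.Linear(2 * latent_dim, latent_dim)  # learnable merge of the two views

    def forward(self, latent, knowledge):
        # latent: (batch, latent_dim); knowledge: (batch, knowledge_dim)
        k = self.align(knowledge)
        p = F.softmax(latent, dim=-1)                        # treat both views as distributions
        q = F.softmax(k, dim=-1)
        divergence = F.kl_div(q.log(), p, reduction="none").sum(-1, keepdim=True)
        gate = torch.sigmoid(divergence)                     # infuse more when the views disagree
        merged = self.merge(torch.cat([latent, gate * k], dim=-1))
        return merged                                        # passed on to the output (SoftMax) layer
```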
Discussion
In this article, we provided an overview of the continuing progress towards using and incorporating structured knowledge to develop increasingly powerful learning techniques. Future advances in this area will integrate top-down and bottom-up processing, moving AI techniques closer to how cognitive scientists believe human brains function.
References
- Anantharam, Pramod, et al. "Extracting city traffic events from social streams." ACM Transactions on Intelligent Systems and Technology 2015.
- Baroni, Marco et al., "Don’t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors." Proc 52nd Annual Meeting of the Association for Computational Linguistics 2014.
- Bian, Jiang, et al. "Knowledge-powered deep learning for word embedding." Joint European Conference on Machine Learning and Knowledge Discovery in Databases 2014.
- Domingos, Pedro M. "A few useful things to know about machine learning." Commun. ACM 2012.
- Gaur, Manas, et al. "Knowledge-aware assessment of severity of suicide risk for early intervention." The World Wide Web Conference. ACM 2019.
- Hu, Zhiting, et al. "Deep generative models with learnable knowledge constraints." Advances in Neural Information Processing Systems 2018.
- Kursuncu, Ugur et al. "Knowledge Infused Learning (K-IL): Towards Deep Incorporation of Knowledge in Deep Learning." AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering. Palo Alto, California, USA. 2020. arXiv:1912.00512.
- Kursuncu, Ugur, et al. "Modeling Islamist Extremist Communications on Social Media using Contextual Dimensions: Religion, Ideology, and Hate." Proc ACM on Human-Computer Interaction 2019.
- Ramakrishnan, Cartic, et al., "Joint extraction of compound entities and relationships from biomedical literature," 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 398--401.
- Sheth, Amit, et al. "Semantics for the Semantic Web: The Implicit, the Formal and the Powerful." International Journal on Semantic Web and Information Systems (IJSWIS) 2005.
- Sheth, Amit, et al. "Knowledge will propel machine understanding of content: extrapolating from current examples." Proc International Conference on Web Intelligence. 2017.
- Sun, Chen, et al. "Revisiting unreasonable effectiveness of data in deep learning era." Proc IEEE international conference on computer vision 2017.
- Vo, Khuong, et al. "Combination of domain knowledge and deep learning for sentiment analysis." International Workshop on Multi-disciplinary Trends in Artificial Intelligence 2017.
- Wang, Xiang, et al. "Explainable reasoning over knowledge graphs for recommendation." Proc AAAI Conference on Artificial Intelligence 2019.
- Xu, Jingyi, et al. "A Semantic Loss Function for Deep Learning with Symbolic Knowledge." International Conference on Machine Learning 2018.
Acknowledgments: We thank Swati Padhee, Dr. T.K. Prasad, and Dr. Biplav Srivastava for their reviews and comments.
The above is the preprint of the second article in my Knowledge Graph department in IEEE-IC. Here is the first article in the series: Knowledge Graphs and Knowledge Networks: The Story in Brief and a related keynote: Knowledge Graphs and their central role in big data processing: Past, Present, and Future
Two more papers in the trio on the topic of Knowledge-Infused Learning (K-IL):
- Ugur Kursuncu, Manas Gaur, Amit Sheth, Towards Deep Incorporation of Knowledge in Deep Learning, AAAI-MAKE, March 2019.
- Ruwan Wickramarachchi, Cory Henson, Amit Sheth, An Evaluation of Knowledge Graph Embeddings for Autonomous Driving Data: Experience and Practice, AAAI-MAKE, March 2019.
Also Related: Blending Deep Learning with Knowledge
Articles in the Knowledge Graph department:
- A. Sheth, S. Padhee and A. Gyrard, "Knowledge Graphs and Knowledge Networks: The Story in Brief ," in IEEE Internet Computing, vol. 23, no. 4, pp. 67-75, 1 July-Aug. 2019, doi: 10.1109/MIC.2019.2928449.
- A. Sheth, M. Gaur, U. Kursuncu, R. Wickramarachchi, Shades of Knowledge-Infused Learning for Enhancing Deep Learning, IEEE Internet Computing, 23 (6), Nov/Dec 2019, pp. 54-63. DOI: 10.1109/MIC.2019.2960071 [This article]
- S. Bhatt, A. Sheth, V. Shalin and J. Zhao, "Knowledge Graph Semantic Enhancement of Input Data for Improving AI" in IEEE Internet Computing, vol. 24, no. 02, pp. 66-72, 2020. doi: 10.1109/MIC.2020.2979620
- H. Purohit, V. L. Shalin and A. P. Sheth, "Knowledge Graphs to Empower Humanity-Inspired AI Systems," in IEEE Internet Computing, vol. 24, no. 4, pp. 48-54, 1 July-Aug. 2020, doi: 10.1109/MIC.2020.3013683.