Learning Relations between Knowledge Graph Relations

One key way that engineers build LLMs like ChatGPT is with a process called transductive learning. In this approach, we create a huge collection of documents – a "corpus" – and then split it. About 80% we use to train the system and build language models. The other 20% we set aside: we don't use it for training, only for testing. So we create a model based on the original corpus with a broad-coverage vocabulary and test with a different sample from that same vocabulary – ensuring that the test vocabulary is comparable to the training vocabulary.
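
To make the setup concrete, here is a minimal sketch of that 80/20 split, assuming a toy corpus and scikit-learn's train_test_split; the documents are hypothetical stand-ins for a real collection:

```python
# A minimal sketch of the 80/20 transductive split, using scikit-learn.
# The corpus here is a hypothetical stand-in for a huge document collection.
from sklearn.model_selection import train_test_split

corpus = [
    "the cat sat on the mat",
    "knowledge graphs encode relations between concepts",
    "large corpora drive large language models",
    "models learn patterns of vocabulary from text",
    "the mat sat under the cat",
]

# ~80% of documents for training, ~20% held out for testing only.
train_docs, test_docs = train_test_split(corpus, test_size=0.2, random_state=42)

# Both splits are samples of the same corpus, so they share vocabulary.
train_vocab = {tok for doc in train_docs for tok in doc.split()}
test_vocab = {tok for doc in test_docs for tok in doc.split()}
print("test vocabulary also seen in training:", test_vocab & train_vocab)
```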

But these models can't generalize or guess well on different or unseen vocabulary – in a sense, the models are too focused, too dependent on the specific sample of training documents. They work well on new instances of familiar vocabulary but falter on new terms. We try to sidestep this weakness by using truly ginormous training corpora.

This kind of model building, then, can be seen as a weak kind of transfer learning – it works because it finds patterns among familiar items and "transfers" those patterns to model different samples of the same corpus. In this case, the patterns that we can transfer are the vocabulary items shared across training and test samples, or across training and what users give us in production.

But what can we do to address the unknown vocabulary, the mismatches between the training and test corpora? How can we get the model to generalize or guess better when faced with new terms?

A different, more recent method – called inductive learning – addresses this problem directly, finding and transferring other patterns, not just shared vocabulary. In the clearest test case, there may be no shared vocabulary at all – we can (and should) double-check that none of the test items have appeared in training.

With this method, there are at least three other kinds of patterns (or invariant features) that we might work with when we model the original documents (each sketched in code after the list):

  • Sub-tokens, i.e., re-combinable pieces or substrings of target items, like suffixes, prefixes, byte pairs, etc.
  • Context tokens, i.e., re-combinable vocabulary around or in the context of the target item
  • Item features, i.e., re-combinable formal or semantic characteristics of the target item
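
Here is a hypothetical sketch of all three pattern types for a single target word. The specific feature choices (character trigrams as sub-tokens, a two-word context window, hand-coded item features) are illustrative assumptions, not a standard recipe:

```python
# A hypothetical sketch of the three pattern types for one target word.

def sub_tokens(word: str, n: int = 3) -> set:
    """Character n-grams: a simple stand-in for byte pairs or affixes."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def context_tokens(sentence: list, target: str, window: int = 2) -> set:
    """Re-combinable vocabulary around the target item."""
    i = sentence.index(target)
    return set(sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window])

# Item features: formal or semantic characteristics, hand-coded here.
item_features = {"ends_in_ness", "noun", "abstract_concept"}

sentence = "her unfailing kindness surprised everyone".split()
print(sub_tokens("kindness"))                # {'<ki', 'kin', 'ind', ...}
print(context_tokens(sentence, "kindness"))  # {'her', 'unfailing', 'surprised', 'everyone'}
print(item_features)
```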

In all of these cases, we learn patterns of sub-tokens, context tokens, and/or features instead of (or in addition to) patterns of vocabulary. When something new appears, like a new word, it may match any of these other patterns even if it doesn't exactly match a vocabulary item. Rather than rely on a brittle exact match to a full vocabulary string – like we would with a database, for example – we can rely on a robust, flexible fuzzy match to a collection of features – like we would with other machine learning techniques. In other words, we build a more robust model:

We make our internal representation of the original item more general and more flexible: instead of a specific string, we represent it as a collection of patterns.
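
A minimal sketch of that contrast, with a hypothetical two-word vocabulary: exact string lookup fails on an unseen word, while a fuzzy match over feature sets (Jaccard overlap here) still finds the closest known item:

```python
# A minimal sketch of brittle exact match vs. robust fuzzy match.
# The two "known" words and their feature sets are hypothetical.
known = {
    "kindness": {"<ki", "kin", "ind", "ness>", "noun"},
    "happily":  {"<ha", "hap", "ily>", "adverb"},
}

def jaccard(a: set, b: set) -> float:
    """Overlap between two feature sets, from 0.0 to 1.0."""
    return len(a & b) / len(a | b)

new_word = "boldness"
new_features = {"<bo", "bol", "old", "ness>", "noun"}

print(new_word in known)  # False: exact string lookup fails on unseen words

# A fuzzy feature match still finds the closest known item.
best = max(known, key=lambda w: jaccard(known[w], new_features))
print(best, jaccard(known[best], new_features))  # kindness 0.25
```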

Patterns with semantic item features are particularly important in the case of knowledge graphs: the graph neighborhood of each item in a graph is a collection of rich conceptual or semantic features – i.e., all the other items it is explicitly related to and the type of relation for each. This is a kind of conceptual unpacking – we document the key components and characteristics of each concept.
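
As a sketch with invented triples, unpacking each node into its neighborhood of (direction, relation, neighbor) features takes only a few lines:

```python
# A sketch of conceptual unpacking over invented triples: each node is
# represented by its graph neighborhood instead of by a bare string.
from collections import defaultdict

triples = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "is_a", "analgesic"),
    ("ibuprofen", "treats", "headache"),
    ("ibuprofen", "is_a", "analgesic"),
]

neighborhood = defaultdict(set)
for head, relation, tail in triples:
    neighborhood[head].add(("out", relation, tail))  # outgoing edges
    neighborhood[tail].add(("in", relation, head))   # incoming edges

# "aspirin" unpacks into its explicit relations and related items:
print(neighborhood["aspirin"])
# {('out', 'treats', 'headache'), ('out', 'is_a', 'analgesic')}
```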

Conceptual unpacking is what gives knowledge graphs their superpowers.

Finding and leveraging patterns of this kind will not only give us more robust and reliable models (through better inductive and transfer learning), but will also allow us to approximate other forms of reasoning much more directly – because with knowledge graphs we make models for concepts, not only models of strings.

This inductive learning approach is particularly important for matching and merging knowledge graphs. It is common to have a model of the vocabulary (the node and relation labels) for one knowledge graph but not for another. This is because today's knowledge graphs and ontologies are often built and curated manually by domain experts, so the depth of the experts' training, along with a lack of effective tooling, forces them to create many smaller graphs, each focused on a very specific domain. Moreover, different perspectives and different priorities lead to knowledge graphs with overlapping items but differing triples. But then how can we combine them? Inductive learning can help to merge and cross-validate different knowledge graphs, ontologies, or taxonomies, even when their nodes and relations have different labels.

One excellent example of inductive learning over knowledge graphs that have different vocabularies is the very recent paper by Mikhail Galkin et al.: Towards Foundation Models for Knowledge Graph Reasoning. In this paper, the authors focus on knowledge graph relations (not the nodes) to illustrate the usefulness of this approach. They model patterns in the context of target relations (which relations co-occur with them) – a graph of relations – to establish similarities between relations from different sources. Their features focus on context tokens: same-head relations (when the head of relation1 matches the head of relation2), same-tail relations, head-matches-tail relations, and tail-matches-head relations. And they do this without modeling the head or tail entities at all.
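
Based on that description, a simplified sketch of building such a graph of relations might look like the following. The triples are invented, and the paper's actual models (learned GNN representations over this graph) are far richer than this toy construction:

```python
# A simplified sketch of a "graph of relations": relations become nodes,
# connected by the four edge types described above whenever they share
# head or tail entities in the underlying triples.
from itertools import product

triples = [
    ("marie_curie", "born_in", "warsaw"),
    ("marie_curie", "worked_in", "paris"),
    ("warsaw", "capital_of", "poland"),
]

heads, tails = {}, {}
for h, r, t in triples:
    heads.setdefault(r, set()).add(h)
    tails.setdefault(r, set()).add(t)

relation_edges = set()
for r1, r2 in product(heads, repeat=2):
    if r1 == r2:
        continue
    if heads[r1] & heads[r2]:
        relation_edges.add((r1, "same_head", r2))
    if tails[r1] & tails[r2]:
        relation_edges.add((r1, "same_tail", r2))
    if heads[r1] & tails[r2]:
        relation_edges.add((r1, "head_matches_tail", r2))
    if tails[r1] & heads[r2]:
        relation_edges.add((r1, "tail_matches_head", r2))

for edge in sorted(relation_edges):
    print(edge)
# ('born_in', 'same_head', 'worked_in')
# ('born_in', 'tail_matches_head', 'capital_of') ...
```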

To be clear, the "knowledge graph reasoning" that they focus on is simply "reasoning" about the similarity of relations. Their results systematically document that leveraging the patterns in this relation similarity graph indeed enables much better transfer and generalization across differing relation vocabularies. In this way they can find which known relation is closest to a new, unknown relation.
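
To illustrate the idea (my simplification, not the paper's method): if we reduce each relation to a label-free structural signature over those four edge types, the signatures can be compared across graphs whose vocabularies share nothing:

```python
# My simplification, not the paper's method: reduce each relation to a
# label-free signature of edge-type counts in its own relation graph.
# Because the signature mentions no entity or relation labels, it can be
# compared across graphs with completely disjoint vocabularies.
from collections import Counter

def signature(rel, relation_edges):
    """Counts of relation-graph edge types around one relation."""
    return Counter(etype for r1, etype, r2 in relation_edges if r1 == rel)

# Relation graphs from two sources with no shared relation labels:
edges_a = {("born_in", "same_head", "worked_in"),
           ("born_in", "tail_matches_head", "capital_of"),
           ("worked_in", "same_head", "born_in")}
edges_b = {("birthplace", "same_head", "employer"),
           ("birthplace", "tail_matches_head", "capital"),
           ("employer", "same_head", "birthplace")}

def distance(s1, s2):
    return sum(abs(s1[k] - s2[k]) for k in set(s1) | set(s2))

new = signature("birthplace", edges_b)
best = min(["born_in", "worked_in"],
           key=lambda r: distance(signature(r, edges_a), new))
print(best)  # born_in: its structural signature matches birthplace exactly
```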

This work provides plenty of food for thought.

It reminds us that how you train and test is very important. Even though it's widely used, the transductive approach is really too weak for robust applications and should not be used in industry. In production data, there are too many surprises.

These authors look at patterns of unanalyzed context tokens, and even this has a positive impact. So we can expect that richer, more informative semantic item features (like knowledge graph neighborhoods of these tokens) will allow us to extend this work to more complex forms of knowledge graph reasoning.

This interesting study is focused on what seems to be a very small, very technical problem. But it suggests a solution for a much broader one. Shared standards are notoriously difficult to develop, to agree on, and to comply with – just ask anyone who's been on a standards committee. The many ill-fated proposals for a standard vocabulary of knowledge graph relations (see one review in Helbig, 2006) are just one more example of this well-known difficulty. To the extent that we can relate differing knowledge graph relations reliably and automatically with an approach like the one described here, agreeing on a set of must-match-exactly relations is no longer necessary. Instead of exact-match, systematic standardization for knowledge graph relations, maybe we can work with fuzzy matches and avoid the difficulties of standardization altogether.


Mark Spivey

Helping us all "Figure It Out" (Explore, Describe, Explain), many Differentiations + Integrations at any time.

1y

“Meaning is usage, structure emerges from usage” … and “usage” is individual and independent. Many people mistakenly reify semantics, assuming and expecting it to be formal, expressed, relatable, etc. You can’t point to “semantics”: people are only ever pointing to usages of signs and symbols by specific people at specific times for specific reasons.
Paul Sumares

Senior Software Engineer

1y

Mike, if I recall correctly, you were until recently Open To Work ... did some smart company snap you up?