Herding, Culling, and Caging Predicates for Knowledge Graph Relations
Lists and bags and sets are jumbles of items that I find aberrant and abhorrent. So when I see people blithely invent and bandy about this knowledge graph predicate and that ontological predicate, I instinctively want to herd the predicates together, cage them in groups, cull the weakest among them, and try to tame the best so they make some sort of sense – and eventually do my bidding. Words have work to do and predicates are the strongest among them.
This urge to organize and quest for coherence is nothing new. Centuries of experience show very clearly that the key to progress in understanding is to continuously and recursively “unpack” the things we study and not treat them only as unanalyzed wholes. It's already perfectly clear that when we want to compare, group, or organize things of any kind, we need to identify and leverage their features, their attributes, their relations with others. Analysis (or the more trendy “featurization”) is essential.
Let's look at knowledge graph predicates and relations in this light.
More often than not we rely on strings – database column labels, sentences, keywords, definitions, etc. – as a starting point for creating structured knowledge. Some of those strings label entities; others represent relations between those entities. For today, I want to call “predicates” those strings (like verbs, adjectives, conjunctions) that represent some kind of conceptual relation or attribute. They are particularly important because we use predicates to describe and define entities.
Predicates are the language of description, labels on the bedrock of knowledge, pointers to how to ground our concepts in perception.
But like other strings, predicates are inherently vague and ambiguous: labels that are often clear to their creators but not to their consumers.
We label things based on their features. We group things based on their features. We organize things based on their features. And though we're more used to analyzing and featurizing concrete physical things, abstract conceptual relations (and the predicates that convey them) are no different. However unfashionable it might be at the moment, doing our own analysis – featurizing the items that we want to understand and model – is crucial for deep understanding, for robust matching, and for reliable inference.
When we analyze and featurize entities like concrete physical things, we use several kinds of features (which are not mutually exclusive):
We can do the same thing to analyze and featurize knowledge graph relations.
Once our relations are featurized more explicitly, we can use these definitions to group and organize the predicates that convey them. In so doing, we adopt a more reliable, systematic approach to relation resolution – just as biologists have done for centuries to “resolve” the wildly variable names that people give to living beings.
If for simplicity we focus initially on the components of conceptual relations, we can attend to the signature of each relation (the types of nodes or entities it connects by definition) and identify an initial array of fairly common families of relations:
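To make the idea of a signature concrete, here is a minimal sketch in Python of how such families might be documented; the family names and entity types below are illustrative assumptions, not a fixed inventory:

```python
# A minimal sketch of relation "signatures": for each relation family, the
# entity types it is allowed to connect by definition. The families and type
# names below are hypothetical illustrations, not a definitive list.

from dataclasses import dataclass

@dataclass(frozen=True)
class Signature:
    subject_type: str   # type of the node the relation points from
    object_type: str    # type of the node the relation points to

# Hypothetical families of relations, each documented with its signature.
RELATION_FAMILIES = {
    "part_of":       Signature("PhysicalObject", "PhysicalObject"),
    "member_of":     Signature("Person",         "Organization"),
    "located_in":    Signature("Entity",         "Place"),
    "causes":        Signature("Event",          "Event"),
    "has_attribute": Signature("Entity",         "Attribute"),
    "occurs_during": Signature("Event",          "TimeInterval"),
}

for name, sig in RELATION_FAMILIES.items():
    print(f"{name}: {sig.subject_type} -> {sig.object_type}")
```

Writing the signature down explicitly means it can be read by people and checked by machines, rather than living only in the head of whoever coined the predicate.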
Why bother?
When we extract relations from text, based on highly variable predicates, we can use the families above as initial criteria for identifying candidates and for validating the relations we extract. Predicates from different families will be less likely candidates for a particular relation. And we expect the signature of a predicate to closely match the signature of the relation it best resolves to. In addition, the more features of relations (like their signature) that we can reliably identify and document, the richer and more robust the kinds of reasoning we can simulate in algorithms.
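As a rough illustration, signature matching can be turned into a simple validation step for extracted candidates; the Candidate fields, type labels, and relation names below are assumptions made for the sketch:

```python
# A minimal sketch of signature-based validation for relations extracted from
# text. The extractor output format, type labels, and relation names are
# illustrative assumptions.

from dataclasses import dataclass

@dataclass(frozen=True)
class Signature:
    subject_type: str
    object_type: str

# Hypothetical relation families with their signatures.
RELATION_FAMILIES = {
    "member_of":  Signature("Person", "Organization"),
    "located_in": Signature("Entity", "Place"),
}

@dataclass
class Candidate:
    subject: str
    subject_type: str
    predicate: str           # raw predicate string found in the text
    resolved_relation: str   # relation the predicate was resolved to
    obj: str
    object_type: str

def validate_candidate(c: Candidate) -> bool:
    """Accept a candidate only if its node types match the signature
    of the relation its predicate resolves to."""
    sig = RELATION_FAMILIES.get(c.resolved_relation)
    if sig is None:
        return False  # unknown relation: flag for human review
    return (c.subject_type, c.object_type) == (sig.subject_type, sig.object_type)

good = Candidate("Alice", "Person", "works at", "member_of", "Acme Corp", "Organization")
bad  = Candidate("Alice", "Person", "born in",  "member_of", "Paris",     "Place")

print(validate_candidate(good))  # True: (Person, Organization) fits member_of
print(validate_candidate(bad))   # False: (Person, Place) violates member_of
```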
Some technologists see little use in distinguishing between strings like verbs or adjectives (predicates) and abstract conceptual relations, so they use many different predicates to represent the same relation. But multiple representations for the same relation defeat the purpose of representation in the first place – they simply create more technical debt. And use cases that require better accuracy and higher reliability demand a way to decide which predicates are similar or the same – they can't avoid relation resolution. Defining relation types as above helps to guide and validate this process.
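A minimal sketch of what relation resolution might look like in code, assuming a hand-curated table of predicate variants; the specific predicates and canonical relation names are illustrative:

```python
# A minimal sketch of relation resolution: many surface predicates map to one
# canonical relation, so downstream consumers see a single representation.
# The predicate variants and relation names below are illustrative assumptions.

from typing import Optional

CANONICAL_RELATION = {
    # normalized surface predicate -> canonical relation
    "works for":      "member_of",
    "employed by":    "member_of",
    "is employee of": "member_of",
    "part of":        "part_of",
    "component of":   "part_of",
}

def resolve_predicate(predicate: str) -> Optional[str]:
    """Normalize a predicate string and map it to a canonical relation,
    or return None so an unknown predicate can be reviewed and documented."""
    key = " ".join(predicate.lower().split())
    return CANONICAL_RELATION.get(key)

print(resolve_predicate("Employed   by"))  # member_of
print(resolve_predicate("painted by"))     # None: needs curation
```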
Different kinds of structured knowledge (taxonomies, ontologies, knowledge graphs, etc.) include some but often not all of these relation types – they vary in expressivity. So the relation types above can help us choose which techniques to use for each use case.
AI Safety requires us to convey principles and guidelines to algorithms – and to verify how they have been “understood”. Kinds of structured knowledge that include more of these families of relations allow us to express and verify a wider range of concepts.
This is only the beginning. As we document more deeply and more precisely the conceptual relations that we need to represent, we can make simulated reasoning more scalable, more effective, and more impactful.
Reliable Artificial Intelligence requires reliable Artificial Knowledge.
It has to be built on a foundation of clearly featurized and structured conceptual relations.
Semantic AI @AICYC | Executive Chairman @ IKNOWit.WORLD | CEO at INTELLISOPHIC.
1 month ago
Hi Mike, The relationship categories are critical to unification of triples in reasoning logic. Great to see you are on it. A piece of cake (all are the same), an engine part (most are different), paint on a surface (differ in color) illustrate the subtle issues you raise. The purpose is to accurately automate reasoning systems. Thanks for the like. https://aicyc.org/2024/10/05/how-sam-thinks/
TL;DR: Semantic AI Models (SAM) and Large Language Models (LLM) work together to create a distributed inference system that mimics human cognition. SAM extracts facts and concepts from text to build a knowledge graph. It then directs the LLM to represent this knowledge in second-order logic (SOL) expressions. These SOL expressions can be computed efficiently using fuzzy logic operations like MIN and MAX in a cloud environment like AWS S3 with Hadoop MapReduce. This allows the knowledge graph to reason and infer new knowledge similar to how humans think, bringing us closer to artificial general intelligence (AGI). The approach is contrasted with neuro-symbolic AI, which is less transparent and harder to guarantee correctness compared to the explicit logic used by SAM and LLMs.
Founder @ The Cyber Boardroom, Chief Scientist @ Glasswall, vCISO, vCTO and GenAI expert
1 month ago
Hi, great article, I really like the richness of those predicates, and I completely agree that they are critical. What I found is that:
a) it is very important to have the reverse path for each predicate ('is parent of' and 'is child of')
b) this needs to be created bottom up (organically by the consumers of the graphs), not top down
c) it's ok to have lots of redundancy and very similar predicates (reflecting the reality that different teams, environments, cultures, areas of expertise and roles have different names or verbs (aka predicates) for the same thing)
d) given good feedback loops and REPLs for those graph curators, this will actually create really good, natural, and easy-to-understand bottom-up ontologies and predicate lists (better than anything that would be created top-down)
Technical Content Developer at US Pharmacopeia
1 month ago
So much relies on untrendy concepts like "standardization" and "requirements". A knowledge graph isn't a work of art, it's a tool serving a purpose. Fail to observe that now, and you'll soon find yourself in a working group developing a "thesaurus of predicates".
Disambiguation Specialist
1 month ago
Mike Dillinger, PhD - "Predicates are the strings that represent conceptual relations between entities. But (un)like other strings, predicates are inherently vague and ambiguous. So we need to featurize carefully what we think those predicates mean..." Yep!
Head of Data Science, AI & Ethics lead
1 month ago
Aberrant and abhorrent! :D Dropping KG hot takes and taking names Mike Dillinger, PhD