登录查看更多内容

Hypergraphs and RDF

Kurt Cagle

Editor In Chief @ The Cagle Report | AI, Data Modeling

发布日期: 2024年1月25日

The topic of hypergraphs has come up of late in a couple of different discussion groups, and it seems like a good opportunity to explore what they are, and how they fit (or don’t) in the broader RDF ecosystem.

Evolving to Hypergraphs

In an ordinary graph, you have nodes and edges, with each node connected to another node along an edge.

In Turtle, this is expressed in about as simple a manner as you can imagine:

:s :p :o .

where s represents a subject node, p a predicate edge, and o, an object node.

Let’s say that you have three triples where you have the same subject and predicate but different objects:

It is also possible to have multiple object nodes associated with the same subject and predicate:

:s :p :o1 .
:s :p :o2 .
:s :p :o3 .

However, it’s worth noting here that while RDF here treats them as separate assertions, Turtle does not.

This is an interesting observation, because it suggests that Turtle is more than just a syntactical sugar - it seems to suggest that RDF has the ability to have a set of objects associated with a triple. At least in the direction from subject to object, Turtle is describing what’s known as a hypergraph.

Loosely, a hypergraph can be thought of as a graph in which a single node can, through its edge, be associated with a set of nodes. In the case of Turtle, this expression is indicated through the simple sequence:

:s :p :o1, :o2 :o3 .

Now, I want you to focus on that small square in the middle of the diagram. This is the bridge that connects the world of Turtle (and hypergraphs) with the world of rdf. What this is actually saying is that there is construct that consists of a simple bag of nodes, and what is actually being pointed to is this construct. Put another way, the above statement could decompose as follows in Turtle:

:s :p _:b1 .
_:b1 rdfs:member o1 .
_:b1 rdfs:member o2 .
_:b1 rdfs:member o3 .

So arguably, RDF isn’t really a hypergraph … it’s a trick! Well, perhaps.

A set is a thing, consisting of a collection of other things. In other words, if you look upon the blank node discussed here as an object, then what this notation actually says is that _:b1 is a set, whose members are o1, o2, and o3. This is pretty much the definition of a hypergraph.

Now, this covers the object direction, but the subject direction is not quite as forgiving. For instance, suppose that I wanted to have multiple subjects sharing the same object, as in:

Once again, you are passing a set as an argument. The turtle here is roughly symmetric to the object case:

:s1 rdfs:member _:b2 .
:s2 rdfs:member _:b2 .
:s3 rdfs:member _:b2 .
_:b2 :p :o .

Now, it would be great if you could just say:

:s1, :s2, :s3 :p :o .  # Note: DOES NOT WORK

however, you can’t, primarily because this pattern wasn’t included in the Turtle parser (it likely breaks other things in Turtle as well). There are also no specialized markers for delineating a set in Turtle. Thus, at first glance, the idea of creating a true hypergraph goes out the window.

However, not all hope is lost. There is another structure within Turtle that can be delineated: the RDF List. RDF Lists are pretty ugly looking things:

This is about the simplest implementation of a linked list that you can create in a graph. However, it’s complicated enough that Turtle does have a notation for it, of the form:

( :item1 :item2  :item3 )

This notation has two advantages. First, it preserves the order of the list, something that a set doesn’t generally do. Additionally, the first blank node in the linked list (_:b1 here) serves as a pointer to the list. This means that by passing _:b1 you are passing what amounts to a list or array of items by reference.

This has very intriguing implications. Among other things, it means that you can pass such an array as the subject, meaning that once again Turtle is able to describe a hypergraph, this time on the subject node:

( :s1 :s2 :s3) :p :o .

You could pass it on the object node in the same way, of course:

:s :p ( :o1 :o2 :o3 ) .

You could even do it with both:

( :s1 :s2 :s3 ) :p ( :o1 :o2 :o3 ) .

This means that the “conceptual graph” that Turtle represents is a hypergraph, even if the underlying RDF implementation is not.

So what use is this? At it’s simplest, the fact that you can pass lists means that you can process lists in an explicit order. Let’s say you have the following Turtle code:

@prefix Book: <uri:urn:Book#> .
@prefix Chapter: <uri:urn:Chapter#> .
@prefix IPWork: <uri:urn:IPWork#> .

Chapter:Conclusion a Chapter: ;
       IPWork:hasTitle "Conclusion"^^xsd:string;
 .
Chapter:WhatIsOntology a Chapter: ;
       IPWork:hasTitle "What is Ontology?"^^xsd:string ;
.
Chapter:Introduction  a Chapter: ;
       IPWork:hasTitle "Introduction?"^^xsd:string ;
.
Chapter:TurtleAndTuppence a Chapter: ;
       IPWork:hasTitle "Turtle and Tuppence?"^^xsd:string ;
.
Book:OntologyALaymansGuide a Book: ;
      Book:hasChapters (
          Chapter:Introduction
          Chapter:WhatIsOntology     
          Chapter:TurtleAndTuppence
          Chapter:Conclusion
          ) ;
      IPWork:hasTitle "Ontology: A Layman's Guide"^^xsd:string ;
.

A SPARQL query can then use this information to create a listing of the chapters of the book in the correct order:

prefix Book: <uri:urn:Book#> 
prefix Chapter: <uri:urn:Chapter#> 
prefix IPWork: <uri:urn:IPWork#> 
prefix rdf: <https://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?bookTitle ?chapterTitle WHERE {
     ?book IPWork:hasTitle ?bookTitle .
     ?book Book:hasChapters ?chapters .
     ?chapters rdf:rest*/rdf:first ?chapter .
     ?chapter IPWork:hasTitle ?chapterTitle .
}

This will generate a table with the chapters in the specified order.

The expression


     ?chapters rdf:rest*/rdf:first ?chapter .

illustrates walking a list, with the rdf:rest using transitive closure (walking along the same predicates in a path until the subject node no longer has the rdf:rest property), extracting for that node the relevant first item or ignoring it, if it’s reached rdf:nil.

The upshot of this is that while RDF by itself is not a hypergraph, the abstractions brought about with Turtle can make it a hypergraph easily enough.

Object Hypergraphs

In a relational database, a given table can have only one foreign key to another table per column. You can think of each foreign key per column as being an outbound link. for the concept embodied by the resource identified by the row.

In one particular project I was involved in, countries could be qualified by country groups that described various characteristics of the countries. A simple example of this can be seen in the diagram below:

Country groups and associations, a case for hypergraphs?

Arpit Bhayani 2 年前

Regex.EnumerateSplits: .NET 9's New Powerful Tool for…

David Shergilashvili 1 个月前

What Does Big O(N^2) Complexity Mean?

Tpoint Tech 2 个月前

This corresponds to a Turtle file that looks (in part) something like this:

Country:Switzerland Country:inCountryGroup    
      CountryGroup:European,CountryGroup:Landlocked.
Country:Japan Country:inCountryGroup 
      CountryGroup:Asian,CountryGroup:Landlocked.

This is a hypergraph on the object side, because each country has more than one foreign key pointer. This is also a common design pattern where you have multiple tags that can be associated with a given concept. It should be noted, though, that if these tags are enumerations, such as the ones given above, there may, in fact, be some kind of underlying structure that is hidden in this model. For instance, the same information may also be stated as follows:

This, in turn, changes the Turtle, though only by normalizing two subclasses and creating two sub-properties:

Country:Switzerland
    Country:inContinent Continent:European;
    Country:hasCoastline Coastline:Landlocked.
Country:Japan
    Country:inContinent Continent:Asia;
    Country:hasCoastline Coastline:Island.

Country:inContinent rdfs:subPropertyOf Country:inCountryGroup.
Country:hasCoastline rdfs:subPropertyOf Country:inCountryGroup.

Put another way, you can achieve many of the same effects that hypergraphs give you by combining good design and working with reference pointers, as the book and chapter’s example in the previous section illustrates. Once you understand this point, hypergraphs aren’t all that scary, though they can help you push the boundaries of what you’re attempting to model.

I encountered this particular problem once when attempting to build a model for relating cars being sold to custom features for that car. This list could be quite extensive, often with dozens of such features for a given car, and my first inclination - to simply create a single list of customizations, very quickly became unworkable from a UI standpoint (replace inCountryGroup above with hasCustomization, and you get an idea of what this looked like). However, by using cascading categories (engineType, seatCoverType, etc.), I replaced a single large block with a number of properties. By creating a pointer to a list of such properties, I could then put all the customizations on a separate page.

There are two different ways to model this, as shown below:

The bag approach is shown on the left and can be considered a bag (or set) of lists. In this case, the square-edged boxes represent blank nodes, while the rounded-edged boxes are IRI nodes. The list approach, on the right, is typically of a hypergraph, in which you have a list of (here, category) nodes without an attending structure. Which is preferable? My own experience has been the bag approach, largely because there is less in the way of computation needed to determine grouping structures.

You may also be questioning what happens when you have mutually exclusive options. The engine type is an example of this. You could have an electric car, an internal combustion engine (ICE), a rotary engine, a fuel cell, or a hybrid engine, but you usually don’t have more than one (I consider a hybrid to be a different engine type altogether, as it represents a distinct architecture even though it has aspects of both electric and ICE).

This is what makes lists so tricky. A list can be considered a pseudo-element, requiring a certain amount of additional processing and conditional logic when querying, but can be thought of as being of the same type as its constituent items. what changes is the shape of the connection? In SHACL, this would look something like:

PartType:hasEngineType a sh:PropertyShape;
    rdfs:subPropertyOf Vehicle:hasPartType ;
    sh:name "Engine Type"^^xsd:string;
    sh:path PartType:hasEngineType ;
    sh:class EngineType:;
    sh:nodeKind sh:IRI ;
    sh:minCount 1;
    sh:maxCount 1;
    .
  
PartType:hasSeatCover a sh:PropertyShape;
    rdfs:subPropertyOf Vehicle:hasPartTypes ;
    sh:name "Seat Cover"^^xsd:string;
    sh:path [rdf:rest*/rdf:first] ;
    sh:class SeatCover:;
    sh:nodeKind sh:IRI;
    sh:minCount 0;
    .

In this case, while the assumption in both cases is that the object type of the predicate is some form of PartType:, itself a subclass of Category: . The paths to get to that class are different. This suggests (correctly) that the generic (abstract) Vehicle:hasPartTypes property has neither path nor cardinality:

Vehicle:hasPartTypes a sh:PropertyShape;
    sh:name "PartTypes"^^xsd:string;
    sh:class PartType:;
    sh:nodeKind sh:IRI;
    .

So what does this have to do with hypergraphs? Primarily hypergraphs are the association of data structures with relationships. Let’s take the bag approach as an example:

Vehicle:MyCar Vehicle:hasPartTypes [
     PartType:hasEngineType EngineType:Electric;
     PartType:hasSeatCover (SeatCover:Leather);
     PartType:hasAudioProtocol (AudioProtocol:Blutooth AudioProtocol:USB)
     ];
.

Named Graphs and Hypergraphs

Okay, you might be saying, “that makes sense for the object side of the equation, but what about the subject side?” (Of course, you may also be saying, "Gee, this article is too long; time for lunch”. It is Always hard to tell when you’re on this side of the blog.)

We use subject hypergraphs all the time. A graph is a set of assertions. You can refer to a set by name (the graph name), but what you’re actually talking about is the set of assertions:

This way, the part types are contained in a graph Graph:MyCar_PartTypes that uses the Category:isA property to identify the respective resources as being of the PartType: class.

Vehicle:MyCar Vehicle:hasPartTypes Graph:MyCar_PartTypes.

graph Graph:MyCar_PartTypes {
     EngineType:Electric Category:isA PartType: .
     SeatCoverType:Leather Category:isA PartType: .
     AudioProtocol:Blutooth Category:isA PartType: .
     AudioProtocol:USB Category:isA PartType: .
     }

Graph:MyCar_PartTypes
       Concept:createdBy Person:JaneDoe ;
       Concept:creationDate "2024-01-24"^xsd:date ;
 .

Here, Graph:MyCarPartTypes is a pointer to a set of triples. Note that this graph is referenced as both an object:

Vehicle:MyCar Vehicle:hasPartTypes Graph:MyCar_PartTypes.

and as a subject:

Graph:MyCar_PartTypes
       Concept:createdBy Person:JaneDoe ;
       Concept:creationDate "2024-01-24"^xsd:date ;
 .

The following indicates how you could then get the relevant part types from the graph in SPARQL, along with the author and creation date:

SELECT ?partType ?author ?creationDate WHERE {
    values ?vehicle {Vehicle:MyCar}
    ?vehicle Vehicle:hasPartTypes ?partTypes .
    ?partTypes {?partType Category:isA PartType: } .
    ?partTypes Concept:createdBy ?author .
    ?partTypes Concept:createdBy ?creationDate .
    }

For the above data, this produces the table:

One additional point to consider. You may likely run into a situation where all possible categories and their definitions are kept in a graph called something like Graph:Categories . Thus, if the entry for EngineType:Electric includes a property called PartType:hasPTIN (Part Type InventoryNumber), like as follows:

Graph:Categories {
    EngineType:Electric
        Category:isA PartType:EngineType ;
        PartType:hasPTIN "ENG-ELECTRIC"^^xsd:string ;
    .
    EngineType:ICE
        Category:isA PartType:EngineType ;
        PartType:hasPTIN "ENG-ICE"^^xsd:string ;
    .
    EngineType:Rotary
        Category:isA PartType:EngineType ;
        PartType:hasPTIN "ENG-ROTARY"^^xsd:string ;
    .
    AudioProtocol:USB
        Category:isA PartType:AudioProtocol ;
        PartType:hasPTIN "ENG-ROTARY"^^xsd:string ;
    # ...
}

Then the SPARQL query to get the PTIN for that particular part type would look like the following:

SELECT ?partType ?ptin WHERE {
    values ?vehicle {Vehicle:MyCar}
    ?vehicle Vehicle:hasPartTypes ?partTypes .
    ?partTypes {?partType Category:isA PartType: } .
    Graph:Categories {?partType PartType:hasPTIN ?ptin}
    }

In other words, the graph Graph:MyCar_PartTypes holds the part types that the individual vehicle “MyCar” has as declarations, but the definition for those part types (and all of the others) is kept in Graph:Categories. This produces the table:

Turtle by itself does not have named graphs, however TRIG (Turtle RDF Graph Language) does. With the design patterns given above, TRIG crosses the threshold to being a hypergraph, because a graph is by definition a set (in this case a set of triples). What’s even more significant here is that while there is no easy way to identify an RDF List except via blank nodes (which are by definition not referenceable), named graphs are, perforce, named.

Getting Hyper

TRIG, Sparql 1.1 Query, and SPARQL 1.1 Update, which all hit the stage around the end of 2013, changed the nature of the game concerning named graphs, and hence, hypergraphs. Until then, everything within a triple store was effectively one giant graph. Even after these specs were published, it took a few years for named graphs to make their way into most triple stores, and many older ontologists were simply not trained on them and tended to ignore their implications. OWL, for instance, does not recognize named graphs as a formal concept, although there are OWL implementations that have been extended to do so.

SHACL code, the Shape Constraint Language encoded as triples, is typically stored in a named graph in the system, and one can pass the named graph URI as an argument to any processing code that needs to validate against that SHACL. Indeed, a typical SHACL validation process involves two graphs - the source graph that contains the nodes to be validated and the SHACL graphs that perform the validation.

Indeed, in theory, it should be possible to run a SPARQL UPDATE script that will identify, validate, and, if errors occur, generate reports on those nodes that do not satisfy the SPARQL constraints. In practice, that’s considerably more complicated to write such a script by hand. However, it should be possible to write an update script that, when given a SHACL graph, will generate a SPARQL update file that can, in turn, perform the requisite validation and reporting, including inline SPARQL.

Since SHACL also incorporates the ability to create constraint logic and define functions, this also means that SPARQL UPDATE and SHACL together, as hypergraphs, can generate self-modifying functional code. This is not the place to discuss the how of this, but it may be the subject of a subsequent post.

This is a form of generative hypergraph that can use graphs as objects, as endpoints for processes, and as tools for generating other content. As with most generative technologies, it may likely take advantage not only of tools like SPARQL and SPARQL UPDATE but also of AI-generated graphs and scripts, most likely instantiated through a SERVICE interface.

Additionally, SHACL can be used by reference to get relevant properties like OWL that don’t need complex pre-processing. This, too, is fodder for another post.

We’ve covered a lot of ground here. RDF Lists and bags provide a rudimentary hypergraph capability, though using named graphs expands that capability tremendously. This also does not consider perhaps the biggest aspect of hypergraphs - using a named graph or pointer to a structure as a predicate. That’s where things get very interesting indeed.

In media res,

Kurt Cagle

Kurt Cagle is the Editor of The Ontologist and is a practising ontologist, solutions architect, and AI aficionado. He lives in Bellevue, Washington, with his wife and cats.

The Cagle Report

9,360 位关注者

Sourav Tripathy

3 个月

may be query performance might be impacted by the additional level of indirection (hyper edge nodes) compared to direct relationships in graphs built using neo4j if this concept is applied...Will it be more impactful than pairwise relationships ?

Nicholas Clarke

Visionary technologist and lateral thinker driving market value in regulated, complex ecosystems. Open to leadership roles.

10 个月

Mariam Brian check this POV.

1 次回应

Debbie Reynolds

10 个月

Looking forward to reading more about it in The Cagle Report. Keep up the great work!

John O'Gorman

Disambiguation Specialist

10 个月

Kurt Cagle - My understanding of hypergraphs differs quite a bit from yours, I think. Sets seem to me to be a special case where one concept, say 'colour' has several items that qualify as members of that set. Not sure I understand how a set becomes a hypergraph. To me, hypergraphs are a way to model persistent and therefore reusable patterns of association (OOP called them Classes), the instances / objects of which have their own identity. My original comment suggests that hypergraphs are a way to marry principles of object orientation with knowledge graphs. See graphic. Happy to be course corrected!

Roy Roebuck

Holistic Management Analysis and Knowledge Representation (Ontology, Taxonomy, Knowledge Graph, Thesaurus/Translator) for Enterprise Architecture, Business Architecture, Zero Trust, Supply Chain, and ML/AI foundation.

10 个月

So, next - hyperedges!

1 次回应

查看更多评论

要查看或添加评论，请登录

Kurt Cagle的更多文章

A Brief History of Writing

2024年11月16日

A Brief History of Writing

Copyright 2024 Kurt Cagle / The Cagle Report This is a preview draft of a chapter of my upcoming book Ontology: A…

28 条评论
Changing Knowledge Graph Properties and Other RDF Thoughts

2024年10月26日

Changing Knowledge Graph Properties and Other RDF Thoughts

Copyright 2024 Kurt Cagle / The Cagle Report The problem with writing a blog is that eventually, you will reach a stage…

15 条评论
Where's the Deep Fake Crisis?

2024年10月8日

Where's the Deep Fake Crisis?

Copyright 2024 Kurt Cagle / The Cagle Report This piece, written by Harry Law and Michael Spencer makes an observation…

25 条评论
An Imperfect Canvas ... But An Interesting First Step

2024年10月6日

An Imperfect Canvas ... But An Interesting First Step

Recently, OpenAI announced the beta test for the new Canvas feature in ChatGPT, released in some GPT-4o versions…

14 条评论
Why Your Workers Aren't Going to Come Back to the Office ... Ever

2024年10月4日

Why Your Workers Aren't Going to Come Back to the Office ... Ever

CEOs Say Hybrid Work Era Will End Why Do People GO To Work? In the waning days of 2021, as we began to see signs that…

19 条评论
Reification and Why It REALLY Matters

2024年9月21日

Reification and Why It REALLY Matters

Copyright 2024 Kurt Cagle / The Cagle Report I've mentioned reification a number of times in my recent writing, and…

7 条评论
What is a Confabulator?

2024年9月10日

What is a Confabulator?

Copyright 2024 Kurt Cagle / The Cagle Report I've made this observation many times in LinkedIn feed posts, that an LLM…

14 条评论
Some Linked-In Article Authoring Tips

2024年9月8日

Some Linked-In Article Authoring Tips

Copyright 2024 Kurt Cagle / The Cagle Report I'm coming up on 100 articles in this newsletter (96 and counting) and…

1 条评论
Knowledge Graphs and Supply Chains

2024年9月1日

Knowledge Graphs and Supply Chains

Copyright 2024 Kurt Cagle / The Cagle Report From one of my readers: Hi Kurt, have you written any article on how a…

23 条评论
A Data Vector Primer for Ontologists

2024年9月1日

A Data Vector Primer for Ontologists

Copyright 2024 Kurt Cagle / The Cagle Report Of late, I've been spending a fair amount of time in vector store land…

7 条评论

See all articles

Hypergraphs and RDF

Kurt Cagle

Editor In Chief @ The Cagle Report | AI, Data Modeling

Evolving to Hypergraphs

Object Hypergraphs

领英推荐

Named Graphs and Hypergraphs

Getting Hyper

The Cagle Report

9,360 位关注者

Kurt Cagle的更多文章

社区洞察

其他会员也浏览了

Projections with Ranges

Algorithms — Big O Notation

Understanding Big O Notation, Time and Space Complexity

Cell Plots

Binary Trees study guide

What are the algorithms to identify memory faults using m-BIST?

Algorithmic Optimizations: How to Leverage SIMD

The Union Find Algorithm

Grokking Algorithms: Quicksort

Ordered sets in GO with treaps

Evolving to Hypergraphs

Object Hypergraphs

领英推荐

Named Graphs and Hypergraphs

Getting Hyper

The Cagle Report

9,360 位关注者

Kurt Cagle的更多文章

A Brief History of Writing

Changing Knowledge Graph Properties and Other RDF Thoughts

Where's the Deep Fake Crisis?

An Imperfect Canvas ... But An Interesting First Step

Why Your Workers Aren't Going to Come Back to the Office ... Ever

Reification and Why It REALLY Matters

What is a Confabulator?

Some Linked-In Article Authoring Tips

Knowledge Graphs and Supply Chains

A Data Vector Primer for Ontologists

社区洞察

其他会员也浏览了

Projections with Ranges

Algorithms — Big O Notation

Understanding Big O Notation, Time and Space Complexity

Cell Plots

Binary Trees study guide

What are the algorithms to identify memory faults using m-BIST?

Algorithmic Optimizations: How to Leverage SIMD

The Union Find Algorithm

Grokking Algorithms: Quicksort

Ordered sets in GO with treaps