Hypergraphs and RDF
The topic of hypergraphs has come up of late in a couple of different discussion groups, and it seems like a good opportunity to explore what they are, and how they fit (or don’t) in the broader RDF ecosystem.
Evolving to Hypergraphs
In an ordinary graph, you have nodes and edges, with each node connected to another node along an edge.
In Turtle, this is expressed in about as simple a manner as you can imagine:
:s :p :o .
where s represents a subject node, p a predicate edge, and o, an object node.
Let’s say that you have three triples where you have the same subject and predicate but different objects:
It is also possible to have multiple object nodes associated with the same subject and predicate:
:s :p :o1 .
:s :p :o2 .
:s :p :o3 .
However, it’s worth noting here that while RDF here treats them as separate assertions, Turtle does not.
This is an interesting observation, because it suggests that Turtle is more than just a syntactical sugar - it seems to suggest that RDF has the ability to have a set of objects associated with a triple. At least in the direction from subject to object, Turtle is describing what’s known as a hypergraph.
Loosely, a hypergraph can be thought of as a graph in which a single node can, through its edge, be associated with a set of nodes. In the case of Turtle, this expression is indicated through the simple sequence:
:s :p :o1, :o2 :o3 .
Now, I want you to focus on that small square in the middle of the diagram. This is the bridge that connects the world of Turtle (and hypergraphs) with the world of rdf. What this is actually saying is that there is construct that consists of a simple bag of nodes, and what is actually being pointed to is this construct. Put another way, the above statement could decompose as follows in Turtle:
:s :p _:b1 .
_:b1 rdfs:member o1 .
_:b1 rdfs:member o2 .
_:b1 rdfs:member o3 .
So arguably, RDF isn’t really a hypergraph … it’s a trick! Well, perhaps.
A set is a thing, consisting of a collection of other things. In other words, if you look upon the blank node discussed here as an object, then what this notation actually says is that _:b1 is a set, whose members are o1, o2, and o3. This is pretty much the definition of a hypergraph.
Now, this covers the object direction, but the subject direction is not quite as forgiving. For instance, suppose that I wanted to have multiple subjects sharing the same object, as in:
Once again, you are passing a set as an argument. The turtle here is roughly symmetric to the object case:
:s1 rdfs:member _:b2 .
:s2 rdfs:member _:b2 .
:s3 rdfs:member _:b2 .
_:b2 :p :o .
Now, it would be great if you could just say:
:s1, :s2, :s3 :p :o . # Note: DOES NOT WORK
however, you can’t, primarily because this pattern wasn’t included in the Turtle parser (it likely breaks other things in Turtle as well). There are also no specialized markers for delineating a set in Turtle. Thus, at first glance, the idea of creating a true hypergraph goes out the window.
However, not all hope is lost. There is another structure within Turtle that can be delineated: the RDF List. RDF Lists are pretty ugly looking things:
This is about the simplest implementation of a linked list that you can create in a graph. However, it’s complicated enough that Turtle does have a notation for it, of the form:
( :item1 :item2 :item3 )
This notation has two advantages. First, it preserves the order of the list, something that a set doesn’t generally do. Additionally, the first blank node in the linked list (_:b1 here) serves as a pointer to the list. This means that by passing _:b1 you are passing what amounts to a list or array of items by reference.
This has very intriguing implications. Among other things, it means that you can pass such an array as the subject, meaning that once again Turtle is able to describe a hypergraph, this time on the subject node:
( :s1 :s2 :s3) :p :o .
You could pass it on the object node in the same way, of course:
:s :p ( :o1 :o2 :o3 ) .
You could even do it with both:
( :s1 :s2 :s3 ) :p ( :o1 :o2 :o3 ) .
This means that the “conceptual graph” that Turtle represents is a hypergraph, even if the underlying RDF implementation is not.
So what use is this? At it’s simplest, the fact that you can pass lists means that you can process lists in an explicit order. Let’s say you have the following Turtle code:
@prefix Book: <uri:urn:Book#> .
@prefix Chapter: <uri:urn:Chapter#> .
@prefix IPWork: <uri:urn:IPWork#> .
Chapter:Conclusion a Chapter: ;
IPWork:hasTitle "Conclusion"^^xsd:string;
.
Chapter:WhatIsOntology a Chapter: ;
IPWork:hasTitle "What is Ontology?"^^xsd:string ;
.
Chapter:Introduction a Chapter: ;
IPWork:hasTitle "Introduction?"^^xsd:string ;
.
Chapter:TurtleAndTuppence a Chapter: ;
IPWork:hasTitle "Turtle and Tuppence?"^^xsd:string ;
.
Book:OntologyALaymansGuide a Book: ;
Book:hasChapters (
Chapter:Introduction
Chapter:WhatIsOntology
Chapter:TurtleAndTuppence
Chapter:Conclusion
) ;
IPWork:hasTitle "Ontology: A Layman's Guide"^^xsd:string ;
.
A SPARQL query can then use this information to create a listing of the chapters of the book in the correct order:
prefix Book: <uri:urn:Book#>
prefix Chapter: <uri:urn:Chapter#>
prefix IPWork: <uri:urn:IPWork#>
prefix rdf: <https://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?bookTitle ?chapterTitle WHERE {
?book IPWork:hasTitle ?bookTitle .
?book Book:hasChapters ?chapters .
?chapters rdf:rest*/rdf:first ?chapter .
?chapter IPWork:hasTitle ?chapterTitle .
}
This will generate a table with the chapters in the specified order.
The expression
?chapters rdf:rest*/rdf:first ?chapter .
illustrates walking a list, with the rdf:rest using transitive closure (walking along the same predicates in a path until the subject node no longer has the rdf:rest property), extracting for that node the relevant first item or ignoring it, if it’s reached rdf:nil.
The upshot of this is that while RDF by itself is not a hypergraph, the abstractions brought about with Turtle can make it a hypergraph easily enough.
Object Hypergraphs
In a relational database, a given table can have only one foreign key to another table per column. You can think of each foreign key per column as being an outbound link. for the concept embodied by the resource identified by the row.
In one particular project I was involved in, countries could be qualified by country groups that described various characteristics of the countries. A simple example of this can be seen in the diagram below:
领英推荐
This corresponds to a Turtle file that looks (in part) something like this:
Country:Switzerland Country:inCountryGroup
CountryGroup:European,CountryGroup:Landlocked.
Country:Japan Country:inCountryGroup
CountryGroup:Asian,CountryGroup:Landlocked.
This is a hypergraph on the object side, because each country has more than one foreign key pointer. This is also a common design pattern where you have multiple tags that can be associated with a given concept. It should be noted, though, that if these tags are enumerations, such as the ones given above, there may, in fact, be some kind of underlying structure that is hidden in this model. For instance, the same information may also be stated as follows:
This, in turn, changes the Turtle, though only by normalizing two subclasses and creating two sub-properties:
Country:Switzerland
Country:inContinent Continent:European;
Country:hasCoastline Coastline:Landlocked.
Country:Japan
Country:inContinent Continent:Asia;
Country:hasCoastline Coastline:Island.
Country:inContinent rdfs:subPropertyOf Country:inCountryGroup.
Country:hasCoastline rdfs:subPropertyOf Country:inCountryGroup.
Put another way, you can achieve many of the same effects that hypergraphs give you by combining good design and working with reference pointers, as the book and chapter’s example in the previous section illustrates. Once you understand this point, hypergraphs aren’t all that scary, though they can help you push the boundaries of what you’re attempting to model.
I encountered this particular problem once when attempting to build a model for relating cars being sold to custom features for that car. This list could be quite extensive, often with dozens of such features for a given car, and my first inclination - to simply create a single list of customizations, very quickly became unworkable from a UI standpoint (replace inCountryGroup above with hasCustomization, and you get an idea of what this looked like). However, by using cascading categories (engineType, seatCoverType, etc.), I replaced a single large block with a number of properties. By creating a pointer to a list of such properties, I could then put all the customizations on a separate page.
There are two different ways to model this, as shown below:
The bag approach is shown on the left and can be considered a bag (or set) of lists. In this case, the square-edged boxes represent blank nodes, while the rounded-edged boxes are IRI nodes. The list approach, on the right, is typically of a hypergraph, in which you have a list of (here, category) nodes without an attending structure. Which is preferable? My own experience has been the bag approach, largely because there is less in the way of computation needed to determine grouping structures.
You may also be questioning what happens when you have mutually exclusive options. The engine type is an example of this. You could have an electric car, an internal combustion engine (ICE), a rotary engine, a fuel cell, or a hybrid engine, but you usually don’t have more than one (I consider a hybrid to be a different engine type altogether, as it represents a distinct architecture even though it has aspects of both electric and ICE).
This is what makes lists so tricky. A list can be considered a pseudo-element, requiring a certain amount of additional processing and conditional logic when querying, but can be thought of as being of the same type as its constituent items. what changes is the shape of the connection? In SHACL, this would look something like:
PartType:hasEngineType a sh:PropertyShape;
rdfs:subPropertyOf Vehicle:hasPartType ;
sh:name "Engine Type"^^xsd:string;
sh:path PartType:hasEngineType ;
sh:class EngineType:;
sh:nodeKind sh:IRI ;
sh:minCount 1;
sh:maxCount 1;
.
PartType:hasSeatCover a sh:PropertyShape;
rdfs:subPropertyOf Vehicle:hasPartTypes ;
sh:name "Seat Cover"^^xsd:string;
sh:path [rdf:rest*/rdf:first] ;
sh:class SeatCover:;
sh:nodeKind sh:IRI;
sh:minCount 0;
.
In this case, while the assumption in both cases is that the object type of the predicate is some form of PartType:, itself a subclass of Category: . The paths to get to that class are different. This suggests (correctly) that the generic (abstract) Vehicle:hasPartTypes property has neither path nor cardinality:
Vehicle:hasPartTypes a sh:PropertyShape;
sh:name "PartTypes"^^xsd:string;
sh:class PartType:;
sh:nodeKind sh:IRI;
.
So what does this have to do with hypergraphs? Primarily hypergraphs are the association of data structures with relationships. Let’s take the bag approach as an example:
Vehicle:MyCar Vehicle:hasPartTypes [
PartType:hasEngineType EngineType:Electric;
PartType:hasSeatCover (SeatCover:Leather);
PartType:hasAudioProtocol (AudioProtocol:Blutooth AudioProtocol:USB)
];
.
Named Graphs and Hypergraphs
Okay, you might be saying, “that makes sense for the object side of the equation, but what about the subject side?” (Of course, you may also be saying, "Gee, this article is too long; time for lunch”. It is Always hard to tell when you’re on this side of the blog.)
We use subject hypergraphs all the time. A graph is a set of assertions. You can refer to a set by name (the graph name), but what you’re actually talking about is the set of assertions:
This way, the part types are contained in a graph Graph:MyCar_PartTypes that uses the Category:isA property to identify the respective resources as being of the PartType: class.
Vehicle:MyCar Vehicle:hasPartTypes Graph:MyCar_PartTypes.
graph Graph:MyCar_PartTypes {
EngineType:Electric Category:isA PartType: .
SeatCoverType:Leather Category:isA PartType: .
AudioProtocol:Blutooth Category:isA PartType: .
AudioProtocol:USB Category:isA PartType: .
}
Graph:MyCar_PartTypes
Concept:createdBy Person:JaneDoe ;
Concept:creationDate "2024-01-24"^xsd:date ;
.
Here, Graph:MyCarPartTypes is a pointer to a set of triples. Note that this graph is referenced as both an object:
Vehicle:MyCar Vehicle:hasPartTypes Graph:MyCar_PartTypes.
and as a subject:
Graph:MyCar_PartTypes
Concept:createdBy Person:JaneDoe ;
Concept:creationDate "2024-01-24"^xsd:date ;
.
The following indicates how you could then get the relevant part types from the graph in SPARQL, along with the author and creation date:
SELECT ?partType ?author ?creationDate WHERE {
values ?vehicle {Vehicle:MyCar}
?vehicle Vehicle:hasPartTypes ?partTypes .
?partTypes {?partType Category:isA PartType: } .
?partTypes Concept:createdBy ?author .
?partTypes Concept:createdBy ?creationDate .
}
For the above data, this produces the table:
One additional point to consider. You may likely run into a situation where all possible categories and their definitions are kept in a graph called something like Graph:Categories . Thus, if the entry for EngineType:Electric includes a property called PartType:hasPTIN (Part Type InventoryNumber), like as follows:
Graph:Categories {
EngineType:Electric
Category:isA PartType:EngineType ;
PartType:hasPTIN "ENG-ELECTRIC"^^xsd:string ;
.
EngineType:ICE
Category:isA PartType:EngineType ;
PartType:hasPTIN "ENG-ICE"^^xsd:string ;
.
EngineType:Rotary
Category:isA PartType:EngineType ;
PartType:hasPTIN "ENG-ROTARY"^^xsd:string ;
.
AudioProtocol:USB
Category:isA PartType:AudioProtocol ;
PartType:hasPTIN "ENG-ROTARY"^^xsd:string ;
# ...
}
Then the SPARQL query to get the PTIN for that particular part type would look like the following:
SELECT ?partType ?ptin WHERE {
values ?vehicle {Vehicle:MyCar}
?vehicle Vehicle:hasPartTypes ?partTypes .
?partTypes {?partType Category:isA PartType: } .
Graph:Categories {?partType PartType:hasPTIN ?ptin}
}
In other words, the graph Graph:MyCar_PartTypes holds the part types that the individual vehicle “MyCar” has as declarations, but the definition for those part types (and all of the others) is kept in Graph:Categories. This produces the table:
Turtle by itself does not have named graphs, however TRIG (Turtle RDF Graph Language) does. With the design patterns given above, TRIG crosses the threshold to being a hypergraph, because a graph is by definition a set (in this case a set of triples). What’s even more significant here is that while there is no easy way to identify an RDF List except via blank nodes (which are by definition not referenceable), named graphs are, perforce, named.
Getting Hyper
TRIG, Sparql 1.1 Query, and SPARQL 1.1 Update, which all hit the stage around the end of 2013, changed the nature of the game concerning named graphs, and hence, hypergraphs. Until then, everything within a triple store was effectively one giant graph. Even after these specs were published, it took a few years for named graphs to make their way into most triple stores, and many older ontologists were simply not trained on them and tended to ignore their implications. OWL, for instance, does not recognize named graphs as a formal concept, although there are OWL implementations that have been extended to do so.
SHACL code, the Shape Constraint Language encoded as triples, is typically stored in a named graph in the system, and one can pass the named graph URI as an argument to any processing code that needs to validate against that SHACL. Indeed, a typical SHACL validation process involves two graphs - the source graph that contains the nodes to be validated and the SHACL graphs that perform the validation.
Indeed, in theory, it should be possible to run a SPARQL UPDATE script that will identify, validate, and, if errors occur, generate reports on those nodes that do not satisfy the SPARQL constraints. In practice, that’s considerably more complicated to write such a script by hand. However, it should be possible to write an update script that, when given a SHACL graph, will generate a SPARQL update file that can, in turn, perform the requisite validation and reporting, including inline SPARQL.
Since SHACL also incorporates the ability to create constraint logic and define functions, this also means that SPARQL UPDATE and SHACL together, as hypergraphs, can generate self-modifying functional code. This is not the place to discuss the how of this, but it may be the subject of a subsequent post.
This is a form of generative hypergraph that can use graphs as objects, as endpoints for processes, and as tools for generating other content. As with most generative technologies, it may likely take advantage not only of tools like SPARQL and SPARQL UPDATE but also of AI-generated graphs and scripts, most likely instantiated through a SERVICE interface.
Additionally, SHACL can be used by reference to get relevant properties like OWL that don’t need complex pre-processing. This, too, is fodder for another post.
We’ve covered a lot of ground here. RDF Lists and bags provide a rudimentary hypergraph capability, though using named graphs expands that capability tremendously. This also does not consider perhaps the biggest aspect of hypergraphs - using a named graph or pointer to a structure as a predicate. That’s where things get very interesting indeed.
In media res,
Kurt Cagle
Kurt Cagle is the Editor of The Ontologist and is a practising ontologist, solutions architect, and AI aficionado. He lives in Bellevue, Washington, with his wife and cats.
Sign up for free Ontology Office Hours at Calendly .
Engineer | Writer | Machine Learning | Deep Learning | Knowledge Graph | RAG
3 个月may be query performance might be impacted by the additional level of indirection (hyper edge nodes) compared to direct relationships in graphs built using neo4j if this concept is applied...Will it be more impactful than pairwise relationships ?
Visionary technologist and lateral thinker driving market value in regulated, complex ecosystems. Open to leadership roles.
10 个月Mariam Brian check this POV.
The Data Diva | Data Privacy & Emerging Technologies Advisor | Technologist | Keynote Speaker | Helping Companies Make Data Privacy and Business Advantage | Advisor | Futurist | #1 Data Privacy Podcast Host | Polymath
10 个月Looking forward to reading more about it in The Cagle Report. Keep up the great work!
Disambiguation Specialist
10 个月Kurt Cagle - My understanding of hypergraphs differs quite a bit from yours, I think. Sets seem to me to be a special case where one concept, say 'colour' has several items that qualify as members of that set. Not sure I understand how a set becomes a hypergraph. To me, hypergraphs are a way to model persistent and therefore reusable patterns of association (OOP called them Classes), the instances / objects of which have their own identity. My original comment suggests that hypergraphs are a way to marry principles of object orientation with knowledge graphs. See graphic. Happy to be course corrected!
Holistic Management Analysis and Knowledge Representation (Ontology, Taxonomy, Knowledge Graph, Thesaurus/Translator) for Enterprise Architecture, Business Architecture, Zero Trust, Supply Chain, and ML/AI foundation.
10 个月So, next - hyperedges!