Ontology Modeling with SHACL: Getting Started
In the world of Knowledge Graphs, an Ontology is a domain model defining classes and properties. Classes are the types of entities (instances) in the graph and properties are the attributes and relationships between them. Ontologies define the structure of graphs and allow tools to make more sense of them.
This article introduces how SHACL can be used for Ontology Modeling. As a running example we build a small toy ontology for representing Chess games.
This article covers the very basics of SHACL only, with typical design patterns that are ready for copy and paste when needed. Future articles will (hopefully) drill into more advanced topics, extending this Chess ontology.
Scoping an Ontology Project
The first step in any ontology modeling project is to decide on the scope and the potential application areas. What entities do we need to represent and what questions is the graph going to answer? Without this scope, an ontology may become too general, too specific or unfit for purpose for other reasons.
In our example, we want to be able to represent Chess games including the pieces on the board so that it can be used to answer queries such as
To support such use cases, it would be best to represent all individual pieces of a chess board as instances in a graph. A chronological list of moves in a game would be useful too but wouldn't allow us to answer the questions above easily.
Example Instance Data
Even if there is no ontology yet, it is worthwhile to think about the structure of the nodes that we would like to describe. In some cases this data may already exist. In our Chess example, let's assume we want to represent a game with this position:
In a (knowledge) graph this could be represented as:
In Turtle notation this little instance graph could be something like:
ex:ExampleGame
  a shess:Game ;
  shess:piece [
    a shess:King ;
    shess:color shess:White ;
    shess:square "a1" ;
  ] ;
  shess:piece [
    a shess:Queen ;
    shess:color shess:Black ;
    shess:square "b2" ;
  ] ;
  shess:piece [
    a shess:King ;
    shess:color shess:Black ;
    shess:square "b3" ;
  ] .
Getting Started with an Ontology
To represent a chess game as above, we need classes such as Game and Piece, with various subclasses for the different kinds of pieces such as King and Queen. For each Piece we will need the Color (white or black) and the square (such as "a1" for the lower left corner). That is the minimum. Here is our initial design:
We can create a basic SHACL file, for example in Turtle notation using a text editor or an Ontology Editor such as TopBraid EDG. Below is an "empty" file to get started, just declaring some namespace prefixes. In our case, prefix.cc told us that "chess" was already taken, so we picked "shess" for this project.
@prefix dash: <http://datashapes.org/dash#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix shess: <https://example.org/shess#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<https://example.org/shess>
  a owl:Ontology ;
  rdfs:label "Chess in SHACL Example Ontology" ;
  owl:imports <http://datashapes.org/dash> .
Note that here we import the dash namespace as it makes some things easier, but that is entirely optional.
Defining Classes
A class represents a type of instances in a graph. Here is a very basic definition of the class shess:Game which we will use to represent individual Chess games:
shess:Game
  a rdfs:Class ;
  rdfs:label "Chess Game" ;
  rdfs:subClassOf owl:Thing .
So far this is just a class definition using the traditional RDF Schema type rdfs:Class. We could have picked owl:Class as well. Classes can be arranged in a subclass hierarchy, and here we have picked owl:Thing as the root class, basically saying that Game is a top-level class. We could have also omitted the rdfs:subClassOf statement or used rdfs:Resource. In any case, none of this makes our ontology depend on OWL or RDFS and you don't need further knowledge of these standards to proceed with SHACL.
The simple class definition above doesn't yet define any meaning for a SHACL-based tool. SHACL uses a more general concept than classes to talk about nodes, namely shapes. Shapes describe the structure of nodes in a graph, and the constraints that they need to fulfill. For example, a Game shape could express that each valid Chess game must have at least two pieces of type King on the board. Most SHACL shapes will be linked to a class, but they do not need to be.
To declare classes and their shapes in SHACL, there are two main patterns.
Design Pattern 1: Classes and separate Node Shapes
In this design pattern, the class is a separate entity from its SHACL node shape, and they are linked together using the property sh:targetClass:
shess:Game
  a rdfs:Class ;
  rdfs:label "Chess Game" ;
  rdfs:subClassOf owl:Thing .
shess:GameShape
  a sh:NodeShape ;
  rdfs:label "Chess Game Shape" ;
  sh:targetClass shess:Game ;
  sh:property ...
This declares a node shape shess:GameShape which defines the constraints that all instances of the given target class (shess:Game) need to conform to. This design pattern cleanly separates the SHACL definition from the RDFS or OWL definitions, and is sometimes a good choice when you are not the "owner" of the original class definitions, or want to use different shapes for the same classes for different use cases. The downside however is that you have to keep two separate entities in sync.
Design Pattern 2: Classes and Node Shapes combined
In this design pattern we use the same identifier for the class and its shape:
shess:Game
  a rdfs:Class ;
  a sh:NodeShape ;
  rdfs:label "Chess Game" ;
  rdfs:subClassOf owl:Thing ;
  sh:property ...
In this design, the shess:Game class is also an instance of sh:NodeShape. A class that is also a node shape may declare properties as SHACL property shapes, and SHACL-aware tools will apply the semantics of SHACL to them. More technically, a class that is also a node shape uses the implicit class target pattern, which means that the (property) constraints declared at the class apply to all instances of the class, including the instances of its subclasses. In a sense, there is a hidden triple (shess:Game sh:targetClass shess:Game) in this model.
This design pattern has exactly the same semantics as Pattern 1 and is usually a good choice when you "own" the ontology and want a low-maintenance solution.
If you prefer an even more compact form of defining your classes, you could use dash:ShapeClass as follows:
shess:Game
  a dash:ShapeClass ;
  rdfs:label "Chess Game" ;
  rdfs:subClassOf owl:Thing .
The class dash:ShapeClass is defined in the dash namespace that we have imported earlier, and is defined as a metaclass that is a subclass of both rdfs:Class and sh:NodeShape. It basically "activates" SHACL for your classes and requires just one type triple instead of two.
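To illustrate why one type triple is enough, here is a simplified sketch of the metaclass declaration as described above. It is not the verbatim source of the DASH namespace, which carries further annotations, and it assumes a tool that follows rdfs:subClassOf when determining the types of a node.

```turtle
@prefix dash: <http://datashapes.org/dash#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix shess: <https://example.org/shess#> .

# Simplified sketch of the metaclass (not copied from the DASH file).
dash:ShapeClass
  rdfs:subClassOf rdfs:Class ;
  rdfs:subClassOf sh:NodeShape .

# With that declaration in place, this single type triple ...
shess:Game
  a dash:ShapeClass .

# ... can be read as if both of the following had been stated:
# shess:Game a rdfs:Class .
# shess:Game a sh:NodeShape .
```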
We will use this pattern moving forward, but you can elect to use rdfs:Class + sh:NodeShape if you don't want to rely on dash. Or you can use Design Pattern 1 with separate classes and shapes if you prefer.
With these options out of the way, we define a class to represent chess pieces:
shess:Piece
  a dash:ShapeClass ;
  dash:abstract true ;
  rdfs:comment "The base class of the different Chess pieces." ;
  rdfs:label "Piece" ;
  rdfs:subClassOf owl:Thing .
This class is annotated to be abstract, which means that it should never be instantiated directly. Instead, the subclasses of shess:Piece, such as shess:King, will be used:
shess:King
  a dash:ShapeClass ;
  rdfs:label "King" ;
  rdfs:subClassOf shess:Piece .
There are also Bishop, Knight, Pawn, Queen, Rook. They are all declared as subclasses of Piece, meaning that any properties and constraints defined at Piece also apply to these subclasses. As a result, SHACL implements an inheritance system similar to object-oriented systems.
We are introducing these subclasses also because we expect that different rules will apply to the different piece types. For example, they can all perform different moves, they look different and they have different value scores. Even if such differences are not critical in the first iteration of our model, it pays off to think ahead so that we don't need to change all instance data later.
Enumerations and Constants
While ontologies usually focus on classes and properties, sometimes they also contain instances. A typical use case is to represent the two piece colors in chess, so that all knowledge graphs can point at the same shared identifiers:
shess:Color
  a dash:ShapeClass ;
  rdfs:label "Color" ;
  rdfs:subClassOf owl:Thing ;
  sh:in (
    shess:White
    shess:Black
  ) .
shess:Black
  a shess:Color ;
  rdfs:label "Black" .
shess:White
  a shess:Color ;
  rdfs:label "White" .
Here we have made the design choice to store the colors as instances of a class shess:Color. Alternatively we could have just stored the colors of each piece using strings such as "white" and "black". But by using dedicated instances with URIs, we can attach other information. For example we could declare additional labels in other languages so that applications can switch between display languages more easily. It is usually more future-proof to use URIs over literals for such concepts - "Things over Strings".
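As a small illustration of the "other information" that a URI can carry, the following sketch attaches language-tagged labels to one of the colors. The German and French labels are illustrative additions and are not part of the ontology in this article.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix shess: <https://example.org/shess#> .

# Additional display labels, one per language tag.
# An application can pick the label that matches the user's language.
shess:White
  rdfs:label "White"@en ;
  rdfs:label "Weiß"@de ;
  rdfs:label "Blanc"@fr .
```

With plain strings such as "white" stored at every piece, each of those strings would have to be translated separately.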
Note that we have used our first SHACL constraint above: the sh:in construct states that White and Black are the only instances of the Color class.
Simple Relationships
Relationships are edges in a knowledge graph. For example, each instance of Piece is linked to exactly one instance of Color. In SHACL, we use so-called property shapes to declare such relationships:
shess:Piece
  a dash:ShapeClass ;
  ...
  sh:property shess:Piece-color .
shess:Piece-color
  a sh:PropertyShape ;
  sh:path shess:color ;
  sh:name "color" ;
  sh:class shess:Color ;
  sh:minCount 1 ;
  sh:maxCount 1 .
Above we have declared that instances of Piece have exactly one value for the property shess:color and that the value must be an instance of Color.
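Such property shapes must declare a sh:path, but everything else is optional. The path must be either the IRI of a property or a path expression such as an inverse path. The example above uses some of the most widely used constraint types of SHACL: sh:class states that all values must be instances of the given class, while sh:minCount and sh:maxCount state the minimum and maximum number of values.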
Finally, property shapes may carry values for non-validating properties such as sh:name which represents display labels for the relationship.
There are many other kinds of constraints in SHACL that you could use here, including the Core Constraint Components but also some complex conditions expressed in SPARQL. We'll get to those some time later.
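To make the effect of the shess:Piece-color property shape more concrete, here is a sketch of instance data that would pass or fail it. The ex: namespace and the node names are made up for illustration, and the comments consider the color constraints only, not the other property shapes of Piece.

```turtle
@prefix ex: <https://example.org/data#> .
@prefix shess: <https://example.org/shess#> .

# Conforms: exactly one color, and it is an instance of shess:Color.
ex:WhiteKing
  a shess:King ;
  shess:color shess:White .

# Violates sh:minCount 1: no color at all.
ex:ColorlessRook
  a shess:Rook .

# Violates sh:maxCount 1: two colors.
ex:TwoToneBishop
  a shess:Bishop ;
  shess:color shess:White ;
  shess:color shess:Black .

# Violates sh:class: "white" is a string literal, not an instance of shess:Color.
ex:StringPawn
  a shess:Pawn ;
  shess:color "white" .
```

Because King, Rook, Bishop and Pawn are subclasses of Piece, the constraints declared at Piece apply to all four nodes.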
Bi-Directional Relationships
By definition, the relationships in an RDF graph can be navigated in both directions. This is because relationships are stored as triples of the form (subject, predicate, object), and you can either start at the subject and find all objects or start at the object and find all subjects. In SPARQL, both directions are equally easy to query. However, sometimes you need to express constraints on both ends of a relationship. This is where inverse paths come into play.
The following property shape states that each Game has two or more instances of Piece as values of the shess:piece relationship:
shess:Game
  a dash:ShapeClass ;
  ...
  sh:property shess:Game-piece .
shess:Game-piece
  a sh:PropertyShape ;
  sh:path shess:piece ;
  sh:name "pieces" ;
  sh:class shess:Piece ;
  sh:minCount 2 . # At least two kings
While the above defines the relationship viewed from a Game, here is the opposite direction as viewed from a Piece:
shess:Piece
  a dash:ShapeClass ;
  ...
  sh:property shess:Piece-piece-inverse .
shess:Piece-piece-inverse
  a sh:PropertyShape ;
  sh:path [
    sh:inversePath shess:piece ;
  ] ;
  sh:name "game" ;
  sh:class shess:Game ;
  sh:minCount 1 ;
  sh:maxCount 1 .
The key difference is the use of the sh:inversePath, which tells a SHACL processor that it needs to walk the relationship from object to subject. Taken together, we have now stated that each Game must have at least two Pieces and that each Piece is part of exactly one Game.
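The following sketch shows how the inverse property shape plays out on instance data. The ex: namespace and the node names are hypothetical, and the comments look at the inverse constraint only, so color and square values are left out for brevity.

```turtle
@prefix ex: <https://example.org/data#> .
@prefix shess: <https://example.org/shess#> .

ex:Game1
  a shess:Game ;
  shess:piece ex:SharedKing ;
  shess:piece ex:Queen1 .

ex:Game2
  a shess:Game ;
  shess:piece ex:SharedKing ;
  shess:piece ex:Queen2 .

# Reached from ex:Game1 and ex:Game2 via the inverse path:
# two values, which violates sh:maxCount 1.
ex:SharedKing
  a shess:King .

# Each referenced by exactly one game: these conform.
ex:Queen1 a shess:Queen .
ex:Queen2 a shess:Queen .

# Not referenced by any game: zero values, which violates sh:minCount 1.
ex:OrphanPawn
  a shess:Pawn .
```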
There are other, more complex types of paths in SHACL when you need them.
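As one example of such a path, a SHACL sequence path is written as an RDF list of steps. The sketch below is not part of the ontology in this article; the shape name shess:Game-pieceColor is hypothetical. It walks from a Game to its pieces and from there to their colors.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix shess: <https://example.org/shess#> .

# Hypothetical property shape using a sequence path:
# first follow shess:piece, then shess:color.
shess:Game-pieceColor
  a sh:PropertyShape ;
  sh:path ( shess:piece shess:color ) ;
  sh:name "piece colors" ;
  sh:class shess:Color .

# Attach the property shape to the Game class.
shess:Game
  sh:property shess:Game-pieceColor .
```

The value nodes of this shape are the colors of all pieces of a game, so the sh:class constraint is checked against those colors rather than against the pieces themselves.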
Attributes
As the last step of this little exercise, we declare an attribute. While relationships link two nodes together, attributes link a node with a datatype literal such as a string, a number or a boolean.
In SHACL, the syntax for defining attributes is very similar to relationships.
In the following snippet we declare that instances of Piece must have exactly one value for the property shess:square and that value needs to be a string that matches a given regular expression:
shess:Piece
  a dash:ShapeClass ;
  ...
  sh:property shess:Piece-square .
shess:Piece-square
  a sh:PropertyShape ;
  sh:path shess:square ;
  sh:name "square" ;
  sh:description "For example a1 for the lower left corner." ;
  sh:datatype xsd:string ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:pattern "^[a-h][1-8]$" .
For this design, we chose to represent the position of each piece with the Chess notation of its square. For example the value "a1" is the lower left corner and "a2" is the square just above that. In an alternative design we could have chosen integers for x and y but in this case the Chess notation is arguably more natural and more compact.
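To represent these syntax rules, we have used these SHACL constraints: sh:datatype requires every value to be a literal of type xsd:string, and sh:pattern requires every value to match the given regular expression. As before, sh:minCount and sh:maxCount together require exactly one value.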
There are plenty of other constraint types that could apply here, such as sh:minLength and sh:maxLength but in this case the sh:pattern is sufficient to describe the allowed values.
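To see what the pattern accepts, here is a sketch of instance data with hypothetical node names in a made-up ex: namespace. The comments look at the shess:square constraints only.

```turtle
@prefix ex: <https://example.org/data#> .
@prefix shess: <https://example.org/shess#> .

ex:PieceA a shess:Pawn ; shess:square "a1" .   # matches the pattern
ex:PieceB a shess:Pawn ; shess:square "h8" .   # matches the pattern
ex:PieceC a shess:Pawn ; shess:square "i9" .   # no match: file and rank out of range
ex:PieceD a shess:Pawn ; shess:square "A1" .   # no match: the pattern is case-sensitive
ex:PieceE a shess:Pawn ; shess:square "a10" .  # no match: ^ and $ allow exactly two characters
ex:PieceF a shess:Pawn ; shess:square 11 .     # fails sh:datatype: an xsd:integer, not an xsd:string
```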
A word on datatypes: While RDF defines a huge choice of datatypes, in practice I cannot recommend using the exotic numeric datatypes such as xsd:nonNegativeInteger. Instead we only use the basic datatypes xsd:integer and xsd:decimal, and use numeric range constraints such as sh:minInclusive if we want to narrow them down. Representing range constraints through the choice of datatypes is a relic from the past where byte-sized storage still mattered.
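As a sketch of this recommendation, the alternative design with integer coordinates could be declared as follows. The properties shess:x and shess:y and their property shapes are hypothetical; they are not part of the ontology in this article.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix shess: <https://example.org/shess#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical column coordinate: a plain xsd:integer narrowed to 1..8
# with range constraints instead of an exotic numeric datatype.
shess:Piece-x
  a sh:PropertyShape ;
  sh:path shess:x ;
  sh:name "x" ;
  sh:datatype xsd:integer ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:minInclusive 1 ;
  sh:maxInclusive 8 .

# Hypothetical row coordinate, declared the same way.
shess:Piece-y
  a sh:PropertyShape ;
  sh:path shess:y ;
  sh:name "y" ;
  sh:datatype xsd:integer ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:minInclusive 1 ;
  sh:maxInclusive 8 .
```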
Summary and Outlook
This simple example has introduced some basic building blocks to get you started with ontology modeling in SHACL. The ontology so far only declares the properties that are needed to represent the state of a chess board, without adding much detail. Not all requirements from our scope are covered yet, but at least we can start collecting instance data, which also allows us to validate whether the ontology is suitable. It is perfectly normal that an ontology changes over time and that we didn't get everything right in our first attempt.
I plan to write more articles that expand on the Chess example to explain things like qualified cardinality constraints, user-defined metaclasses, SPARQL-based constraints, how to drive user interfaces and other goodies, so stay tuned.
Appendix
Here is the complete source code of the Chess Ontology from this article:
@prefix dash: <http://datashapes.org/dash#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix shess: <https://example.org/shess#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<https://example.org/shess>
  a owl:Ontology ;
  rdfs:label "Chess in SHACL Example Ontology" ;
  owl:imports <http://datashapes.org/dash> .
shess:Color
  a dash:ShapeClass ;
  rdfs:label "Color" ;
  rdfs:subClassOf owl:Thing ;
  sh:in (
    shess:White
    shess:Black
  ) .
shess:Black
  a shess:Color ;
  rdfs:label "Black" .
shess:White
  a shess:Color ;
  rdfs:label "White" .
shess:Game
  a dash:ShapeClass ;
  rdfs:label "Chess Game" ;
  rdfs:subClassOf owl:Thing ;
  sh:property shess:Game-piece .
shess:Game-piece
  a sh:PropertyShape ;
  sh:path shess:piece ;
  sh:class shess:Piece ;
  sh:minCount 2 ;
  sh:name "pieces" .
shess:Piece
  a dash:ShapeClass ;
  dash:abstract true ;
  rdfs:comment "The base class of the different Chess pieces." ;
  rdfs:label "Piece" ;
  rdfs:subClassOf owl:Thing ;
  sh:property shess:Piece-color ;
  sh:property shess:Piece-piece-inverse ;
  sh:property shess:Piece-square ;
  sh:property shess:Piece-type .
shess:Piece-color
  a sh:PropertyShape ;
  sh:path shess:color ;
  sh:class shess:Color ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:name "color" .
shess:Piece-piece-inverse
  a sh:PropertyShape ;
  sh:path [
    sh:inversePath shess:piece ;
  ] ;
  sh:class shess:Game ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:name "game" .
shess:Piece-square
  a sh:PropertyShape ;
  sh:path shess:square ;
  sh:datatype xsd:string ;
  sh:description "For example a1 for the lower left corner." ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:name "square" ;
  sh:pattern "^[a-h][1-8]$" .
shess:Piece-type
  a sh:PropertyShape ;
  sh:path rdf:type ;
  sh:in (
    shess:Bishop
    shess:King
    shess:Knight
    shess:Pawn
    shess:Queen
    shess:Rook
  ) ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:name "type" .
shess:Bishop
  a dash:ShapeClass ;
  rdfs:label "Bishop" ;
  rdfs:subClassOf shess:Piece .
shess:King
  a dash:ShapeClass ;
  rdfs:label "King" ;
  rdfs:subClassOf shess:Piece .
shess:Knight
  a dash:ShapeClass ;
  rdfs:label "Knight" ;
  rdfs:subClassOf shess:Piece .
shess:Pawn
  a dash:ShapeClass ;
  rdfs:label "Pawn" ;
  rdfs:subClassOf shess:Piece .
shess:Queen
  a dash:ShapeClass ;
  rdfs:label "Queen" ;
  rdfs:subClassOf shess:Piece .
shess:Rook
  a dash:ShapeClass ;
  rdfs:label "Rook" ;
  rdfs:subClassOf shess:Piece .