Ontology Modeling with SHACL: SPARQL-based Constraints
In this part of our SHACL tutorial, we show how to express complex conditions with the help of the RDF query language SPARQL. The article builds upon the first part, Getting Started, in which we introduced a simple ontology to represent the state of a Chess game and its pieces.
In addition to declaring the structure of entities in a knowledge graph (e.g. chess pieces), SHACL can be used to define constraints such as "There must always be exactly two kings on a Chess board: a white king and a black king" from the previous article on Qualified Cardinality Constraints. In general, it is ideal if the purely declarative shapes of SHACL (Core) are sufficient to express such constraints. But there are scenarios where even more complex conditions need to be expressed that do not fit into shapes alone. For many of these use cases, SHACL-SPARQL is a good choice.
Use Case
The example constraint that we will develop here represents one of the implicit rules of Chess:
The two kings cannot be placed on adjacent squares
For those who are not familiar with Chess, this is because kings cannot move into a position where they are under attack, and kings can attack all 8 squares around them. So in the following position, the black king could never have been moved next to the white king without running into a check:
How to express this as a constraint?
There MAY be a way to model this with pure SHACL Core but I believe it would require a very complex ontology where all pieces would need explicit links to the pieces that they attack. In such cases it is better to use a richer constraint language than to artificially bloat the ontology. Therefore: SPARQL to the rescue!
Playing with SPARQL
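As this is a surprisingly complex condition to express even with SPARQL, let's try to break the problem down. One algorithm would be to:
1. Find the squares of the white king and the black king
2. Convert each square, such as "a1", into (x, y) integer coordinates
3. Check whether the two kings are at most one step apart in both x and y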
Now let's write some SPARQL queries to collect the information we need. Here is the query for step 1 above:
# Query to find the squares of the two kings for $this Game
SELECT ?ws ?bs
WHERE {
$this shess:piece ?whiteKing .
?whiteKing a shess:King .
?whiteKing shess:color shess:White .
?whiteKing shess:square ?ws .
$this shess:piece ?blackKing .
?blackKing a shess:King .
?blackKing shess:color shess:Black .
?blackKing shess:square ?bs .
}
For step 2, given a square such as "a1" here is a query to produce (x, y) coordinates that are easier to process in maths.
# Query to convert Chess square string "a1" to x,y integers
SELECT ?x ?y
WHERE {
BIND ("a1" AS ?s) .
BIND (xsd:integer(SUBSTR(?s, 2, 1)) AS ?y) .
BIND (SUBSTR(?s, 1, 1) AS ?l) .
BIND (STRLEN(STRBEFORE("abcdefgh", ?l)) + 1 AS ?x) .
}
Note that this is a rather ugly query as, to my knowledge, SPARQL does not have an easy way to turn characters into their ASCII code, nor a built-in function to find the position of a character in a string. Thus the hack with STRLEN/STRBEFORE.
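For readers who find the STRLEN/STRBEFORE trick hard to follow, here is a minimal Python sketch of the same conversion. It is only an illustration and not part of the ontology: the function name square_to_xy is made up for this example, and str.index stands in for the position lookup that SPARQL lacks.

```python
def square_to_xy(square: str) -> tuple[int, int]:
    """Convert a Chess square such as "a1" into 1-based (x, y) integers."""
    files = "abcdefgh"
    letter, digit = square[0], square[1]
    # Counterpart of STRLEN(STRBEFORE("abcdefgh", ?l)) + 1 in the query:
    # the number of letters before ?l, plus one.
    x = files.index(letter) + 1
    # Counterpart of xsd:integer(SUBSTR(?s, 2, 1)).
    y = int(digit)
    return x, y


print(square_to_xy("a1"))  # (1, 1)
print(square_to_xy("e4"))  # (5, 4)
```

One difference is worth noting: the sketch raises an error for a letter outside a to h, whereas STRBEFORE returns an empty string when there is no match, so the SPARQL version would silently map such a letter to x = 1.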
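With these problems solved, we can now write a single SPARQL query that delivers all squares ?ws and ?bs where ?ws contains the white king and ?bs the black king, whenever the two are on adjacent squares. The query combines the two queries above with a FILTER that keeps only those results where the x coordinates and the y coordinates each differ by at most 1.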
Once we are happy with the query and have tested it with some examples and counter-examples, we can turn it into a proper SHACL constraint.
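When collecting examples and counter-examples, it can help to have an independent reference for the expected results. The following Python sketch mirrors the adjacency condition under the same assumptions as the query (squares are two-character strings such as "e4"). The function name kings_adjacent is hypothetical and only serves this illustration.

```python
FILES = "abcdefgh"


def kings_adjacent(ws: str, bs: str) -> bool:
    """Return True if the squares ws and bs are at most one step apart."""
    wx, wy = FILES.index(ws[0]) + 1, int(ws[1])
    bx, by = FILES.index(bs[0]) + 1, int(bs[1])
    # The black king must lie within the 3x3 block around the white king.
    return wx - 1 <= bx <= wx + 1 and wy - 1 <= by <= wy + 1


# Examples: these positions violate the rule
print(kings_adjacent("e4", "e5"))  # True
print(kings_adjacent("e4", "d3"))  # True

# Counter-examples: these positions are fine
print(kings_adjacent("e1", "e8"))  # False
print(kings_adjacent("e4", "g4"))  # False
```

Like the range comparison on x and y, the sketch would also report two kings on the same square, which cannot occur in a well-formed game.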
SPARQL-based Constraints
The official SHACL specification consists of the SHACL Core features (which are now implemented by basically all RDF triple stores) and the SHACL-SPARQL features. One of the SPARQL-related features is SPARQL-based Constraints.
Below is an example declaration of the constraint that the two kings must not be on adjacent squares. We have attached it to the class/shape shess:Game using the property sh:sparql:
shess:Game
...
sh:sparql shess:KingsCannotAttackEachOtherConstraint .
shess:KingsCannotAttackEachOtherConstraint
a sh:SPARQLConstraint ;
rdfs:label "Kings cannot attack each other constraint" ;
sh:message "King on {?ws} cannot attack King on {?bs}" ;
sh:select """
PREFIX shess: <https://example.org/shess#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT $this ?ws ?bs
WHERE {
$this shess:piece ?whiteKing .
?whiteKing a shess:King .
?whiteKing shess:color shess:White .
?whiteKing shess:square ?ws .
$this shess:piece ?blackKing .
?blackKing a shess:King .
?blackKing shess:color shess:Black .
?blackKing shess:square ?bs .
BIND (xsd:integer(SUBSTR(?ws, 2, 1)) AS ?wy) .
BIND (SUBSTR(?ws, 1, 1) AS ?wl) .
BIND (STRLEN(STRBEFORE("abcdefgh", ?wl)) + 1 AS ?wx) .
BIND (xsd:integer(SUBSTR(?bs, 2, 1)) AS ?by) .
BIND (SUBSTR(?bs, 1, 1) AS ?bl) .
BIND (STRLEN(STRBEFORE("abcdefgh", ?bl)) + 1 AS ?bx) .
FILTER (?bx >= ?wx-1 && ?bx <= ?wx+1 &&
?by >= ?wy-1 && ?by <= ?wy+1) .
}
""" .
As you can see, the query is basically the same as the one developed in the previous section. We only needed to make sure that the query also returns $this in addition to the squares ?ws and ?bs. During validation, the SELECT query is executed for each instance of shess:Game, with that instance bound to the variable $this. One constraint violation is then reported for every result row. In other words, you need to formulate a SPARQL query that finds the negative cases, where the condition is violated.
The constraint violations will contain the sh:message, in which the variables ?ws and ?bs can be used to produce helpful output.
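To illustrate the reporting step, here is a rough Python sketch of what an engine does with the result rows. It is a simplification and not how any particular SHACL engine is implemented: the game names and result rows are made up, and only the substitution of {?var} and {$var} placeholders in sh:message is modelled.

```python
import re

MESSAGE = "King on {?ws} cannot attack King on {?bs}"

# Hypothetical result rows, as the SELECT query might return them for two games
ROWS_BY_GAME = {
    "game1": [{"ws": "e4", "bs": "e5"}],  # adjacent kings: one row, one violation
    "game2": [],                          # no rows: the game conforms
}


def render_message(template, row):
    """Replace {?var} and {$var} placeholders with the values bound in the row."""
    return re.sub(
        r"\{[?$](\w+)\}",
        # Placeholders without a binding are left untouched
        lambda m: row.get(m.group(1), m.group(0)),
        template,
    )


def report_violations(rows_by_game):
    """Produce one (focus node, message) pair per result row."""
    return [
        (game, render_message(MESSAGE, row))
        for game, rows in rows_by_game.items()
        for row in rows
    ]


print(report_violations(ROWS_BY_GAME))
# [('game1', 'King on e4 cannot attack King on e5')]
```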
So: Mission Accomplished... although for my taste the query looks rather complex. Let's see how this complexity can be reduced.
User-Defined SPARQL Functions
The SPARQL language has a built-in extension point where engines can define new functions in addition to the built-in functions. Most SPARQL engines have their own extension functions, but there is no widely accepted mechanism to make these functions interoperable.
The SHACL Advanced Features specification has introduced a mechanism that allows anyone to declare new SPARQL functions and to distribute them like linked data, in RDF. Note that few SHACL (or SPARQL) engines currently support them. The open-source TopBraid SHACL API does support them, and the feature is widely used by customers of the TopBraid EDG enterprise platform.
Here are some user-defined SPARQL functions that will make writing SHACL constraints much easier, because they encapsulate reusable query logic.
The function shess:getKingSquare takes a Game and a Color as arguments and returns the square of the King with the given color:
shess:getKingSquare
a sh:SPARQLFunction ;
rdfs:label "get King square" ;
sh:parameter [
sh:path shess:game ;
sh:class shess:Game ;
sh:order 0 ;
] ;
sh:parameter [
sh:path shess:color ;
sh:class shess:Color ;
sh:order 1 ;
] ;
sh:returnType xsd:string ;
sh:select """
PREFIX shess: <https://example.org/shess#>
SELECT ?s
WHERE {
$game shess:piece ?king .
?king a shess:King .
?king shess:color $color .
?king shess:square ?s .
}""" .
The function shess:getSquareX converts a square string such as "a1" into its X position as an xsd:integer between 1 and 8:
shess:getSquareX
a sh:SPARQLFunction ;
rdfs:label "get square X" ;
sh:parameter [
sh:path shess:square ;
sh:datatype xsd:string ;
] ;
sh:returnType xsd:integer ;
sh:select """
SELECT ?x
WHERE {
BIND (SUBSTR(?square, 1, 1) AS ?letter) .
BIND (STRLEN(STRBEFORE("abcdefgh", ?letter)) + 1 AS ?x)
}""" .
Finally, the function shess:getSquareY extracts the digit from a square such as "a1" and converts it into an xsd:integer such as 1:
shess:getSquareY
a sh:SPARQLFunction ;
rdfs:label "get square Y" ;
sh:parameter [
sh:path shess:square ;
sh:datatype xsd:string ;
] ;
sh:returnType xsd:integer ;
sh:select """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?y
WHERE {
BIND (xsd:integer(SUBSTR(?square, 2, 1)) AS ?y) .
}""" .
With these helper functions, we can now significantly simplify the constraint:
SELECT $this ?ws ?bs
WHERE {
BIND (shess:getKingSquare($this, shess:White) AS ?ws) .
BIND (shess:getKingSquare($this, shess:Black) AS ?bs) .
BIND (shess:getSquareY(?ws) AS ?wy) .
BIND (shess:getSquareX(?ws) AS ?wx) .
BIND (shess:getSquareY(?bs) AS ?by) .
BIND (shess:getSquareX(?bs) AS ?bx) .
FILTER (?bx >= ?wx-1 && ?bx <= ?wx+1 &&
?by >= ?wy-1 && ?by <= ?wy+1) .
}
Not only is this particular constraint now much shorter, we can also reuse the same business logic in other scenarios. For example, we could use the functions to compute all squares that are reachable by a given King. User-defined SPARQL functions are like Lego bricks or stored procedures.
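As a rough illustration of that kind of reuse, the following Python sketch builds a "squares around a King" helper on top of two small functions that mirror shess:getSquareX and shess:getSquareY. All Python names are hypothetical, and the sketch only considers the board edges; it ignores other pieces and checks.

```python
FILES = "abcdefgh"


def get_square_x(square: str) -> int:
    """Python counterpart of shess:getSquareX: "a1" -> 1, "h1" -> 8."""
    return FILES.index(square[0]) + 1


def get_square_y(square: str) -> int:
    """Python counterpart of shess:getSquareY: "a1" -> 1, "a8" -> 8."""
    return int(square[1])


def king_neighbours(square: str) -> list[str]:
    """Return all squares on the board that are one step away from square."""
    x, y = get_square_x(square), get_square_y(square)
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            # Skip the King's own square and anything off the board
            if (dx, dy) != (0, 0) and 1 <= nx <= 8 and 1 <= ny <= 8:
                result.append(f"{FILES[nx - 1]}{ny}")
    return result


print(king_neighbours("a1"))       # ['a2', 'b1', 'b2']
print(len(king_neighbours("e4")))  # 8
```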
Summary and Outlook
SHACL-SPARQL constraints can be used to express complex conditions: any condition that can be captured as a SPARQL query can be attached to SHACL shapes for validation purposes. We have shown one particular (complex) use case, but there are countless others. We have also shown how user-defined SPARQL functions can improve the modularity of your constraints and queries.
There are several other SPARQL-based features in SHACL that we didn't cover in this article. For example there are SPARQL-based targets that can be used to fine-tune which constraints apply to which nodes. Even more importantly, there are SPARQL-based Constraint Components that are a mechanism to extend SHACL itself by introducing new constraint types backed by SPARQL queries. This topic will be covered in the next article.
Even with SPARQL there are limitations. While we were able to (somehow) express what we needed for this article within the official SPARQL standard, it would arguably have been more natural to express the condition in an even richer language that offers better string processing features. In our product we have added the ability to express constraints and other ontology features using the Active Data Shapes (ADS) JavaScript framework. This offers basically unlimited expressiveness while retaining the declarative nature of RDF-based ontologies. We will likely write more about this in the future. Meanwhile, the previous articles on sh:values and the software company knowledge graph had some ADS examples.
Appendix: Prefix Declarations
One final detail on the SHACL syntax: how to declare namespace prefixes so that they do not need to be repeated in each query. We keep getting questions about this, and the relevant part of the specification is poorly written (yes, I know).
Here is the thing: namespace prefixes are not part of the RDF data model but rather live in the serializations such as Turtle and SPARQL only. For a SHACL engine to understand them, the namespace prefix declarations need to be lifted into the data model, as RDF triples. Here is how to do this correctly:
<https://example.org/shess>
a owl:Ontology ;
rdfs:label "Chess in SHACL Example Ontology" ;
owl:imports <http://datashapes.org/dash> ;
sh:declare [
a sh:PrefixDeclaration ;
sh:namespace "https://example.org/shess#"^^xsd:anyURI ;
sh:prefix "shess" ;
] .
shess:KingsCannotAttackEachOtherConstraint
a sh:SPARQLConstraint ;
sh:prefixes <https://example.org/shess> ;
sh:select """
SELECT $this ?ws ?bs
WHERE {
$this shess:piece ?whiteKing .
...
In this case, the sh:select query can use the prefix shess: without having to explicitly declare it in the query string. This is because sh:prefixes points at an RDF resource that holds the sh:declare statement for it.
It is not correct to write sh:prefixes shess: unless the resource shess: carries the sh:declare triple. In the case above it doesn't, because shess: expands to <https://example.org/shess#> rather than <https://example.org/shess>.
The usual design pattern is to attach the prefix declaration to the resource that represents the graph itself (aka base URI). That resource is often an owl:Ontology that owl:imports other graphs. The SHACL prefix mechanism will walk into these other graphs and collect all prefix declarations from them. For example, when your graph owl:imports <http://datashapes.org/dash>, your SHACL-SPARQL queries can use commonly needed namespaces such as xsd: and rdfs: without declaring them yourself.
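The walk over owl:imports can be pictured with a small Python sketch. It models graphs as plain dictionaries instead of RDF, so it is only an approximation of what a SHACL engine does. The function name collect_prefixes and the data layout are made up for this illustration, and conflicting declarations of the same prefix are not handled.

```python
def collect_prefixes(graph_uri, declarations, imports, seen=None):
    """Collect prefix declarations from a graph and everything it imports."""
    seen = set() if seen is None else seen
    if graph_uri in seen:
        return {}  # already visited, protects against import cycles
    seen.add(graph_uri)
    # Start with the sh:declare entries attached to this graph's resource
    prefixes = dict(declarations.get(graph_uri, {}))
    # Then follow owl:imports transitively
    for imported in imports.get(graph_uri, []):
        found = collect_prefixes(imported, declarations, imports, seen)
        for prefix, namespace in found.items():
            prefixes.setdefault(prefix, namespace)
    return prefixes


# Simplified stand-ins for the sh:declare and owl:imports triples
DECLARATIONS = {
    "https://example.org/shess": {"shess": "https://example.org/shess#"},
    "http://datashapes.org/dash": {
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    },
}
IMPORTS = {"https://example.org/shess": ["http://datashapes.org/dash"]}

print(sorted(collect_prefixes("https://example.org/shess", DECLARATIONS, IMPORTS)))
# ['rdfs', 'shess', 'xsd']
```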