Inferencing with SHACL using sh:values
SHACL is best known as a language for representing constraints on the shape of RDF graphs. But the W3C Working Group also produced a companion document called SHACL Advanced Features (SHACL-AF), which is now maintained by the SHACL Community Group. This part of the language includes features to represent inference rules that can be used to derive new statements from existing (asserted) statements.
In my experience these inferencing features are extremely useful for real-world application scenarios. We and our customers have had SHACL inference rules in production for many years and their design is stable. Let me explain how they work and why they are important.
Example: Counting Taxonomy Concepts
As a toy example, let's look at a SKOS Concept Scheme. The scheme instance is linked to 11 top concepts via its property skos:hasTopConcept. An inferred property computes this count automatically and on the fly so that it can be displayed to the user. Note that no "top concept count" triple is stored in the graph.
The example Turtle code below shows how this works. It declares a SHACL property shape for the class skos:ConceptScheme which states that the values of the property ex:topConceptCount shall be computed by counting the values of the property skos:hasTopConcept:
skos:ConceptScheme
    a owl:Class, sh:NodeShape ;
    sh:property skos:ConceptScheme-topConceptCount ;
    ...

skos:ConceptScheme-topConceptCount
    a sh:PropertyShape ;
    sh:path ex:topConceptCount ;
    sh:datatype xsd:integer ;
    sh:description "The number of top concepts in this scheme." ;
    sh:maxCount 1 ;
    sh:name "top concept count" ;
    sh:values [
        sh:count [
            sh:path skos:hasTopConcept ;
        ] ;
    ] .
The property sh:values is part of the SHACL Advanced Features and instructs a capable SHACL processor on how the values of this property shall be computed. In this case a little structure of RDF blank nodes with properties such as sh:count and sh:path is used, but later we will show more complex examples.
If you struggle with the syntax, TopBraid includes a little wizard for common design patterns.
General Syntax of sh:values Rules
The sh:values property links a Property Shape with a so-called Node Expression. SHACL defines a number of different Node Expression types. Most of them take a sequence of RDF nodes as input and produce another sequence of RDF nodes as output. This means that node expressions can be chained together. For example, the sh:count expression from the example above takes the sh:path expression as input and produces a sequence consisting of just a single integer node as output.
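To make the chaining idea concrete, here is a minimal Python sketch that models node expressions as plain functions from a focus node to a list of output nodes. It is a toy model and not how any SHACL processor is implemented: the function names `path` and `count`, the tuple-based graph and the sample data are all assumptions made for illustration.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]
# In this toy model a node expression maps (graph, focus node) to output nodes.
NodeExpr = Callable[[Graph, str], List[object]]


def path(predicate: str) -> NodeExpr:
    """Toy counterpart of [ sh:path p ]: the objects of (focus, p, ?o)."""
    def evaluate(graph: Graph, focus: str) -> List[object]:
        return sorted(o for (s, p, o) in graph if s == focus and p == predicate)
    return evaluate


def count(inner: NodeExpr) -> NodeExpr:
    """Toy counterpart of [ sh:count expr ]: a list with one integer."""
    def evaluate(graph: Graph, focus: str) -> List[object]:
        return [len(inner(graph, focus))]
    return evaluate


# Hypothetical sample data: one scheme with three top concepts.
graph: Graph = {
    ("ex:Scheme", "skos:hasTopConcept", "ex:Animals"),
    ("ex:Scheme", "skos:hasTopConcept", "ex:Plants"),
    ("ex:Scheme", "skos:hasTopConcept", "ex:Fungi"),
}

# Chaining: the output of the path expression is the input of count.
top_concept_count = count(path("skos:hasTopConcept"))
print(top_concept_count(graph, "ex:Scheme"))  # [3]
```

The point of the sketch is only the shape of the composition: `count` never looks at the graph itself, it just consumes whatever its inner expression produces.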
Check the SHACL-AF spec for a list of other Node Expression types. The most basic expression type is Constants, represented simply as URIs or RDF literals. On the more complex end of the spectrum, Node Expressions may perform arbitrary SPARQL SELECT queries to infer new values. And platforms like TopBraid even include an option to use JavaScript as an inference language.
In contrast to OWL inferencing, which was intentionally limited to a formally tractable subset of logic, SHACL rules have almost unlimited expressiveness. What is the point of a nice theoretical foundation in description logic if you cannot even concatenate strings or perform basic maths with OWL?
Example: If-Then-Else Rules
In the example below, the sh:if node expression points at an sh:exists expression that tests whether the current focus node is the capital of some country. If true, the rule produces "blue", otherwise it produces "red".
g:City-fillColor
    a sh:PropertyShape ;
    sh:path tbgeo:fillColor ;
    sh:datatype xsd:string ;
    sh:name "fill color" ;
    sh:values [
        sh:if [
            sh:exists [
                sh:path [
                    sh:inversePath g:capital ;
                ] ;
            ] ;
        ] ;
        sh:then "blue" ;
        sh:else "red" ;
    ] .
Such complex node expressions can be visualized as diagrams, illustrating that data "flows" from left to right, with each Node Expression taking one or more input values and producing zero or more output values.
Node Expressions can be linked together like Lego bricks. For example, the input to the sh:then above may be yet another complex node expression instead of the constant "blue".
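The data flow of the if-then-else rule can be imitated with a few composable Python functions. This is only a sketch under simplifying assumptions: the helper names (`constant`, `inverse_path`, `exists`, `if_then_else`), the sample triples and the `capital_of` rule are hypothetical and are not part of SHACL or TopBraid.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]
NodeExpr = Callable[[Graph, str], List[object]]


def constant(value: object) -> NodeExpr:
    """A constant expression ignores the focus node."""
    return lambda graph, focus: [value]


def inverse_path(predicate: str) -> NodeExpr:
    """Toy counterpart of an inverse path: subjects of (?s, predicate, focus)."""
    return lambda graph, focus: sorted(
        s for (s, p, o) in graph if p == predicate and o == focus
    )


def exists(inner: NodeExpr) -> NodeExpr:
    """True if the inner expression produces at least one node."""
    return lambda graph, focus: [len(inner(graph, focus)) > 0]


def if_then_else(cond: NodeExpr, then: NodeExpr, otherwise: NodeExpr) -> NodeExpr:
    """Evaluate only the branch selected by the condition."""
    def evaluate(graph: Graph, focus: str) -> List[object]:
        branch = then if cond(graph, focus) == [True] else otherwise
        return branch(graph, focus)
    return evaluate


# Hypothetical sample data.
graph: Graph = {
    ("g:France", "g:capital", "g:Paris"),
    ("g:Paris", "rdf:type", "g:City"),
    ("g:Lyon", "rdf:type", "g:City"),
}

is_capital = exists(inverse_path("g:capital"))
fill_color = if_then_else(is_capital, constant("blue"), constant("red"))

# The "then" branch may itself be a non-constant expression.
capital_of = if_then_else(is_capital, inverse_path("g:capital"), constant("no country"))

print(fill_color(graph, "g:Paris"))  # ['blue']
print(fill_color(graph, "g:Lyon"))   # ['red']
print(capital_of(graph, "g:Paris"))  # ['g:France']
```

Because every brick has the same signature, swapping the constant in the "then" position for a path expression requires no change to `if_then_else` itself.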
Querying Inferred Values
Implementations have some flexibility on what to do with sh:values rules. In our product, TopBraid, sh:values rules are applied as part of all GraphQL queries.
The example GraphQL query above returns JSON with all instances of the class City, and for each City it returns a label and the fillColor, which is computed at query time using the if-then-else rule further above. The GraphQL user does not even need to know that fillColor is a completely virtual field that has no RDF triples asserted in the graph.
This works nicely and efficiently from GraphQL because the GraphQL engine knows in advance that the fillColor field is backed by an inference rule. It knows this because the surrounding query context is the City class, and the SHACL for City includes the sh:values rule.
Likewise, our Active Data Shapes (ADS) JavaScript framework computes inferred values whenever they are needed. Again, this is possible because the JS engine has enough context from the surrounding object to determine in advance if a property is inferred or not. So there is little performance overhead.
Processing sh:values rules is more difficult for a SPARQL engine, where typically no such context exists and all you have are the triple patterns of a basic graph pattern (BGP). Querying inferred values in the "inverse" direction, from a value back to its subjects, is also hard without materializing the triples first, and materialization is difficult if the data changes often. In TopBraid we have therefore elected not to compute the inferences at SPARQL query time, but have added a special "magic" property function to request them explicitly. This may, however, change in the future, and other SPARQL engines may compute such values on the fly too.
Note that sh:path expressions inside of SHACL Node Expressions are designed to apply nested inferences on demand. So when a sh:values rule depends on a sh:path expression that is itself backed by another sh:values rule, the dependent rule is computed when its values are needed. This is similar to backward chaining.
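The on-demand behaviour can be sketched in Python as a lookup that first checks whether a property is backed by a rule and only then falls back to asserted triples. This is a simplified model, not the TopBraid implementation: the `Engine` class, the property `ex:isLarge` and the sample data are hypothetical, the rule simply takes precedence over asserted values, and cyclic rule dependencies are not handled.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]
Rule = Callable[["Engine", str], List[object]]


class Engine:
    """Toy evaluator that resolves inferred properties lazily."""

    def __init__(self, graph: Graph, rules: Dict[str, Rule]) -> None:
        self.graph = graph
        self.rules = rules  # property -> rule that computes its values
        self.trace: List[str] = []  # records which rules were fired

    def values(self, focus: str, predicate: str) -> List[object]:
        rule = self.rules.get(predicate)
        if rule is not None:
            # Inferred property: compute now, nothing is materialized.
            self.trace.append(predicate)
            return rule(self, focus)
        # Asserted property: plain triple lookup.
        return sorted(
            o for (s, p, o) in self.graph if s == focus and p == predicate
        )


def top_concept_count(engine: Engine, focus: str) -> List[object]:
    return [len(engine.values(focus, "skos:hasTopConcept"))]


def is_large(engine: Engine, focus: str) -> List[object]:
    # Depends on another inferred property, which is resolved on demand.
    return [n >= 2 for n in engine.values(focus, "ex:topConceptCount")]


graph: Graph = {
    ("ex:Scheme", "skos:hasTopConcept", "ex:Animals"),
    ("ex:Scheme", "skos:hasTopConcept", "ex:Plants"),
    ("ex:Tiny", "skos:hasTopConcept", "ex:Only"),
}
engine = Engine(graph, {
    "ex:topConceptCount": top_concept_count,
    "ex:isLarge": is_large,
})

print(engine.values("ex:Scheme", "ex:isLarge"))  # [True]
print(engine.trace)  # ['ex:isLarge', 'ex:topConceptCount']
```

The trace shows the backward-chaining flavour: asking for `ex:isLarge` is what triggers the computation of `ex:topConceptCount`, and nothing is computed for properties nobody asks about.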
Example: Complex Inferences and SPARQL
In this example, the SKOS taxonomy was enriched with an inferred property that computes the total number of narrower concepts (direct and indirect children) of a given Concept:
The sh:values rule for this property computes the sh:count of a complex SHACL sh:path expression that walks into the concept hierarchy using sh:oneOrMorePath of the inverse of skos:broader:
skos:Concept-totalNarrowerCount
    a sh:PropertyShape ;
    sh:path g:totalNarrowerCount ;
    sh:datatype xsd:integer ;
    sh:name "total narrower count" ;
    sh:values [
        sh:count [
            sh:path [
                sh:oneOrMorePath [
                    sh:inversePath skos:broader ;
                ] ;
            ] ;
        ] ;
    ] .
Here is the same example using a SPARQL SELECT Expression:
    sh:values [
        sh:prefixes <https://topbraid.org/skos.shapes> ;
        sh:select """
            SELECT (COUNT(?narrower) AS ?count)
            WHERE {
                ?narrower skos:broader+ $this .
            } """ ;
    ] .
SPARQL can be regarded as the ultimate fallback that gives a lot of expressiveness for things that cannot be covered using other Node Expressions. Almost all SHACL Node Expression types have a direct translation into SPARQL, but there is not (yet) a concept of variables that could be used for joins.
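For readers less familiar with property paths, the following Python sketch spells out what both variants of the total narrower count compute: the number of distinct concepts reachable by following skos:broader backwards one or more times. The function name and the sample hierarchy are assumptions for illustration only.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def narrower_transitive(graph: Set[Triple], concept: str) -> Set[str]:
    """Distinct nodes ?n with a path ?n skos:broader+ concept."""
    found: Set[str] = set()
    frontier: List[str] = [concept]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in graph:
            # Walk skos:broader in the inverse direction.
            if p == "skos:broader" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


# Hypothetical sample hierarchy.
graph: Set[Triple] = {
    ("ex:Mammals", "skos:broader", "ex:Animals"),
    ("ex:Birds", "skos:broader", "ex:Animals"),
    ("ex:Dogs", "skos:broader", "ex:Mammals"),
    ("ex:Cats", "skos:broader", "ex:Mammals"),
}

print(len(narrower_transitive(graph, "ex:Animals")))  # 4
print(len(narrower_transitive(graph, "ex:Mammals")))  # 2
print(len(narrower_transitive(graph, "ex:Dogs")))     # 0
```

The `found` set doubles as a visited set, so the traversal terminates even if the hierarchy accidentally contains a cycle.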
Example: Filtering by Shapes
Here we define a property that only contains the preferred label(s) that have a German language tag:
skos:Concept-germanLabel
    a sh:PropertyShape ;
    sh:path ex:germanLabel ;
    sh:name "German label" ;
    sh:values [
        sh:nodes [
            sh:path skos:prefLabel ;
        ] ;
        sh:filterShape [
            sh:languageIn ( "de" ) ;
        ] ;
    ] .
The sh:values rule above uses sh:filterShape, which takes the values of the path skos:prefLabel as its input and keeps only those that conform to the given SHACL shape. Here, each preferred label is checked for a language tag that is permitted by sh:languageIn. You may also use any other shape here, implementing complex filter conditions with SHACL features like sh:minCount, sh:node and sh:hasValue.
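The filtering step can be pictured as follows in Python. The sketch assumes the basic language-range matching that SPARQL's langMatches uses, so the range "de" accepts "de" as well as regional subtags such as "de-CH". The `LangLiteral` type, the function names and the labels are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class LangLiteral(NamedTuple):
    text: str
    lang: str  # language tag, or "" if the literal has none


def lang_matches(tag: str, lang_range: str) -> bool:
    """Basic range matching: exact tag or tag with a subtag suffix."""
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")


def filter_language_in(
    values: Sequence[LangLiteral], allowed: Sequence[str]
) -> List[LangLiteral]:
    """Keep only the input values that conform to the language filter."""
    return [
        v for v in values
        if any(lang_matches(v.lang, r) for r in allowed)
    ]


# Hypothetical values of skos:prefLabel for one concept.
pref_labels = [
    LangLiteral("Hund", "de"),
    LangLiteral("dog", "en"),
    LangLiteral("Hund", "de-CH"),
    LangLiteral("canis", ""),
]

german = filter_language_in(pref_labels, ["de"])
print([f"{v.text}@{v.lang}" for v in german])  # ['Hund@de', 'Hund@de-CH']
```

As with the other node expressions, the filter neither adds nor reorders nodes. It only narrows the sequence it receives from the expression in sh:nodes.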
Example: Rules using ADS JavaScript
Within TopBraid, rules may be backed by arbitrary ADS JavaScript snippets. Here is an example from the Software Knowledge Graph from my previous article. This property rule performs a regular expression search over documentation markup to extract links to components that are mentioned in the markup:
We have several other examples where we use JavaScript to infer rdf:HTML literals to produce custom renderings of values on forms. And yes, such JavaScript rules can also make web service calls, for example to query an LLM when needed...
While these capabilities go way beyond the W3C specification at this stage, they illustrate our commitment to delivering practical solutions to real-world problems that are typically found in enterprise settings.
Where to Go From Here
If you want to play with sh:values rules, you can use the open source TopBraid SHACL API. Enterprise users can find comprehensive inference support in the TopBraid EDG product line. I cannot say which other vendors are supporting them at this time.
SHACL has been a well-established W3C standard since 2017, so it is quite possible that official efforts towards a next generation of SHACL will be relaunched. The next version of SHACL may be the result of another full-blown formal W3C Working Group process, but SHACL could also become a "Living Standard" where features are added incrementally once enough implementations exist. I would very much hope that the SHACL specifications get restructured and widened in scope to include SHACL Inferencing as a dedicated document. SHACL (Core) already defines the framework for defining shapes, targets and property definitions, so adding just one more property for inferencing is a natural and sensible extension. How do other vendors and users feel about this?
Knowledge Graph Engineer / Solutions Engineer at TopQuadrant
(1 year ago) Nice article Holger. I always liked using the sh:values>sh:select pattern because I could query the full SPARQL query out of the graph anytime I wanted.
Web de Données · Knowledge Graphs · Ontologies · sparna.fr
(1 year ago) Thanks Holger. Let's keep SHACL for what it is: shapes to specify the structure of a graph. While making inferences like this is undoubtedly useful, I think mixing it into the same vocabulary/standard as shapes is confusing for users.
Boosting quality of data in healthcare and life sciences.
(1 year ago) Always glad to see solutions like this one, driven by real-world application scenarios instead of theoretical purism. Thanks for sharing it with such a level of detail!
(1 year ago) Hi Holger, nice piece! About your last paragraph: what do you mean by "adding one more property"? Would it be a matter of putting the logic of inferencing in some special property shapes, and then being able to run the same command to do both validation and inferencing? Curious how the problem of sequence would be solved in that case (do you infer first and then validate your data, or vice versa?)
free your time to create.
(1 year ago) Thanks for the summary. We also make use of this nice feature.