Practical aspects of the semantics of SHACL
Over the past few years, more and more of our customers have started using SHACL. Currently, I am involved in applying SHACL in the context of an ontology that will be used in many different usage scenarios. Therefore, we need to be careful in how we define our resources.
As a result, I decided to delve into some subtleties of SHACL that, I now realize, I had never fully grasped. For 99.999% of the work I do with SHACL, this makes no difference at all. Just as you don’t have to understand the details of RDF Semantics to work efficiently and effectively with RDF, it suffices to understand the results of applying SHACL. Still, I am glad that I now have a more detailed understanding.
In this blog post, I share some of these insights. They are not described in much detail in the spec, and work on the SHACL Primer was never completed. To get a firmer grasp, I had to ask around and dig through the SHACL Working Group’s issue tracker.
Implicit class targets
Some people are afraid that the use of implicit class targets can cause confusion. Consider:
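@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:Person
    a rdfs:Class, sh:NodeShape ;
    rdfs:label "Person"@en ;
    sh:property [
        sh:path ex:familyName ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
.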
This says that there is a node shape that targets instances of the class Person, and that this node shape requires these instances to have exactly one string value specified for the property family name. An alternative way to say this is to define a separate node shape, say, ex:PersonShape, and use the system predicate sh:targetClass to link the node shape to the class.
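For comparison, here is a sketch of that alternative, with a separate node shape and an explicit class target. The IRI ex:PersonShape comes from the text; the example.org namespace behind ex: is an assumption for illustration. Apart from the extra IRI, the constraints are the same as in the compact form.

```turtle
@prefix ex:   <http://example.org/> .   # assumed namespace, for illustration only
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# The class is declared on its own; it is no longer also a node shape.
ex:Person
    a rdfs:Class ;
    rdfs:label "Person"@en ;
.

# A separate node shape, linked to the class with an explicit class target.
ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:familyName ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
.
```

Both forms select the same focus nodes: every node in the data graph that has rdf:type ex:Person, or a type that is an rdfs:subClassOf ex:Person in the data graph.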
Some people fear that this could inadvertently trick you into believing that node shapes work semantically somewhat like restrictions in OWL. You could, erroneously, suppose that a node shape refers to a class that imposes restrictions on its members. This would be misguided and is not an interpretation sanctioned by the spec. It is an error of interpretation that could arise for any number of reasons, independently of implicit class targets.
The only conclusion that the example licenses is that, as said, there is a node shape that targets instances of the class Person, and that these instances are required to have exactly one string value for the property family name. As per the SHACL specification, the resources in the sh: namespace are system resources with reserved semantics explicitly defined for application by a SHACL processor. I will return to this semantics shortly.
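To make this concrete, here is a small data graph for the shape in the example. The individuals ex:alice, ex:bob, ex:carol and ex:dave are invented for illustration, and the ex: namespace is assumed. The comments record what a SHACL processor would be expected to report, given the constraints of the node shape ex:Person.

```turtle
@prefix ex:  <http://example.org/> .   # assumed namespace, for illustration only
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Conforms: exactly one xsd:string value for ex:familyName.
ex:alice a ex:Person ;
    ex:familyName "Jansen" .

# Violates sh:minCount 1: no value for ex:familyName at all.
ex:bob a ex:Person .

# Violates sh:maxCount 1 (two values), and sh:datatype for the integer value.
ex:carol a ex:Person ;
    ex:familyName "de Vries", 42 .

# Not a focus node: it has no rdf:type ex:Person, so the shape does not apply.
ex:dave ex:familyName 1, 2 .
```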
Once you get the wrong interpretation out of your system (if you ever believed something like this in the first place), there is no reason to be confused anymore. Now, some people argue that implicit class targets should still be avoided at all costs. I believe that this avoidance is just as misguided as the confusion between shapes and restrictions. More on this in a moment. Let us first make sure there are no other lingering problems with implicit class targets.
RDFS, OWL, and SHACL
What if you ran an RDFS reasoner, or even an OWL reasoner, over the example? Absolutely nothing would go wrong. The OWL reasoner will, for instance, trivially infer that ex:Person is a class, which is correct. OWL does not know about the special semantics of SHACL, and because it does not recognize sh:property as a system resource, it infers that ex:Person is simultaneously an individual. The reasoner would assume that these uses of ex:Person refer to different aspects of the same thing, based on the so-called punning mechanism that OWL 2 defines for this purpose (unlike RDF Semantics, OWL DL semantics assumes that classes and individuals are disjoint). In addition, it would infer that sh:NodeShape, like rdfs:Class, is a class, which, again, is correct. No harmful triples would be generated, nor would any contradictions be derived.
SHACL semantics also involves some form of punning — which we may call SHACL-punning as opposed to OWL-punning — because the IRI ex:Person is used to refer to a class, and, simultaneously, to a node shape. Classes and node shapes are very different things. Classes are sets of resources that share a common, repeatable characteristic. Node shapes are descriptions of admissible nodes. Only the SHACL processor, however, needs to be fully aware of this.
Could it perhaps be problematic to “mix open world and closed world semantics,” as some people seem to think? The open world assumption is just an inferencing directive applied by RDFS and OWL reasoners. Again, you can run a SHACL processor and an OWL reasoner on the same graph containing implicit class targets and nothing would go wrong. The specification of SHACL has been carefully designed so that, as Richard Cyganiak once put it, “…the RDFS-related mechanisms and the SHACL-related mechanisms don’t trip each other up.”
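A sketch of such a mixed graph may help. It assumes, for illustration, an OWL restriction that someone might place next to the shape on the same class; ex:bob is a hypothetical individual and the ex: namespace is assumed. The comments describe the expected behaviour of each kind of tool, not the output of any particular implementation.

```turtle
@prefix ex:   <http://example.org/> .   # assumed namespace, for illustration only
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

ex:Person
    a owl:Class, sh:NodeShape ;
    # Read by an OWL reasoner: every person has exactly one string-valued family name.
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty ex:familyName ;
        owl:qualifiedCardinality "1"^^xsd:nonNegativeInteger ;
        owl:onDataRange xsd:string ;
    ] ;
    # Read by a SHACL processor: a constraint on the nodes and edges of the graph.
    sh:property [
        sh:path ex:familyName ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
.

ex:familyName a owl:DatatypeProperty .

ex:bob a ex:Person .
# OWL reasoner (open world): no inconsistency; ex:bob is taken to have exactly
#   one family name that the graph simply does not state.
# SHACL processor: a sh:minCount violation for ex:bob, because the graph
#   contains no ex:familyName triple with ex:bob as its subject.
```

Neither tool interferes with the other: the reasoner ignores the SHACL triples' special meaning, and the SHACL processor does not perform OWL inference unless it is explicitly set up to do so.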
Now, let us consider why we would want to express the example in the given form. As noted, you could also use the alternative with a separate node shape and an explicit class target. The answer is that the more compact form involves fewer resources, so the graph structure remains simple. In an ontology with many classes, adding a separate node shape URI for every class complicates the graph.
This is illustrated in the following pictures.
Personally, I use explicit and implicit class targets in different situations. As with blank nodes, the compactness of implicit class targets comes with pros and cons. Avoiding a form of expression because some people may, perhaps, not fully understand the spec should, ideally, not be a decisive argument either way. Formally, the spec is crystal clear. Arguing against using implicit class targets is misguided and runs counter to the recommendation.
Annotations on property shapes
Another subtlety of SHACL that is useful to know about concerns annotations on property shapes. A real-world example will clarify this. DCAT is a language for talking about catalogues and the items catalogued in them, primarily (but not exclusively) datasets and data services. Its popularity is boosted by the firm position it holds in the FAIR Data community. ADMS, an application profile of DCAT, adds some extra vocabulary to talk more specifically about catalogues of ontologies, thesauri and authority tables (such as lists of languages or countries).
ADMS defines a useful class called adms:Identifier. Instances of this class typically have a value for skos:notation (the identifying literal, such as an alphanumeric string, an integer, an IRI, a code) and a value for dcterms:creator (the party that created and assigned the identifier).
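A sketch of what such an identifier might look like in Turtle follows. The dataset, the identifier value and the agent IRI are invented for illustration, and the ex: namespace is an assumption; adms:, skos: and dcterms: are the standard namespaces of ADMS, SKOS and DCMI Terms.

```turtle
@prefix adms:    <http://www.w3.org/ns/adms#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:      <http://example.org/> .   # assumed namespace, for illustration only

# A hypothetical dataset that carries an identifier.
ex:someDataset adms:identifier ex:identifier-0001 .

# The identifier: the identifying literal plus the party that assigned it.
ex:identifier-0001
    a adms:Identifier ;
    skos:notation "NL-0001" ;                        # hypothetical identifying literal
    dcterms:creator ex:ExampleRegistrationOffice ;   # hypothetical assigning agent
.
```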
Now suppose we need to further constrain the resulting language. Among other things, we require that each instance of adms:Identifier has exactly one value for dcterms:creator, and that this value is taken from a specific authority table. Also, we want the label used in forms to designate the “creator” to be ‘Assigned by’ rather than ‘Creator,’ the label of the property. This original label could be confusing in this context. We could achieve this as follows, where we assign a URI to the property shape, rather than using a blank node as in the previous example.
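@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

ex:Identifier-creator
    a sh:PropertyShape ;
    rdfs:label "Identifier – creator shape"@en ;
    sh:name "Assigned by" ;
    sh:description "The party that created and assigned the identifier" ;
    sh:path dcterms:creator ;
    sh:class ex:ExampleClassOfAgents ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
.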
The interesting thing here is the use of sh:name and sh:description. You might have expected to see rdfs:label and skos:definition instead. That was initially the Working Group’s position. However, after an issue was raised, the Working Group concluded that separate properties are needed. Conceptually, the label “Assigned by” is the label of the property dcterms:creator in the context where the shape applies, not a label of the property shape itself.
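As written, the property shape is not yet attached to any node shape. One way to attach it is sketched below, assuming you define your own node shape with an explicit class target rather than adding triples about adms:Identifier itself; ex:IdentifierShape is a hypothetical name. Because the property shape has an IRI, other node shapes can reference it in the same way.

```turtle
@prefix adms: <http://www.w3.org/ns/adms#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix ex:   <http://example.org/> .   # assumed namespace, for illustration only

# Hypothetical node shape: applies ex:Identifier-creator to every adms:Identifier,
# so forms for identifiers would show dcterms:creator as "Assigned by".
ex:IdentifierShape
    a sh:NodeShape ;
    sh:targetClass adms:Identifier ;
    sh:property ex:Identifier-creator ;   # the IRI-named property shape
.
```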
If you do not specify sh:name, a form generator will display the rdfs:label of the property (if present). However, it does so only if the property shape’s path is a simple path consisting of a single property. For a path involving more than one property, the form generator will display the value of sh:name; if there is none, its behaviour is not specified.
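For instance, a property shape with a sequence path has no single property whose rdfs:label could serve as a fallback, so sh:name is the label to rely on. The shape IRI and the use of foaf:name below are assumptions for illustration.

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix sh:      <http://www.w3.org/ns/shacl#> .
@prefix ex:      <http://example.org/> .   # assumed namespace, for illustration only

# Hypothetical shape: the name of the party that assigned the identifier.
# The path is a sequence of two properties: dcterms:creator, then foaf:name.
ex:Identifier-creatorName
    a sh:PropertyShape ;
    sh:path ( dcterms:creator foaf:name ) ;
    sh:name "Assigned by (name)" ;
    sh:minCount 1 ;
.
```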
As a side note, some people would prefer to use the namespace adms: in the URI of the shape, instead of ex:. That would be perfectly fine. If you want your URIs to be dereferenceable, however, then you must use your own namespace. You could also encode the original namespace prefixes in the URI if you need to avoid namespace clashes, as in ex:adms_Identifier-dcterms_creator. As long as you remember that URIs have no formal meaning beyond their denotation, all is well.
SHACL semantics
An important thing to keep in mind is that (almost) all SHACL resources are defined in terms of graph nodes and edges, not in terms of their referents (or, as RDF Semantics puts it, their interpretations). For instance, a shape is defined as “an IRI or a blank node s that fulfils at least one of the following conditions in the shapes graph […]”. In looking exclusively at the graph, disregarding any referential semantics, SHACL is deliberately very much like SPARQL. “Normal” RDF semantics does not provide a way to talk directly about the nodes in a graph. It allows you to talk about the class of persons, but not about the node ex:Person.
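The kinship with SPARQL is easy to see when a constraint is written with the SHACL-SPARQL extension. The sketch below restates a minimum-count check on ex:familyName as a query over graph patterns; it mentions only nodes and edges, never what ex:Person denotes. The shape IRI and the example.org namespace are assumptions, and not every SHACL processor supports SHACL-SPARQL.

```turtle
@prefix ex: <http://example.org/> .   # assumed namespace, for illustration only
@prefix sh: <http://www.w3.org/ns/shacl#> .

# Hypothetical shape: flags each ex:Person node without an ex:familyName edge.
ex:PersonFamilyNameSparqlShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:sparql [
        a sh:SPARQLConstraint ;
        sh:message "Missing family name" ;
        # $this is pre-bound to each focus node; each solution yields one result.
        sh:select """
            SELECT $this
            WHERE {
                FILTER NOT EXISTS { $this <http://example.org/familyName> ?name }
            }
        """ ;
    ] ;
.
```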
This is the key to understanding how SHACL semantics works. Consider the following two statements about Paris: (1) Paris is the capital of France; (2) Paris is a noun and starts with a capital ‘P’. In statement (1), we take ‘Paris’ to refer to the city; this is called the de re reading (from re ‘thing’). In statement (2), we take ‘Paris’ to refer to the word itself; I will call this the de se reading (from se ‘itself’), although it is more widely known as mentioning, as opposed to using, a word. SHACL semantics reads graphs (including the shapes graph itself) in this de se sense.
Notwithstanding the fact that SHACL resources are formally defined as graph nodes, SHACL still needs an RDF Semantics interpretation. Therefore, a shape like ex:Identifier-creator is not only an IRI; it must also denote something. The spec remains vague about these denotations. This is by design. Do not make wrong assumptions about what shapes mean. When working with SHACL, it is important to keep your eye on the ball, which is the graph, not its interpretation.
The case for implicit class targets
As noted, using implicit class targets avoids cluttering the graph with redundant URIs. But there is more.
Because SHACL talks about graph nodes instead of their interpretations, one could say that shapes define syntactic constraints, just as you can say that RDFS and OWL expressions define semantic constraints. My previous blog post on SHACL also touched on this theme.
Suppose you buy a dictionary in two parts. Part one contains only semantic definitions, part two only syntactic definitions. So, in part one you would find that ‘melt’ (as in “the sun melted the ice”) means “to cause to become fluid,” and in part two, that the same word is a transitive verb with regular inflections for past and perfect tense.
There are absolutely cases where it is a good idea to separate out the descriptions of the two aspects and distribute them separately. However, in many other cases, it is a better idea to keep things simple, at least initially. You could create three graphs, one with RDFS declarations, one with OWL expressions and one with SHACL shapes, but unless there are clear and quantifiable benefits to be achieved, this may be a waste of time and money — not only in producing and maintaining, but also in using the graphs.
Normally, I start out with a simple ontology containing RDFS class and property descriptions with implicit class targets and property shapes with IRIs (not blank nodes) — just enough of everything. This keeps things maximally simple while enabling users to simply enrich or override property shapes and add their own node shapes, using any mechanism they deem useful, including implicit class targets. And, of course, they can mix and match all of this with their own ontologies and data sets.
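A minimal sketch of such a starting point follows. The property label, the shape IRI ex:Person-familyName and the user-side names are hypothetical, and both namespaces are assumptions. The first part combines RDFS declarations, an implicit class target and an IRI-named property shape. The second part, which would normally live in the user's own file, adds a node shape of its own; since a node must satisfy every shape that targets it, this tightens the published constraints rather than replacing them.

```turtle
@prefix ex:   <http://example.org/> .        # assumed ontology namespace
@prefix user: <http://example.com/user/> .   # assumed namespace of a downstream user
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# --- Published ontology: RDFS descriptions and shapes together ---
ex:Person
    a rdfs:Class, sh:NodeShape ;          # implicit class target
    rdfs:label "Person"@en ;
    sh:property ex:Person-familyName ;
.

ex:familyName
    a rdf:Property ;
    rdfs:label "family name"@en ;
.

# Property shape with an IRI (not a blank node), so others can refer to it.
ex:Person-familyName
    a sh:PropertyShape ;
    sh:path ex:familyName ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
.

# --- A user's own addition, normally kept in a separate file ---
user:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:familyName ;
        sh:maxLength 100 ;               # hypothetical extra requirement
    ] ;
.
```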
Finally, some more words on the open world assumption. It is sometimes overlooked that it does not meaningfully apply in the context of SHACL. SHACL does not “close the world”: it just applies syntactic constraints to graphs. Open and closed world assumptions are relevant to semantics only. The same dichotomy occurs in natural languages. A verb like ‘reading’ as in “Alice is reading” can be said to be semantically transitive: you can infer that there must be something that Alice is reading. In contrast, the verb ‘melted’ is syntactically transitive. As a result, the sentence “The sun melted,” in the same intended sense as above (‘to cause to become fluid’), is ungrammatical. Therefore, in this case, you cannot infer that there must be something that the sun melted, be it the ice, the snow or the butter. You cannot infer anything at all from an ungrammatical sentence. Both syntactic and semantic constraints are admissible and useful in ontologies, in models of the world and in models of a language.
It remains remarkable that people can get worked up about perceived controversies concerning syntax and semantics. In American linguistics, fierce debates have been waged about this. These Linguistic Wars are long over. If anything useful came out of that debate at all, it is the realisation that syntax and semantics work together in a language. They do not compete: they cooperate. In the practical application of RDF, the advent of SHACL is a godsend.
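What a SHACL processor produces is a report about the graph, not new facts about the world. Below is an abridged sketch of a validation report for a hypothetical ex:bob, typed ex:Person but without any ex:familyName triple. The shape IRI ex:Person-familyName is hypothetical, the ex: namespace is assumed, and real processors may add further details such as messages.

```turtle
@prefix ex: <http://example.org/> .   # assumed namespace, for illustration only
@prefix sh: <http://www.w3.org/ns/shacl#> .

# The report states that the data graph does not conform, and why.
# Nothing is added to the data graph: no value for ex:familyName is inferred.
[]  a sh:ValidationReport ;
    sh:conforms false ;
    sh:result [
        a sh:ValidationResult ;
        sh:focusNode ex:bob ;
        sh:resultPath ex:familyName ;
        sh:resultSeverity sh:Violation ;
        sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
        sh:sourceShape ex:Person-familyName ;   # hypothetical IRI-named property shape
    ] ;
.
```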
Acknowledgements
Thanks to David Price, Irene Polikoff and Holger Knublauch for discussions on some of these topics. Any remaining errors are my own.