At Last, SHACL Made CR-Status!
Last week was a memorable week. Had I not had an unsolicited encounter with a scooter whilst running, I would not have been visited by broken bones and bruises, and would have written this story, full of hope and promise, a little earlier. (Ouch! Still mostly immobilized, thank you, getting better though…) What was really memorable was that on Tuesday, April 11, 2017, a certain W3C-standard in the making, called SHACL, officially progressed from Working Draft to Candidate Recommendation (CR). This is great news for the people working on it, since they have been at it for years now, and quite intensively so.
The W3C procedures are formal and strict. If CR-status is not secured within a certain predefined period, the process comes to a halt and the planned standard is not to be. In the person of Pano Maria, Taxonic takes part in the working group, and I can assure you that the atmosphere in it has been tense for a while, with everyone frantically focussing on the all-important goal of reaching CR-status in time, so that we are in a position to turn SHACL into a real W3C-standard later this year. Thanks to many people’s hard work, especially that of the two intrepid editors, Holger Knublauch and Dimitris Kontokostas, that intermediate goal has now been reached!
It is also great news for the rest of us, because SHACL is a significant step towards making Linked Data more viable and usable in practical situations. Many people believe that once it is an official standard, SHACL will be a game changer in data governance and Big Data. It will enable a new level of growth in the uptake of Linked Data.
Its name is an acronym of Shapes Constraint Language, and as that suggests, it is a language in which constraints on data sets (called graphs) can be expressed. For instance, the constraint that each subject in the database shall have one and only one Social Security Number. Constraints on data are highly important for consistency, quality control, input validation, security, privacy, and, in general, for making sense. That is why relational databases have a Data Definition Language, XML has XSD, and UML has the Object Constraint Language to express such constraints. The fact that such a language has been absent in Linked Data is a bit strange, and certainly a serious barrier to its uptake.
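To give a flavour of what this looks like, here is that very constraint as a minimal SHACL shape in Turtle. The `ex:` namespace and the property name `ex:ssn` are invented for illustration; only the `sh:` vocabulary is actual SHACL.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/ns#> .

# Every ex:Person in the graph must have exactly one Social Security Number.
ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:ssn ;           # the property under constraint
        sh:datatype xsd:string ;   # values must be string literals
        sh:minCount 1 ;            # at least one value...
        sh:maxCount 1 ;            # ...and at most one
    ] .
```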
Instead of defining a language for expressing constraints, Linked Data started off with a much more ambitious endeavour: a language to express formal statements in predicate logic which capture logical relations between classes and individuals. This language is OWL, an acronym of Web Ontology Language. (So, not WOL? It’s funny. Laugh.) It has a formal, mathematically rigorous semantics associated with it. This is good news, since all sorts of AI-applications thus come within reach. It is also bad news, because those of us lacking a solid background in mathematical logic are bound to make unwarranted assumptions and errors, and have a hard time using OWL.
As an example, let us say we want to know whether a certain party has or has not yet transferred money to our bank account. We use a banking app to look into the database of our bank. If we do not find a statement confirming the payment, we conclude the payment has not been made. With OWL, that conclusion is not possible. OWL is based on the Open World Assumption: if a proposition is not asserted, it can still be true. Maybe, somewhere out there, on the Web, waiting to be found, there is a statement that does confirm the payment! The Open World Assumption is not the outcome of an open-minded and liberal, albeit impractical world view or anything of the kind. It is a mathematical definition of a class of inference regimes, that’s all. It works just great for a certain class of AI-applications, but in many practical situations where we use OWL, reaching an intended result becomes convoluted or outright impossible.
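To make the contrast concrete, here is a sketch of the payment requirement as an OWL axiom in Turtle, with invented `ex:` names. Note what the Open World Assumption does, and does not, let us conclude.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/ns#> .

# "Every invoice has at least one payment confirmation."
ex:Invoice rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty ex:paymentConfirmation ;
    owl:minCardinality "1"^^xsd:nonNegativeInteger
] .

# ex:invoice42 has no stated ex:paymentConfirmation, yet this is NOT
# an inconsistency: under the Open World Assumption a reasoner simply
# concludes that some unknown confirmation exists somewhere.
ex:invoice42 a ex:Invoice .
```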
In those situations, instead of a mathematically defined inference regime, we need a language in which we can capture constraints simply and effectively, in a way that speaks to our intuitions. That language is SHACL. A brief introduction to how SHACL works can be found at the TopQuadrant website.
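By way of contrast with the OWL sketch above, here is the same requirement expressed as a SHACL constraint, again with invented `ex:` names. For validation purposes, SHACL effectively closes the world: if the data is not there, that is a violation.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/ns#> .

# "Every invoice must carry at least one payment confirmation."
ex:InvoiceShape
    a sh:NodeShape ;
    sh:targetClass ex:Invoice ;
    sh:property [
        sh:path ex:paymentConfirmation ;
        sh:minCount 1 ;    # an invoice without one fails validation
    ] .
```

Validating the lone ex:invoice42 from the OWL example against this shape yields a violation, which is exactly the answer our banking app needs.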
At the strategic level, the relevance of SHACL is that it will, at last, fill a glaringly empty slot in the Linked Data stack of standards and technologies. In doing so, it also redefines OWL’s position, a point that has recently been made in a great blog by Kurt Cagle as well. Instead of being the centrepiece of Linked Data, OWL will increasingly be seen as a tool that is useful for some tasks and not for others, and that you don’t have to use. SHACL, on the other hand, will function as an easy-to-use, general-purpose tool for defining constraints on data, to support data exchange, quality assurance, consistency, input validation and even certain types of inferencing.
To see this, it is necessary to briefly talk about the query language for Linked Data: SPARQL, a recursive acronym of SPARQL Protocol and RDF Query Language. (Another nerdy joke. Laugh.) Inference engines in the Linked Data world increasingly run on SPARQL rather than on native OWL. Based on the semantic underpinnings of OWL, one can make inferences about identity, class membership and property values, and automatically enrich a dataset with extra, “inferred” data. Vendors nowadays use so-called CONSTRUCT-statements in SPARQL to implement such inferences. The advent of SHACL will give a boost to solutions for inferencing, because it gives a fine level of control over which types of inferences are desired and which are not.
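As an illustration, here is roughly what such an inference looks like as a CONSTRUCT query. This is a sketch of the general technique (materializing class membership from rdfs:subClassOf statements), not any particular vendor’s implementation.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Materialize "inferred" class memberships: whatever belongs to a
# class also belongs to every superclass of that class.
CONSTRUCT { ?instance rdf:type ?superClass . }
WHERE {
  ?instance rdf:type        ?subClass .
  ?subClass rdfs:subClassOf ?superClass .
}
```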
Defenders of the old faith will no doubt have mixed feelings about this. It is a step away from mathematical rigour, and a step in the direction of accommodating normal needs that users of data have when confronted with our — sometimes messy — reality.
This shift has proven a difficult one to go through. It is reflected in how SHACL came to be: a long process, with many ups and downs and frustrations. Compounding matters is the fact that SHACL is difficult to design: there are so many quandaries, both at the fundamental level and in the detailing of how things work. It does many things at once: define how constraints are expressed, how they are interpreted by a SHACL-engine when deciding whether constraints are violated in a graph, how the SHACL-engine ought to report violations, how to deal with recursive constraints, and many, many more things. While dealing with such highly theoretical topics, the group also had to ensure that the end user experience SHACL offers is relatively simple and intuitive. That we now have CR-status is a feat that inspires high esteem for the work that has been done!
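As a small illustration of the reporting side: when the invoice shape sketched earlier is violated, a SHACL-engine produces a validation report that is itself RDF, roughly along these lines (the focus node and message text are invented for illustration):

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/ns#> .

[ a sh:ValidationReport ;
  sh:conforms false ;                  # the data graph does not conform
  sh:result [
      a sh:ValidationResult ;
      sh:resultSeverity sh:Violation ;
      sh:focusNode ex:invoice42 ;                # the offending node
      sh:resultPath ex:paymentConfirmation ;     # the constrained property
      sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
      sh:resultMessage "Invoice has no payment confirmation." ;
  ] ;
] .
```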
SHACL having made CR means that W3C looks favourably on the results so far and considers them a solid foundation for the next phase, which will include more reviews and, most of all, implementations. To reach CR-status, a number of implementations already have to be in place to prove that the proposed standard works in practice, and they are. I heard rumours to the effect that Tim Berners-Lee himself built a SHACL engine to get a feel for how good it really is. The rumour does not detail which language he used; I am still wondering. And when done, TBL was so satisfied that he personally made a point of awarding the long sought-after CR distinction. Of course, whether there is any truth at all in such rumours, I am in no position to judge. But it is a good story. And the award is a great step forward!