At Last, SHACL Made CR-Status!
Last week was a memorable week. Had I not had an unsolicited encounter with a scooter whilst running, I would not have been visited by broken bones and bruises, and would have written this story, full of hope and promise, a little earlier. (Ouch! Still mostly immobilized, thank you, getting better though…) What was really memorable was that on Tuesday, April 11, 2017, a certain W3C-standard in the making, called SHACL, officially progressed from Working Draft to Candidate Recommendation (CR). This is great news for the people working on it, since they have been at it for years now, and quite intensively so.
The W3C procedures are formal and strict. If CR-status is not secured within a certain predefined period, the process comes to a halt and the planned standard is not to be. In the person of Pano Maria, Taxonic takes part in the working group, and I can assure you that the atmosphere in it has been tense for a while, with everyone frantically focussing on the all-important goal of reaching CR-status in time, so that we are in a position to turn SHACL into a real W3C-standard later this year. Thanks to many people’s hard work, especially that of the two intrepid editors, Holger Knublauch and Dimitris Kontokostas, that intermediate goal has now been reached!
It is also great news for the rest of us, because SHACL is a significant step towards making Linked Data more viable and usable in practical situations. Many people believe that once it is an official standard, SHACL will be a game changer in data governance and Big Data. It will enable a new level of growth in the uptake of Linked Data.
Its name is an acronym of Shapes Constraint Language, and as that suggests, it is a language in which constraints on data sets (called graphs) can be expressed. For instance, the constraint that each subject in the database shall have one and only one Social Security Number. Constraints on data are highly important for consistency, quality control, input validation, security, privacy, and, in general, for making sense. That is why relational databases have a Data Definition Language, XML has XSD, and UML has the Object Constraint Language to express such constraints. The fact that such a language has been absent in Linked Data is a bit strange, and certainly a serious barrier to its uptake.
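To give a flavour of what this looks like, here is that very constraint as a minimal SHACL shape in Turtle. The `ex:` namespace and the property name `ex:ssn` are invented for illustration; only the `sh:` vocabulary is actual SHACL.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/ns#> .

# Every ex:Person in the graph must have exactly one Social Security Number.
ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:ssn ;           # the property under constraint
        sh:datatype xsd:string ;   # values must be string literals
        sh:minCount 1 ;            # at least one value...
        sh:maxCount 1 ;            # ...and at most one
    ] .
```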
Instead of defining a language for expressing constraints, Linked Data started off with a much more ambitious endeavour: a language to express formal statements in predicate logic which capture logical relations between classes and individuals. This language is OWL, an acronym of Web Ontology Language. (So, not WOL? It’s funny. Laugh.) It has a formal, mathematically rigorous semantics associated with it. This is good news, since all sorts of AI-applications thus come within reach. It is also bad news, because those of us lacking a solid background in mathematical logic are bound to make unwarranted assumptions and errors, and have a hard time using OWL.
As an example, let us say we want to know whether a certain party has or has not yet transferred money to our bank account. We use a banking app to look into the database of our bank. If we do not find a statement confirming the payment, we conclude the payment has not been made. With OWL, that conclusion is not possible. OWL is based on the Open World Assumption: if a proposition is not asserted, it can still be true. Maybe, somewhere out there, on the Web, waiting to be found, there is a statement that does confirm the payment! The Open World Assumption is not the outcome of an open-minded and liberal, albeit impractical world view or anything of the kind. It is a mathematical definition of a class of inference regimes, that’s all. It works just great for a certain class of AI-applications, but in many practical situations where we use OWL, reaching an intended result becomes convoluted or outright impossible.
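To make the contrast concrete, here is a sketch of the payment requirement as an OWL axiom in Turtle, with invented `ex:` names. Note what the Open World Assumption does, and does not, let us conclude.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/ns#> .

# "Every invoice has at least one payment confirmation."
ex:Invoice rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty ex:paymentConfirmation ;
    owl:minCardinality "1"^^xsd:nonNegativeInteger
] .

# ex:invoice42 has no stated ex:paymentConfirmation, yet this is NOT
# an inconsistency: under the Open World Assumption a reasoner simply
# concludes that some unknown confirmation exists somewhere.
ex:invoice42 a ex:Invoice .
```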
In those situations, instead of a mathematically defined inference regime, we need a language in which we can capture constraints simply and effectively, in a way that speaks to our intuitions. That language is SHACL. A brief introduction to how SHACL works can be found at the TopQuadrant website.
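By way of contrast with the OWL sketch above, here is the same requirement expressed as a SHACL constraint, again with invented `ex:` names. For validation purposes, SHACL effectively closes the world: if the data is not there, that is a violation.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/ns#> .

# "Every invoice must carry at least one payment confirmation."
ex:InvoiceShape
    a sh:NodeShape ;
    sh:targetClass ex:Invoice ;
    sh:property [
        sh:path ex:paymentConfirmation ;
        sh:minCount 1 ;    # an invoice without one fails validation
    ] .
```

Validating the lone ex:invoice42 from the OWL example against this shape yields a violation, which is exactly the answer our banking app needs.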
At the strategic level, the relevance of SHACL is that it will, at last, fill a glaringly empty slot in the Linked Data stack of standards and technologies. In doing so, it also redefines OWL’s position, a point that has recently been made in a great blog by Kurt Cagle as well. Instead of being the centrepiece of Linked Data, OWL will increasingly be seen as a tool that is useful for some tasks and not for others, and that you don’t have to use. SHACL, on the other hand, will function as an easy-to-use, general-purpose tool for defining constraints on data, to support data exchange, quality assurance, consistency, input validation and even certain types of inferencing.
To see this, it is necessary to briefly talk about the query language for Linked Data: SPARQL, a recursive acronym of SPARQL Protocol and RDF Query Language. (Another nerdy joke. Laugh.) Inference engines in the Linked Data world increasingly run on SPARQL rather than on native OWL. Based on the semantic underpinnings of OWL, one can make inferences about identity, class membership and property values, and automatically enrich a dataset with extra, “inferred” data. Vendors nowadays use so-called CONSTRUCT-statements in SPARQL to implement such inferences. The advent of SHACL will give a boost to solutions for inferencing, because it gives a fine level of control over which types of inferences are desired and which are not.
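As an illustration, here is roughly what such an inference looks like as a CONSTRUCT query. This is a sketch of the general technique (materializing class membership from rdfs:subClassOf statements), not any particular vendor’s implementation.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Materialize "inferred" class memberships: whatever belongs to a
# class also belongs to every superclass of that class.
CONSTRUCT { ?instance rdf:type ?superClass . }
WHERE {
  ?instance rdf:type        ?subClass .
  ?subClass rdfs:subClassOf ?superClass .
}
```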
Defenders of the old faith will no doubt have mixed feelings about this. It is a step away from mathematical rigour, and a step in the direction of accommodating normal needs that users of data have when confronted with our — sometimes messy — reality.
This shift has proven a difficult one to go through. It is reflected in how SHACL came to be: a long process, with many ups and downs and frustrations. Compounding matters is the fact that SHACL is difficult to design: there are so many quandaries, both at the fundamental level and in the detailing of how things work. It does many things at once: define how constraints are expressed, how they are interpreted by a SHACL-engine when deciding whether constraints are violated in a graph, how the SHACL-engine ought to report violations, how to deal with recursive constraints, and many, many more things. While dealing with such highly theoretical topics, the group also had to ensure that the end user experience SHACL offers is relatively simple and intuitive. That we now have CR-status is a feat that inspires high esteem for the work that has been done!
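As a small illustration of the reporting side: when the invoice shape sketched earlier is violated, a SHACL-engine produces a validation report that is itself RDF, roughly along these lines (the focus node and message text are invented for illustration):

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/ns#> .

[ a sh:ValidationReport ;
  sh:conforms false ;                  # the data graph does not conform
  sh:result [
      a sh:ValidationResult ;
      sh:resultSeverity sh:Violation ;
      sh:focusNode ex:invoice42 ;                # the offending node
      sh:resultPath ex:paymentConfirmation ;     # the constrained property
      sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
      sh:resultMessage "Invoice has no payment confirmation." ;
  ] ;
] .
```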
SHACL having made CR means that W3C looks favourably on the results so far and considers them a solid foundation for the next phase, which will include more reviews and, most of all, implementations. To reach CR-status, a number of implementations already have to be in place to prove that the proposed standard works in practice, and they are. I heard rumours to the effect that Tim Berners-Lee himself built a SHACL engine to get a feel for how good it really is. The rumour does not detail which language he used; I am still wondering. And when done, TBL was so satisfied that he personally made a point of awarding the long sought-after CR distinction. Of course, whether there is any truth at all in such rumours, I am in no position to judge. But it is a good story. And the award is a great step forward!