The value of conceptual modeling

On 9 November, I had the honor of participating in the 90-minute Industry Panel at the ER 2023 conference in Lisbon, together with Magda Cocco (head of the ICT practice at Vieira de Almeida, a Portuguese law firm), Walid Saba (Senior Research Scientist at the Institute for Experiential AI at Northeastern University), and Mike Bennett (technical director at OMG and originator of FIBO, the widely used standards repository for financial industry concepts and definitions).

It was great to see so many wonderful ideas from different perspectives confront each other. In this blog post, I summarize some of the major points and add a bit of extra perspective. First, however, a big thank you to Giancarlo Guizzardi and Jose Borbinha for organizing this edition of the ER conference. It was the first time I attended this conference, but it will certainly not be the last.

Main themes of the discussion

The main question we tried to answer: what exactly is the value that modeling brings to industry? What are the problems, and what are effective tools and techniques? Our respective main themes were as follows.

My perspective

In my view, until recently, the value of models has mostly been a secret hidden in the IT departments of organizations. Models are traditionally created by developers to help them structure their code and data repositories. Nowadays, the business side of many organizations faces new tasks, which often involve the delivery of information products. The business needs its own knowledge engineers to create the models that define the structure of the datasets expressing these information products. The value of these models is that they provide customers and users with a recipe for understanding and using these information products with their own tools and algorithms.

An example of this from my own day job as an ontologist comes from the cultural heritage sector. Curators increasingly have the ambition to make datasets available that seamlessly integrate with datasets from other cultural heritage organizations, so that third parties can find high-value information on, say, a painting by Van Gogh, combining data from various sources. Using widely known standards such as Linked Art and the AAT thesaurus, this is now an achievable goal [1]. To create such models, I use methods from the knowledge engineering and ontology domain, my favorite being UFO.
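To give a flavor of what such a standards-based record looks like, here is a minimal sketch in Python of a Linked Art-style description, assuming the publicly documented Linked Art JSON-LD context and the Getty AAT concept for "paintings (visual works)"; the object URI and label are hypothetical.

```python
import json

# A minimal, illustrative Linked Art-style record for a painting.
# The object URI is hypothetical; the AAT URI is the Getty concept
# "paintings (visual works)". This is a sketch, not a validated record.
painting = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://example.org/object/sunflowers",  # hypothetical identifier
    "type": "HumanMadeObject",
    "_label": "Sunflowers",
    "classified_as": [
        {
            "id": "http://vocab.getty.edu/aat/300033618",
            "type": "Type",
            "_label": "paintings (visual works)",
        }
    ],
}

print(json.dumps(painting, indent=2))
```

Because every publisher classifies objects against the same shared vocabulary, a third party can aggregate records from many institutions without bespoke mapping work.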

Magda's perspective

Magda stressed the importance of conceptual models as a way to bridge the gap in understanding between business and technical people in cases where increasingly complex regulatory requirements must be met. For example, from a legal point of view, how do you ascertain that an organization is GDPR-compliant? A negative answer to a simple question like “does your database contain personal data?” is no guarantee of compliance. Technical people are often unaware that, for example, IP addresses and MAC addresses are privacy-sensitive. Therefore, you need to break down such questions about GDPR compliance into much more concrete ones. A well-structured conceptual model is a great help in such situations.
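To illustrate the kind of breakdown Magda described, here is a minimal sketch in Python that turns the vague question "does your database contain personal data?" into concrete, checkable ones. The column names, categories, and patterns are illustrative assumptions, not a complete GDPR checklist.

```python
import re

# Concrete, checkable sub-questions derived from a conceptual model of
# "personal data". The categories and patterns below are illustrative
# assumptions, not an exhaustive or authoritative GDPR checklist.
PERSONAL_DATA_PATTERNS = {
    "IP address": re.compile(r"ip[_-]?addr", re.IGNORECASE),
    "MAC address": re.compile(r"mac[_-]?addr", re.IGNORECASE),
    "email address": re.compile(r"e[_-]?mail", re.IGNORECASE),
}

def flag_personal_data(columns: list[str]) -> dict[str, list[str]]:
    """Map each privacy-sensitive category to the schema columns matching it."""
    hits: dict[str, list[str]] = {}
    for category, pattern in PERSONAL_DATA_PATTERNS.items():
        matched = [col for col in columns if pattern.search(col)]
        if matched:
            hits[category] = matched
    return hits

# A schema a developer might not consider "personal data" at first glance.
print(flag_personal_data(["session_id", "client_ip_addr", "device_mac_addr"]))
```

A real compliance assessment is of course far richer, but the principle is the same: the conceptual model tells you which concrete questions to ask.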

Walid's perspective

Walid made the point that conceptual modeling is a knowledge engineering challenge. As experience over the past decades has shown, it does not work when knowledge engineers strive for perfection without regard for practical value. This value often lies in the ease with which software and data engineers can use the models in their day-to-day work. You have to constantly evaluate the practical consequences of design choices in the model. The best way to evaluate them is to see how they play out in real-world datasets expressed in terms of the model, and in the logic of programs that process those datasets.
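A minimal sketch of what such an evaluation can look like in practice: a design choice from the model (say, "every artwork has exactly one creation date") is expressed as a check and run over an actual dataset. The records below are invented for illustration.

```python
# Testing a modeling choice against real data: the model says every artwork
# record carries exactly one creation date. Running the check over an actual
# dataset shows how often reality violates that choice, prompting a revision
# (e.g., allowing unknown dates or date ranges). Records are invented.
records = [
    {"title": "Sunflowers", "created": ["1888"]},
    {"title": "Untitled sketch", "created": []},           # date unknown
    {"title": "Altarpiece", "created": ["1503", "1506"]},  # made over years
]

violations = [r["title"] for r in records if len(r["created"]) != 1]
print(f"{len(violations)}/{len(records)} records violate the single-date rule:",
      violations)
```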

Mike's perspective

Mike talked about his work on FIBO [2]. He forcefully argued that modeling is often done by technical people instead of knowledge engineers, and the results are not good. Classifications mix different, often incompatible dimensions. Code lists are incomplete or even unsound. Also, too much focus on the technical aspects of RDFS and OWL leads to overly abstract models. But we do not need abstract models of the solution. What we need are concrete models of the problem.
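The "mixed dimensions" pitfall is easy to illustrate. A flat classification like Bank / LimitedCompany / Insurer / Partnership conflates what an organization does with what legal form it has; separating the two into orthogonal facets keeps each code list small and sound. The sketch below uses invented names, not actual FIBO terms.

```python
from dataclasses import dataclass
from enum import Enum

# Two independent dimensions, modeled as separate facets rather than
# squeezed into one classification hierarchy. Names are illustrative
# inventions, not actual FIBO terms.
class Industry(Enum):
    BANKING = "banking"
    INSURANCE = "insurance"

class LegalForm(Enum):
    LIMITED_COMPANY = "limited company"
    PARTNERSHIP = "partnership"

@dataclass
class Organization:
    name: str
    industry: Industry     # one facet: what the organization does
    legal_form: LegalForm  # an independent facet: its legal form

org = Organization("Acme Bank", Industry.BANKING, LegalForm.LIMITED_COMPANY)
print(f"{org.name}: {org.industry.value}, {org.legal_form.value}")
```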

Modelers are too often satisfied when business experts agree on vague definitions of the same terms. This often hides the reality that people from different corners of the business use the same terms in different ways, with different meanings. Solid knowledge engineering techniques are an absolute requirement for getting useful results. Mike's involvement in the creation of FIBO has made him acutely aware of these pitfalls. As a knowledge engineer, he brought to the table not only in-depth knowledge of conceptual modeling, but also a fair amount of domain knowledge. He was able, therefore, to make sure that definitions of terms and classification hierarchies were clear-cut and unambiguous, and, most importantly, to make sure business experts were fully aware of what they were signing off on.

Questions and answers

Part of the discussion revolved around how to reconcile the insight that ontologies should be application-agnostic with the realization that an ontology should answer practical requirements and support working software. After fielding some incisive questions from the audience, the panel settled on the idea that creating an ontology should be driven by a purpose. A purpose is more abstract than the concrete use cases of a given IT system, but offers enough guidance to keep the work grounded. Also, ontologies should be developed iteratively, in small steps, making sure that each increment is tested against real-world datasets and the software that processes them.

An example from my own experience is the TOOI knowledge graph, developed by KOOP, the Dutch government's publications office [3]. It contains an ontology, various thesauri (taxonomies), and reference datasets (including their history). Its purpose is to make government information findable, accessible, interoperable, and reusable (FAIR). If one spreadsheet contains data about COVID in municipalities, and another contains data about the number of inhabitants of municipalities, and both are published by government agencies, the use of TOOI ensures the datasets can be seamlessly combined. The same knowledge graph (ontology plus instance data) can be used to describe publications like rules, regulations, and policies in terms of rich metadata in a plethora of publication pipeline systems. There is no single delineated set of use cases underpinning TOOI. Yet it does have a clearly delineated purpose, and it is constantly tested during its iterative (and still ongoing) development.
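The municipality example is easy to make concrete. A minimal sketch, assuming both datasets identify municipalities by a shared TOOI-style URI; the URI pattern and all figures below are illustrative assumptions, not real TOOI data.

```python
import pandas as pd

# Two independently published datasets that use the same identifier for a
# municipality can be joined directly, with no bespoke mapping step.
# URI and figures are illustrative assumptions, not real TOOI data.
covid = pd.DataFrame({
    "municipality": ["https://identifier.overheid.nl/tooi/id/gemeente/gm0363"],
    "cases": [1250],
})
population = pd.DataFrame({
    "municipality": ["https://identifier.overheid.nl/tooi/id/gemeente/gm0363"],
    "inhabitants": [882633],
})

merged = covid.merge(population, on="municipality")
merged["cases_per_100k"] = merged["cases"] / merged["inhabitants"] * 100_000
print(merged[["municipality", "cases_per_100k"]])
```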

More on trends and developments

During the panel discussion, I talked about one development that makes the value of modeling more apparent outside the IT department. Several such developments are now converging, making these interesting times for knowledge engineers, ontologists, and conceptual modelers.

  • Data as a product. Within organizations, the business side is increasingly required to deliver large-scale, highly structured information products. Making the inherent semantic structures explicit, i.e., creating explicit renderings of the underlying conceptual models, is not something that can be delegated to the IT department. The business needs to take the lead and be in control. To that end, it needs to hire knowledge engineers, ontologists, data stewards, and even data engineers. When done properly, this shift in governance enables the business to deliver high-value information to customers, users, regulatory bodies, and other parties.
  • The FAIR data movement. This movement starts from the realization that, in order to make science sustainable, we need a new approach to sharing datasets. Data should be findable, accessible, interoperable, and reusable (FAIR) [4]. Creating and using solid, shared conceptual models is a centerpiece of the proposed methodology.
  • The data-centric manifesto. This movement starts from the realization that the current practice inside enterprises of keeping track of information dispersed across hundreds of application-specific databases simply does not scale anymore. Applications should read and write to a common, shared data repository. Again, using solid, shared conceptual models is a centerpiece of the proposed methodology [5].
  • The search for Explainable AI. There is growing consensus that knowledge graphs, including the conceptual models contained in them, can be leveraged to augment neural networks and LLMs. This would make AIs smarter, perhaps even adding reasoning power, and lead to Explainable AI. In a previous article, I summarized the state of the art as presented at the recent SEMANTiCS conference. It is too early to know exactly which direction this development will take, but the central idea in this context is providing semantics through conceptual models.


Thus, these four developments, though unrelated in their starting points and in the destinations they envisage, all share the same central idea. Conceptual modeling is bound to develop beyond the limits of the IT department and grow into a discipline in its own right, adding value directly to business activities across industries.

Why is modeling so difficult?

If anything, the panel discussion revealed that creating a sound conceptual model is not easy, and that it requires a thorough methodological approach. A software development methodology will not do. My favorite methodology is the Unified Foundational Ontology (UFO), which has been around for two decades and is now widely used. To answer the growing demand for ontologists, we as a community need to educate people, not only at universities, but also people already working in industry. At Taxonic, we try to contribute to this goal by making available an e-learning course on UFO and OntoUML.
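To give a taste of what UFO and OntoUML bring, consider one of their core distinctions: Person is a kind (people are people in every circumstance), while Student is a role (the same person can enter and leave it). The sketch below mirrors that distinction in code; it is an informal illustration of the idea, not the formal UFO semantics.

```python
from dataclasses import dataclass

# «kind»: rigid; provides identity. A person does not stop being a person.
@dataclass
class Person:
    name: str

# «role»: anti-rigid; played by a Person only in the context of an enrollment.
# Modeling it as a separate class referencing its bearer (rather than as a
# subclass of Person) reflects that the role can be gained and lost.
@dataclass
class Student:
    person: Person
    university: str

alice = Person("Alice")
enrollment = Student(alice, "Example University")
print(f"{enrollment.person.name} is currently a student at {enrollment.university}")
```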

The question remains, though: why exactly is modeling so difficult? To answer this question, we need to dig deeper into how we use language. This leads to themes that Walid discussed in his wonderful and inspiring keynote speech about language understanding, LLMs, and other AIs. In the next installment of this blog, I will investigate this further. Stay tuned!

Notes

[1] See, for instance, https://vangoghworldwide.org/, which consolidates information from a large number of sources.

[2] For more on FIBO, see https://spec.edmcouncil.org/fibo/

[3] For TOOI documentation and resources, all in Dutch, see https://standaarden.overheid.nl/tooi

[4] See https://www.go-fair.org/

[5] See https://datacentricmanifesto.org/ and, for instance, D. McComb, The Data-Centric Revolution: Restoring Sanity to Enterprise Information Systems, Technics Publications, 2019.
