A Knowledge Graph Isn't a Picture - It's a Movie
Copyright 2024 Kurt Cagle / The Cagle Report
The problem with describing things is that those things have the annoying habit of changing.
This is the crux of any modelling system, and it is frequently a problem when people model ontologies—they assume that when they make a statement, that statement holds... until it doesn't. Then things get ... complicated.
Are you the same person this morning as you were when you went to bed last night? This is a seemingly simple question with a very complex answer. You are ... more or less. You may have gained or lost a little weight overnight, you may be just a little bit taller (your muscles sag when you get tired late at night but are more energetic in the morning), you're a little bit older, and your hair may have grown a bit. Most of these are pretty trivial changes that amount to noise.
On the other hand, sometimes change happens very quickly. You may die in your sleep. This is still a process, but it has huge legal and data implications (though, of course, at that point, those implications are no longer relevant to you). This may very well be considered a significant change.
When you get right down to it, data is an attempt to capture when, where, to whom, how and why things change. When you create a data record, you are recording, after the fact, those properties that have changed. Your goal may be to capture when a seminal change took place (when someone died), but this can only be inferred - between this record and that record, something happened. That is the best you can achieve with data capture - it is (almost) always a historical record.
A brief divertimento - the almost in this case comes when you anticipate an event: I will have a class at 3 pm. You are expecting an event to occur in the future, but the event has yet to happen. It is not a record of an event; it is the record of the expectation of an event. You could be late getting to the class. The class could be cancelled; you might get run over by a bus (this post is definitely skating on the edge of morbidity). You model such things differently for your peace of mind because things, well, happen. We'll come back to this.
When you are modelling, then, you have to decide at the outset: is the thing that I am modelling something where the probability of change is so low that I do not need to worry about it, or is it something that needs to incorporate temporality (also known as versioning)? This has implications for how you model, and the answer also depends on the timescale you are modelling. Between today and tomorrow, there is a possibility that an asteroid could strike the Earth, but that possibility is vanishingly small.
However, if you're modelling plate tectonics or extremely long-term climate forecasting (over millions of years), having a record of this particular attribute may prove helpful. On the other hand, if you are trying to track your hours from a particular job, modelling death by meteor is not germane - it is a variable that can be safely ignored. If it does happen, getting paid for your hours is probably not a high priority.
This means that in the simplest case, you can ignore changes in time and directly assign a fixed value to a given attribute. If a given attribute is known and unchanging, then it also likely doesn't factor into your records anyway.
When things do change, however, you have to ask whether the changes are sufficient to throw a given thing's various identities into question. Let's say I talk about my favorite heroine, Jane Doe. Jane is the Barbie (TM) of the semantics world - she's been an Olympic gymnast, a ballet dancer, a scientist, and a president; you name it, she's done it. Jane has had a career.
A typical approach to modelling here is to declare an identifier: Person:JaneDoe. This declaration says that there is something (or here, some person) who can be readily identified as having a sense of continuity - she's born, she goes to school, she starts her rock band, she travels to the eighth dimension, and so on. She may very well change her name (which is simply another identifier, by the way) several times over that period, but there is something that remains the same when she wakes up as when she went to bed.
From the perspective of an ontologist, an ontology can be seen as a story—in this case, the story of Jane Doe's marriages.
#Turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix Entity: <https://example.com/ns/Entity#> .
@prefix Category: <https://example.com/ns/Category#> .
@prefix Person: <https://example.com/ns/Person#> .
@prefix PersonName: <https://example.com/ns/PersonName#> .
@prefix Habitation: <https://example.com/ns/Habitation#> .
@prefix Address: <https://example.com/ns/Address#> .
@prefix Marriage: <https://example.com/ns/Marriage#> .
@prefix City: <https://example.com/ns/City#> .
@prefix PostalCode: <https://example.com/ns/PostalCode#> .
Person:JaneDoe a Person: ;
rdfs:label "Jane Doe"^^xsd:string ;
Person:name
PersonName:JD1,
PersonName:JD2,
PersonName:JD3 ;
Person:habitation
Habitation:JD-1313MockingBirdLaneArkham ,
Habitation:JD-123SesameStreetNewYork ,
Habitation:JD-6543NESecondAveSeattle,
Habitation:JD-98765NE8thStreetBellevue ;
Person:marriage
Marriage:JaneDoeJohnSmith;
Entity:startDate "1990-01-01"^^xsd:date ;
.
Person:JohnSmith a Person: ;
rdfs:label "John Smith"^^xsd:string ;
Person:name
PersonName:JS1 ;
Person:habitation
Habitation:JS-6543NESecondAveSeattle ;
Person:marriage Marriage:JaneDoeJohnSmith ;
Entity:startDate "1987-02-03"^^xsd:date ;
.
PersonName:JD1 a PersonName: ;
rdfs:label "Jane Elizabeth Doe" ;
PersonName:legalName "Jane Elizabeth Doe";
PersonName:preferredName "Janey" ;
Entity:startDate "1990-01-01"^^xsd:date ;
Entity:endDate "2012-06-15"^^xsd:date ;
.
PersonName:JD2 a PersonName: ;
rdfs:label "Jane Elizabeth Smith" ;
PersonName:legalName "Jane Elizabeth Smith";
PersonName:preferredName "Jane" ;
Entity:startDate "2012-06-16"^^xsd:date ;
Entity:endDate "2017-03-19"^^xsd:date ;
.
PersonName:JD3 a PersonName: ;
rdfs:label "Jane Elizabeth Doe" ;
PersonName:legalName "Jane Elizabeth Doe";
PersonName:preferredName "Jayne" ;
Entity:startDate "2017-03-20"^^xsd:date ;
.
PersonName:JS1 a PersonName: ;
rdfs:label "John Matthew Smith";
Entity:startDate "1987-02-03"^^xsd:date ;
.
Habitation:JD-1313MockingBirdLaneArkham a Habitation: ;
rdfs:label "Jane Doe @ 1313 Mockingbird Lane, Arkham";
Habitation:address Address:1313MockingBirdLaneArkham ;
Entity:startDate "1990-01-01"^^xsd:date ;
Entity:endDate "2008-08-31"^^xsd:date ;
.
Address:1313MockingBirdLaneArkham
rdfs:label "1313 Mockingbird Lane";
Address:street "1313 Mockingbird Lane" ;
Address:city City:ArkhamMA ;
Address:postalCode PostalCode:USA-00123 ;
Entity:startDate "1952-03-11"^^xsd:date;
.
Habitation:JD-123SesameStreetNewYork a Habitation: ;
rdfs:label "Jane Doe @ 123 Sesame Street, New York,NY";
Habitation:address Address:123SesameStNewYork ;
Entity:startDate "2008-09-01"^^xsd:date ;
Entity:endDate "2012-06-15"^^xsd:date ;
.
Address:123SesameStNewYork
rdfs:label "123 Sesame Street, New York,NY" ;
Address:street "123 Sesame Street, New York, NY" ;
Address:city City:NewYorkNY ;
Address:postalCode PostalCode:USA-11265 ;
Entity:startDate "1968-09-21"^^xsd:date;
.
Habitation:JD-6543NESecondAveSeattle a Habitation: ;
rdfs:label "Jane Doe @ 6543 NE Second Ave, Seattle, WA";
Habitation:address Address:6543NESecondAveSeattle ;
Entity:startDate "2012-06-16"^^xsd:date ;
Entity:endDate "2016-03-19"^^xsd:date ;
.
Address:6543NESecondAveSeattle
rdfs:label "6543 NE Second Ave, Seattle, WA" ;
Address:street "6543 NE Second Ave" ;
Address:city City:SeattleWA ;
Address:postalCode PostalCode:USA-98125 ;
Entity:startDate "1978-01-17"^^xsd:date;
.
Habitation:JD-98765NE8thStreetBellevue a Habitation: ;
rdfs:label "Jane Doe @ 98765 NE 8th Street Bellevue, WA";
Habitation:address Address:98765NE8thStreetBellevue ;
Entity:startDate "2016-03-20"^^xsd:date ;
.
Address:98765NE8thStreetBellevue
rdfs:label "98765 NE 8th Street Bellevue, WA" ;
Address:street "98765 NE 8th Street" ;
Address:city City:BellevueWA ;
Address:postalCode PostalCode:USA-98039 ;
Entity:startDate "1986-04-11"^^xsd:date;
.
Habitation:JS-6543NESecondAveSeattle a Habitation: ;
rdfs:label "John Smith @ 6543 NE Second Ave, Seattle, WA";
Habitation:address Address:6543NESecondAveSeattle ;
Entity:startDate "1987-02-03"^^xsd:date ;
.
Marriage:JaneDoeJohnSmith a Marriage: ;
rdfs:label "Marriage of Jane Doe & John Smith" ;
Entity:startDate "2012-06-16"^^xsd:date ;
Entity:endDate "2017-03-19"^^xsd:date ;
.
In this situation, Jane Doe grows up in Arkham, Massachusetts, goes to school in New York, marries John Smith, who lives in Seattle, then divorces him a few years later, moving to a different house in nearby Bellevue (to keep the story simple, John lives in the same house his entire life, which might be an explanation for why she eventually divorced him).
There are several facets to notice here. First of all - almost everything begins, continues, then ends. The almost everythings in this case I describe as entities. A person is an entity - she is born, lives a life, and eventually dies, which is a separate event but the final part of the same entity.
While this is not often fully appreciated, names are entities. A person is born, but the name may be determined somewhat later. I was Boy Child Cagle for three weeks before my parents came up with a name for the birth certificate. In many cultures, when a woman marries, she typically takes on a new name reflecting the family name of her spouse, as in PersonName:JD2 ("Jane Elizabeth Smith"), though sometimes both spouses take on a hyphenated surname.
It's noteworthy that when Jane divorces John, she takes the name she had previously, but the model creates a new name with different start (and potentially end) dates. Why is this different? This is because a name is a record, and if you model this record to have only one beginning and one ending date, then you can see the evolution of the name state over time (and it makes the model simpler as a consequence). In the example above, you can see how the preferred name shifts even when the legal name remains the same.
This raises a fairly critical point - when a given entity changes, you terminate the old entity and create a new entity. The question then becomes how high up this change should apply. As a rule of thumb, when the state of a property within the immediate scope of an entity changes, a new entity should be created, a reference to it added to the containing entity, and the old entity ended. If multiple properties change at the same time, then it makes sense to bundle these changes into a single new entity.
For instance, if Jane marries again in 2024 and changes her preferred name to Beth (short for Elizabeth, her middle name) then both changes could be accomplished at the same time:
PersonName:JD3 Entity:endDate "2024-04-10"^^xsd:date .
Person:JaneDoe Person:name PersonName:JD4 .
PersonName:JD4 a PersonName: ;
rdfs:label "Jane Elizabeth McDougal" ;
PersonName:legalName "Jane Elizabeth McDougal";
PersonName:preferredName "Beth" ;
Entity:startDate "2024-04-11"^^xsd:date ;
.
Note that this is one of RDF's advantages—you only need to indicate what has changed without erasing what already exists.
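The same end-the-old, start-the-new pattern can be sketched outside RDF as well. The Python below is only an illustration under stated assumptions: the NameVersion class and the supersede helper are hypothetical names, not part of the model above, and the sketch assumes that a version ends the day before its successor starts, as the sample dates do.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class NameVersion:
    """One state of a person's name, valid from start to end (inclusive)."""
    legal_name: str
    preferred_name: str
    start: date
    end: Optional[date] = None  # None means this version is still current


def supersede(history: list[NameVersion], legal_name: str,
              preferred_name: str, start: date) -> NameVersion:
    """End any open version the day before `start`, then append the new one."""
    for version in history:
        if version.end is None:
            version.end = start - timedelta(days=1)
    new_version = NameVersion(legal_name, preferred_name, start)
    history.append(new_version)
    return new_version


# Hypothetical replay of the JD3 -> JD4 change shown above.
history = [NameVersion("Jane Elizabeth Doe", "Jayne", date(2017, 3, 20))]
supersede(history, "Jane Elizabeth McDougal", "Beth", date(2024, 4, 11))
print(history[0].end)             # 2024-04-10
print(history[1].preferred_name)  # Beth
```

Nothing is deleted here either: the earlier version stays in the history with an end date, which is what keeps the past queryable.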
The new marriage can similarly be added (assuming Ian McDougal already exists).
Marriage:JaneDoeIanMcDougal a Marriage: ;
rdfs:label "Marriage of Jane Doe & Ian McDougal" ;
Entity:startDate "2024-04-11"^^xsd:date ;
.
Person:JaneDoe Person:marriage Marriage:JaneDoeIanMcDougal .
Person:IanMcDougal Person:marriage Marriage:JaneDoeIanMcDougal .
The use of Person:marriage here indicates another modelling concept. Marriages (and, for that matter, most contractual agreements) are entities indicating some form of agreement between two or more participants. Once the marriage ends, the entity goes out of scope in the present but remains in the past.
You could model this a bit differently, putting the participants into the marriage:
#Alternative approach
Marriage:JaneDoeIanMcDougal a Marriage: ;
rdfs:label "Marriage of Jane Doe & Ian McDougal" ;
Marriage:spouse Person:JaneDoe, Person:IanMcDougal ;
Entity:startDate "2024-04-11"^^xsd:date ;
.
However, the first form has an advantage: it makes marriage a typical relationship between two (or perhaps more) people. If the primary orientation of your graph is top-down, with something like Person or Company holding the associated entries, then this approach is likely superior to a more bottom-up approach.
Habitations and Addresses
Habitations and addresses work in a similar fashion. A habitation is an entity that indicates where someone lived over a given time. An address is an entity as well, but its dates usually show when a given building at a specific location was constructed or torn down. They appear to be similar concepts, but a habitation is tied to a given person, while an address refers to the actual lot or building and, as such, is a property of a habitation.
Habitation:JD-6543NESecondAveSeattle a Habitation: ;
rdfs:label "Jane Doe @ 6543 NE Second Ave, Seattle, WA";
Habitation:address Address:6543NESecondAveSeattle ;
Entity:startDate "2012-06-16"^^xsd:date ;
Entity:endDate "2016-03-19"^^xsd:date ;
.
Habitation:JS-6543NESecondAveSeattle a Habitation: ;
rdfs:label "John Smith @ 6543 NE Second Ave, Seattle, WA";
Habitation:address Address:6543NESecondAveSeattle ;
Entity:startDate "1987-02-03"^^xsd:date ;
.
Address:6543NESecondAveSeattle
rdfs:label "6543 NE Second Ave, Seattle, WA" ;
Address:street "6543 NE Second Ave" ;
Address:city City:SeattleWA ;
Address:postalCode PostalCode:USA-98125 ;
Entity:startDate "1978-01-17"^^xsd:date;
.
The other advantage of the use of habitation is that multiple people may live at a given place at different periods. For instance, John Smith lived at the Seattle house from the time he was born, but Jane only lived there from 2012 to 2016 (leaving even before their divorce was finalized).
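Because each habitation carries its own start and end dates, the period two people shared an address falls out of a simple interval intersection. The following is a minimal Python sketch, assuming inclusive end dates; the overlap function and its today parameter (a stand-in for "now" when a habitation has no end date) are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def overlap(start_a: date, end_a: Optional[date],
            start_b: date, end_b: Optional[date],
            today: date) -> Optional[Tuple[date, date]]:
    """Return the inclusive period shared by two habitations, or None."""
    latest_start = max(start_a, start_b)
    # An open-ended habitation (end is None) is treated as lasting until `today`.
    earliest_end = min(end_a or today, end_b or today)
    if latest_start > earliest_end:
        return None
    return latest_start, earliest_end


# Dates taken from the two Seattle habitations above.
jane = (date(2012, 6, 16), date(2016, 3, 19))
john = (date(1987, 2, 3), None)
shared = overlap(*jane, *john, today=date(2024, 6, 1))
print(shared)  # (datetime.date(2012, 6, 16), datetime.date(2016, 3, 19))
```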
Here's a visualization of this dataset to make it a little more apparent what's going on:
Entity Events and Timelines
What is most significant about this approach is that you can see the evolution of a model over time without having to create a separate Now graph. For instance, the following SPARQL query can give you the state of every entity (assuming all entities do descend from Entity: ) at any given moment:
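prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix Entity: <https://example.com/ns/Entity#>

describe ?entity where {
    # support for the xsd:date() cast varies by engine
    bind(xsd:date(now()) as ?date)
    ?entity a ?class .
    ?class rdfs:subClassOf Entity: .
    ?entity Entity:startDate ?startDate .
    optional {
        ?entity Entity:endDate ?endDateTmp
    }
    # no end date: treat the entity as lasting through the current date
    bind(if(bound(?endDateTmp), ?endDateTmp, xsd:date(now())) as ?endDate)
    # end dates in the sample data are inclusive, hence <=
    filter(?date >= ?startDate && ?date <= ?endDate)
}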
This will return all entities that exist at a given point of time. If there is no end date, then the current date is used for that calculation. A similar type of calculation can be done where the values given are date-times (including hours, minutes, seconds and normalized time zone info).
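The point-in-time test at the heart of that query can also be sketched in a few lines of Python. This is an illustration rather than a replacement for the SPARQL: the Entity tuple, exists_on and snapshot are hypothetical names, and the sketch assumes end dates are inclusive, as the sample data suggests (each record ends the day before the next begins).

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Entity(NamedTuple):
    label: str
    start: date
    end: Optional[date]  # None: no end date has been recorded


def exists_on(entity: Entity, when: date) -> bool:
    """True if `when` falls within the entity's start/end dates (inclusive)."""
    return entity.start <= when and (entity.end is None or when <= entity.end)


def snapshot(entities: List[Entity], when: date) -> List[str]:
    """Labels of every entity that exists on the given date."""
    return [e.label for e in entities if exists_on(e, when)]


# Jane's four habitations, with the dates used in the Turtle above.
habitations = [
    Entity("Arkham", date(1990, 1, 1), date(2008, 8, 31)),
    Entity("New York", date(2008, 9, 1), date(2012, 6, 15)),
    Entity("Seattle", date(2012, 6, 16), date(2016, 3, 19)),
    Entity("Bellevue", date(2016, 3, 20), None),
]

print(snapshot(habitations, date(2010, 5, 1)))  # ['New York']
print(snapshot(habitations, date(2024, 1, 1)))  # ['Bellevue']
```

Passing any date, past or present, gives the state of the model on that day, which is the sense in which no separate Now graph is needed.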
One question may come up here - what shouldn't be an entity (have a start/end time)? The answer is surprisingly simple: very little shouldn't have a temporal component to it. Taxonomy terms may immediately jump out as one resource category that doesn't have a temporal aspect, but even that's not quite true. Taxonomy terms get redefined all the time. Sometimes taxonomy terms are assigned versions, but you can move away from having to think of these as having specific versions (and deprecations) and instead constrain each revision to exist for a given time. You can have a single term as a resource that will have the correct structure and definition for that particular time, something like:
Concept:Revenue a Concept: ;
Concept:version
ConceptVersion:RevenueV1 ,
ConceptVersion:RevenueV2 ,
ConceptVersion:RevenueV3 ,
ConceptVersion:RevenueV4 ;
.
ConceptVersion:RevenueV1 a ConceptVersion: ;
ConceptVersion:label "Revenue";
ConceptVersion:definition "This is the V1 definition" ;
ConceptVersion:author Person:JaneDoe;
Entity:startDate "2018-03-04"^^xsd:date ;
Entity:endDate "2020-05-11"^^xsd:date ;
.
ConceptVersion:RevenueV2 a ConceptVersion: ;
ConceptVersion:label "Revenue";
ConceptVersion:definition "This is the V2 definition" ;
ConceptVersion:author Person:JaneDoe;
Entity:startDate "2020-05-12"^^xsd:date ;
Entity:endDate "2022-11-03"^^xsd:date ;
.
ConceptVersion:RevenueV3 a ConceptVersion: ;
ConceptVersion:label "Revenue";
ConceptVersion:definition "This is the V3 definition" ;
ConceptVersion:author Person:JohnSmith ;
Entity:startDate "2022-11-04"^^xsd:date ;
Entity:endDate "2024-05-21"^^xsd:date ;
.
ConceptVersion:RevenueV4 a ConceptVersion: ;
ConceptVersion:label "Revenue";
ConceptVersion:definition "This is the V4 definition" ;
ConceptVersion:author Person:JaneDoe ;
Entity:startDate "2024-05-22"^^xsd:date ;
.
This means that if you want to get the definition that is relevant for Concept:Revenue today, you could use a variation of the above query:
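# namespace IRIs for Concept: and ConceptVersion: are assumed to follow the earlier pattern
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix Entity: <https://example.com/ns/Entity#>
prefix Concept: <https://example.com/ns/Concept#>
prefix ConceptVersion: <https://example.com/ns/ConceptVersion#>

select ?definition where {
    values ?term { Concept:Revenue }
    # support for the xsd:date() cast varies by engine
    bind(xsd:date(now()) as ?date)
    ?term Concept:version ?version .
    ?version Entity:startDate ?startDate .
    optional {
        ?version Entity:endDate ?endDateTmp
    }
    ?version ConceptVersion:definition ?definition .
    bind(if(bound(?endDateTmp), ?endDateTmp, xsd:date(now())) as ?endDate)
    filter(?date >= ?startDate && ?date <= ?endDate)
}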
This would retrieve the fourth definition, as the current date is past the last open-ended start date.
You can assign version numbers to each version for reference, but they are there primarily for human consumption. In most cases, the real value is getting the version appropriate for the date context in question.
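When versions are contiguous and kept in start-date order, finding the one that applies to a date reduces to a binary search on the start dates. The sketch below swaps in that technique (Python's bisect module) in place of the start/end filter used in the query; definition_on is a hypothetical helper, and the V3 start date is assumed to fall on the day after V2 ends.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional

# (start date, definition) pairs in ascending start-date order.
versions = [
    (date(2018, 3, 4), "This is the V1 definition"),
    (date(2020, 5, 12), "This is the V2 definition"),
    (date(2022, 11, 4), "This is the V3 definition"),  # assumed: day after V2 ends
    (date(2024, 5, 22), "This is the V4 definition"),
]


def definition_on(when: date) -> Optional[str]:
    """Return the definition in force on `when`, or None before the first version."""
    starts = [start for start, _ in versions]
    # bisect_right counts how many versions had already started on `when`.
    index = bisect_right(starts, when) - 1
    if index < 0:
        return None
    return versions[index][1]


print(definition_on(date(2021, 1, 1)))  # This is the V2 definition
print(definition_on(date(2024, 6, 1)))  # This is the V4 definition
```

This shortcut only holds while there are no gaps or overlaps between versions; with gaps, the explicit start/end comparison is the safer test.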
Note that a similar approach can be taken with recurrent dates or activities that take place at a certain time of the week. I'll leave that as an exercise for a future article.
There are both advantages and disadvantages to this approach. On the plus side:
On the minus side:
What this means in practice is fairly simple—by being mindful of the history (the temporal aspects) of what you are modelling, you can create both a history or evolution map and an up-to-date now map. Still, you have to accept the additional complexity this introduces into your queries.
However, a knowledge graph is a model, not just of relationships but also of how those relationships change over time. A good knowledge graph isn't a picture—it's a movie.
In media res,
Kurt Cagle
Editor, The Cagle Report
If you want to shoot the breeze or have a cup of virtual coffee, I have a Calendly account at https://calendly.com/theCagleReport, and I am available for consulting and full-time work.
Retired (EYT) - Tüpraş
7 months ago: In my opinion there is multiple probabilities in multiple consciousness of human being, like efforts chosen for your prewritten destiny lifelines
Stranded infonaut - a legal alien in the IT bazaar
7 months ago: The initial statement misses an important aspect: beyond the things themselves, the content and structure of that description (metadata) are also things and have the annoying habit of changing. To address these problems, the first step is to realize that (transferrable) knowledge IS a graph. You try to serialize this graph to human or machine-focused streams (that stream can be a document, a list of SQL commands, API calls, etc.) Tools focus on these streams (mostly in textual forms in various syntaxes) instead of the original graph. The conclusion that these texts are snapshots is not a surprise because serialization is the process of creating a snapshot. The current LLM - RAG attempt to automate knowledge extraction from text is like trying to get back a living animal from the minced meat. True knowledge sharing works on the graph level, before any serialization. I gave it a try to explain this part here. https://youtu.be/_vE2ZNSf1l0
Information Architect / Technology Project Manager / Process Governance / Data Governance
7 months ago: So, Steve Miller just called and…
Model Manager | Enterprise Architecture & ArchiMate Advocate | Expert in MBSE, PLM, STEP Standards & Ontologies | Open Source Innovator(ArchiCG)
7 months ago: This is what has been captured when dealing with 4D ontologies. What is also to be considered: life cycle of data, life cycle of what they represent, and point on time when you are interacting with data: can you position you query at a given point on time, past or future when you are planning? Can you compare plan with actual achievement, compare and learn from it? And having trace of creation context with intended usage of the representations you are producing? Finally can you animate graphs according such or such time referential, in order to produce dynamic representations, e.g. for simulation of evolution? Here is something which could looks like a movie. Except if you can parameterized the animation, or interact with it.
Semi-Retired CxO Open to assisting highly innovative organizations creating value
7 months ago: Machines as yet cannot evolve. They are designed for a purpose and are so constrained by it. Dynamical aspects of data behavior (perceptions, etc.) may invoke different logic but all systems we've ever built have a finite set of 'equivalence classes' of things they can distinguish and respond to. Living things on the other hand have the ability to go beyond their 'genetic' programming and phenotypical limits, generating and responding to novelty. Until a machine can be built able to 'play' with the environment in which it resides, creating new perceptions through experimentation, and thus new 'models' of the environment and how it works, it cannot evolve.