RDF, Graphs and Mermaid Diagrams
There is an old saying that Democracy is the worst form of government known to mankind, except for all others. The same can be said for the Resource Description Framework or RDF.
Most people need help understanding RDF. This shouldn't be that surprising. RDF is a way of encoding semantic graphs. Chances are pretty good that if you are a programmer, you are only marginally aware of what a semantic graph is, and it more than likely seems like a lot of work for something you can store in JSON.
This is part of the reason why I'm increasingly inclined to express RDF as pictures when I discuss it. For example, suppose I had a small company with one manager and three employees. This creates a graph:
There's a lot of information packed into this particular graph. Each rounded box is a thing (what we in RDF land call a resource) and each edge is a given relationship (either reports to or works for), which is known as a predicate. A statement then indicates that one thing (the subject) has a relationship (the predicate) with another thing (the object).
This graph can then be decomposed into a set of statements, specifically:
Note that there is a fundamental notion implied here. The graph is not connecting strings together, it's connecting things or concepts together. Somewhere in the back end of all of this, there's a unique identifier for each resource (and each edge), and it is actually these identifiers that are pointing to one another, but from the perspective of the picture, those identifiers are just implementation details. The semantics are in the relationships.
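As a rough illustration (a sketch, not part of any RDF library), the same decomposition can be written in Python as a set of (subject, predicate, object) tuples. The plain strings here are stand-ins for the unique identifiers discussed above, and the helper names `objects_of` and `subjects_of` are hypothetical:

```python
# Each statement is a (subject, predicate, object) triple.
# The plain strings are placeholders for unique identifiers.
triples = {
    ("Ann", "works for", "Small Co"),
    ("Bill", "works for", "Small Co"),
    ("Clara", "works for", "Small Co"),
    ("Dave", "works for", "Small Co"),
    ("Bill", "reports to", "Ann"),
    ("Clara", "reports to", "Ann"),
    ("Dave", "reports to", "Ann"),
}


def objects_of(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects_of(predicate, obj):
    """Return every subject linked to `obj` by `predicate`."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


print(subjects_of("reports to", "Ann"))  # ['Bill', 'Clara', 'Dave']
```

Asking "who reports to Ann?" is then just a filter over the statements, which is the sense in which the semantics live in the relationships.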
This is actually what RDF is all about. At its core, it is simply a way of describing semantic graphs. One can argue that the diagram above is RDF, if you assume that each node (box) and each distinctly labeled edge (arrow) have unique identifiers. This relationship can be seen in the Mermaid Markdown file that was used to create the above graph:
flowchart LR
%% Namespaces
%% Company: <https://www.example.com/ns/Company#>
%% Person: <https://www.example.com/ns/Person#>
%% Declarations
Company:SmallCo(<b>Company</b>\nSmall Co)
Person:Ann(<b>Person</b>\nAnn)
Person:Bill(<b>Person</b>\nBill)
Person:Clara(<b>Person</b>\nClara)
Person:Dave(<b>Person</b>\nDave)
%% Works For Relationships
Person:Ann -->|works for| Company:SmallCo
Person:Bill -->|works for| Company:SmallCo
Person:Clara -->|works for| Company:SmallCo
Person:Dave -->|works for| Company:SmallCo
%% Reports To Relationships
Person:Bill -->|reports to| Person:Ann
Person:Clara -->|reports to| Person:Ann
Person:Dave -->|reports to| Person:Ann
Each expression, such as Company:SmallCo, is an identifier, and can be expanded to a longer, globally unique Uniform Resource Identifier (URI, or IRI in its internationalized form):
<https://www.example.com/ns/Company#SmallCo>
In and of themselves, these URIs have no real significance, beyond their uniqueness within a specific context. Their value comes, however, in the fact that if you know the URI, then you can identify the node or edge in the graph, and from there can extract information.
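The expansion from a prefixed name to a full URI is mechanical. A minimal sketch in Python, assuming the two namespaces declared in the Mermaid comments (the function names `expand` and `compact` are illustrative, not taken from any library):

```python
# Prefix-to-namespace map, taken from the comments in the Mermaid source.
NAMESPACES = {
    "Company": "https://www.example.com/ns/Company#",
    "Person": "https://www.example.com/ns/Person#",
}


def expand(prefixed_name, namespaces=NAMESPACES):
    """Expand 'Prefix:local' into a full URI; raises KeyError on unknown prefixes."""
    prefix, _, local = prefixed_name.partition(":")
    return namespaces[prefix] + local


def compact(uri, namespaces=NAMESPACES):
    """Shorten a full URI back to 'Prefix:local' where a namespace matches."""
    for prefix, base in namespaces.items():
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return uri  # no known namespace: leave the URI untouched


print(expand("Company:SmallCo"))
# https://www.example.com/ns/Company#SmallCo
```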
XML works the same way through its namespaces, and while plain JSON has no such mechanism, there are specific JSON standards (such as JSON-LD) that support the idea of namespace contexts. You can see the value of this above: the namespace prefixes (here, Company: and Person:) make it easy to see what particular "class" of entity a given resource is.
In the example given above, this is actually added into the descriptive label for each resource. For instance, the person Bill is "declared" as follows:
Person:Bill(<b>Person</b>\nBill)
where the type is contained in the first line as a bold label, and the actual label of the resource is given on the second line, after the \n (newline) character.
The declaration above, by the way, can be expressed as four statements in Turtle, another RDF language:
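# Turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix Person: <https://www.example.com/ns/Person#> .
Person:Bill a Person: .
Person:Bill rdfs:label "Bill" .
Person: a rdfs:Class .
Person: rdfs:label "Person" .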
This says that Bill is of class Person:, that Bill has the string label "Bill", that Person: is itself a class in the RDF system, and that this class has the label "Person".
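The mapping from a Mermaid declaration to those four statements can be sketched in a few lines of Python. This assumes the labelling convention used in this article (bold type on the first line, resource label after the \n) and that the class is named by the prefix of the node identifier; the regular expression and the function name `declaration_to_turtle` are illustrative only:

```python
import re

# Matches declarations such as  Person:Bill(<b>Person</b>\nBill)
# where "\n" is the literal two-character sequence used in Mermaid labels.
DECLARATION = re.compile(
    r"^(?P<id>[^\s(\[]+)\(<b>(?P<type>[^<]+)</b>\\n(?P<label>[^)]*)\)"
)


def declaration_to_turtle(line):
    """Turn one Mermaid node declaration into four Turtle statements."""
    match = DECLARATION.match(line.strip())
    if match is None:
        raise ValueError(f"not a node declaration: {line!r}")
    node_id = match["id"]
    type_label = match["type"]
    node_label = match["label"]
    # Assumption: the class is identified by the namespace prefix, e.g. "Person:".
    class_id = node_id.partition(":")[0] + ":"
    return [
        f"{node_id} a {class_id} .",
        f'{node_id} rdfs:label "{node_label}" .',
        f"{class_id} a rdfs:Class .",
        f'{class_id} rdfs:label "{type_label}" .',
    ]


for statement in declaration_to_turtle(r"Person:Bill(<b>Person</b>\nBill)"):
    print(statement)
```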
Note that in Mermaid notation, the %% symbol is the start of a comment line. Mermaid by itself does not understand the notion of namespace, but by including the namespace declarations as comments, it makes it easier to retain RDF semantic information:
%% Namespaces
%% Company: <https://www.example.com/ns/Company#>
%% Person: <https://www.example.com/ns/Person#>
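Because the namespace comments follow a fixed pattern, a script can recover them even though Mermaid itself ignores them. A minimal sketch, assuming the "%% Prefix: <uri>" convention shown above (the pattern and the function name `read_namespaces` are illustrative):

```python
import re

# Matches comment lines of the form:  %% Prefix: <https://example.com/ns#>
NAMESPACE_COMMENT = re.compile(
    r"^%%\s*(?P<prefix>[A-Za-z][\w-]*):\s*<(?P<uri>[^>]+)>\s*$"
)


def read_namespaces(mermaid_source):
    """Collect prefix -> URI pairs from namespace comments in Mermaid source."""
    namespaces = {}
    for line in mermaid_source.splitlines():
        match = NAMESPACE_COMMENT.match(line.strip())
        if match:
            namespaces[match["prefix"]] = match["uri"]
    return namespaces


source = """\
%% Namespaces
%% Company: <https://www.example.com/ns/Company#>
%% Person: <https://www.example.com/ns/Person#>
flowchart LR
"""
print(read_namespaces(source))
```

Ordinary comments such as "%% Namespaces" do not match the pattern and are skipped.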
Let's say that we add four more statements into the graph, in this case adding the literal data (dates) showing when each person joined:
This also includes a bit of styling information to more clearly differentiate literals from people from companies.
%% Mermaid Markdown
%% Namespaces
%% Company: <https://www.example.com/ns/Company#>
%% Person: <https://www.example.com/ns/Person#>
flowchart LR
%% Declarations
SmallCo(<b>Company</b>\nSmall Co):::company
Ann(<b>Person</b>\nAnn):::person
Bill(<b>Person</b>\nBill):::person
Clara(<b>Person</b>\nClara):::person
Dave(<b>Person</b>\nDave):::person
%% Works For Relationships
Ann -->|works for| SmallCo
Bill -->|works for| SmallCo
Clara -->|works for| SmallCo
Dave -->|works for| SmallCo
%% Reports To Relationships
Bill -->|reports to| Ann
Clara -->|reports to| Ann
Dave -->|reports to| Ann
%% Date Started Literal
Ann -->|date started| 2016-01-25[[<b>Date</b>\n2016-01-25]]:::date
Bill -->|date started| 2018-03-12[[<b>Date</b>\n2018-03-12]]:::date
Clara -->|date started| 2018-10-17[[<b>Date</b>\n2018-10-17]]:::date
Dave -->|date started| 2021-01-16[[<b>Date</b>\n2021-01-16]]:::date
classDef company fill:lightGreen,stroke:black;
classDef person fill:lightBlue,stroke:black;
classDef date fill:#FFE,stroke:black;
This is still RDF. RDF is an abstraction, a way of defining statements. At its core it's a very simple language, or more properly, meta-language, because so long as you preserve links and namespaces through some conventional means, you can describe a graph with it. This is important.
Triple stores are built around RDF. Other graph databases, such as Neo4j and TigerGraph, use their own query languages (openCypher, in Neo4j's case) and an internal representation, but that internal representation has long been able to support mapping to and from RDF. RDF is simply an abstraction used to define graphs: it is meta-language independent. This even holds true for the use of graphs within graphs (named graphs). The following is a Mermaid diagram (along with source code) showing the same data, but this time with each entity in its own graph:
The source code makes use of the subgraph structures, as well as a bit of CSS styling:
%% Mermaid Markdown
%% Namespaces
%% Company: <https://www.example.com/ns/Company#>
%% Person: <https://www.example.com/ns/Person#>
flowchart LR
subgraph Graph:SmallCo
Company:SmallCo(<b>Company</b>\nSmall Co):::company
end
subgraph Graph:Ann
Person:Ann(<b>Person</b>\nAnn):::person
Person:Ann -->|works for| Company:SmallCo
Person:Ann -->|date started| 2016-01-25[[<b>Date</b>\n2016-01-25]]:::date
end
subgraph Graph:Bill
Person:Bill(<b>Person</b>\nBill):::person
Person:Bill -->|works for| Company:SmallCo
Person:Bill -->|reports to| Person:Ann
Person:Bill -->|date started| 2018-03-12[[<b>Date</b>\n2018-03-12]]:::date
end
subgraph Graph:Clara
Person:Clara(<b>Person</b>\nClara):::person
Person:Clara -->|works for| Company:SmallCo
Person:Clara -->|reports to| Person:Ann
Person:Clara -->|date started| 2018-10-17[[<b>Date</b>\n2018-10-17]]:::date
end
subgraph Graph:Dave
Person:Dave(<b>Person</b>\nDave):::person
Person:Dave -->|works for| Company:SmallCo
Person:Dave -->|reports to| Person:Ann
Person:Dave -->|date started| 2021-01-16[[<b>Date</b>\n2021-01-16]]:::date
end
subgraph Graph:Elsa
Person:Elsa(<b>Person</b>\nElsa):::person
Person:Elsa -->|works for| Company:SmallCo
Person:Elsa -->|reports to| Person:Dave
Person:Elsa -->|date started| 2023-04-12[[<b>Date</b>\n2023-04-12]]:::date
end
classDef company fill:lightGreen,stroke:black;
classDef person fill:lightBlue,stroke:black;
classDef date fill:#FFE,stroke:black;
style Graph:SmallCo stroke:#000,stroke-width:1px;
style Graph:Ann stroke:#000,stroke-width:1px;
style Graph:Bill stroke:#000,stroke-width:1px;
style Graph:Clara stroke:#000,stroke-width:1px;
style Graph:Dave stroke:#000,stroke-width:1px;
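In data terms, a named graph simply adds a fourth element, the graph name, to each statement. The sketch below is an illustration in Python mirroring two of the subgraphs above; the tuple layout and the function names are assumptions made for the example:

```python
from collections import defaultdict

# (graph, subject, predicate, object) quads, mirroring two of the subgraphs.
quads = [
    ("Graph:Ann", "Person:Ann", "works for", "Company:SmallCo"),
    ("Graph:Ann", "Person:Ann", "date started", "2016-01-25"),
    ("Graph:Bill", "Person:Bill", "works for", "Company:SmallCo"),
    ("Graph:Bill", "Person:Bill", "reports to", "Person:Ann"),
    ("Graph:Bill", "Person:Bill", "date started", "2018-03-12"),
]


def group_by_graph(quads):
    """Group statements under the name of the graph that contains them."""
    graphs = defaultdict(list)
    for graph, s, p, o in quads:
        graphs[graph].append((s, p, o))
    return dict(graphs)


def merge(quads):
    """Drop the graph name to recover the single, flat set of triples."""
    return {(s, p, o) for _, s, p, o in quads}


for name, statements in group_by_graph(quads).items():
    print(name, len(statements))
```

Dropping the graph name gives back the flat graph from earlier, which is why the subgraph version carries the same data.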
The Significance of RDF
Why does this matter? There are actually several benefits that derive from working with RDF as an abstraction rather than simply as an internal storage format:
For instance, taking the data from the previous section, I can paste the following in as a prompt (this works with ChatGPT using GPT-4):
%% Mermaid Markdown
%% Namespaces
%% Company: <https://www.example.com/ns/Company#>
%% Person: <https://www.example.com/ns/Person#>
flowchart LR
Company:SmallCo(<b>Company</b>\nSmall Co):::Company
Person:Ann(<b>Person</b>\nAnn):::Person
Person:Ann -->|works for| Company:SmallCo
Person:Ann -->|date started| 2016-01-25[[<b>Date</b>\n2016-01-25]]:::literal
Person:Bill(<b>Person</b>\nBill):::Person
Person:Bill -->|works for| Company:SmallCo
Person:Bill -->|reports to| Person:Ann
Person:Bill -->|date started| 2018-03-12[[<b>Date</b>\n2018-03-12]]:::literal
Person:Clara(<b>Person</b>\nClara):::Person
Person:Clara -->|works for| Company:SmallCo
Person:Clara -->|reports to| Person:Ann
Person:Clara -->|date started| 2018-10-17[[<b>Date</b>\n2018-10-17]]:::literal
Person:Dave(<b>Person</b>\nDave):::Person
Person:Dave -->|works for| Company:SmallCo
Person:Dave -->|reports to| Person:Ann
Person:Dave -->|date started| 2021-01-16[[<b>Date</b>\n2021-01-16]]:::literal
Person:Elsa(<b>Person</b>\nElsa):::Person
Person:Elsa -->|works for| Company:SmallCo
Person:Elsa -->|reports to| Person:Dave
Person:Elsa -->|date started| 2023-04-12[[<b>Date</b>\n2023-04-12]]:::literal
classDef Company fill:lightGreen,stroke:black;
classDef Person fill:lightBlue,stroke:black;
classDef literal fill:#FFE,stroke:black;
style Graph:Company:SmallCo stroke:#000,stroke-width:1px;
style Graph:Person:Ann stroke:#000,stroke-width:1px;
style Graph:Person:Bill stroke:#000,stroke-width:1px;
style Graph:Person:Clara stroke:#000,stroke-width:1px;
style Graph:Person:Dave stroke:#000,stroke-width:1px;
style Graph:Person:Elsa stroke:#000,stroke-width:1px;
The response from the Chatbot:
{
  "@context": {
    "rdf": "https://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "https://www.w3.org/2000/01/rdf-schema#",
    "xsd": "https://www.w3.org/2001/XMLSchema#",
    "Person": "https://www.example.com/ns/Person#",
    "Company": "https://www.example.com/ns/Company#",
    "worksFor": {"@id": "Person:worksFor", "@type": "@id"},
    "reportsTo": {"@id": "Person:reportsTo", "@type": "@id"},
    "dateStarted": {"@id": "Person:dateStarted", "@type": "xsd:date"}
  },
  "@graph": [
    {
      "@id": "Company:SmallCo",
      "@type": "Company",
      "rdfs:label": "Small Co"
    },
    {
      "@id": "Person:Ann",
      "@type": "Person",
      "rdfs:label": "Ann",
      "worksFor": "Company:SmallCo",
      "dateStarted": "2016-01-25"
    },
    {
      "@id": "Person:Bill",
      "@type": "Person",
      "rdfs:label": "Bill",
      "worksFor": "Company:SmallCo",
      "reportsTo": "Person:Ann",
      "dateStarted": "2018-03-12"
    },
    {
      "@id": "Person:Clara",
      "@type": "Person",
      "rdfs:label": "Clara",
      "worksFor": "Company:SmallCo",
      "reportsTo": "Person:Ann",
      "dateStarted": "2018-10-17"
    },
    {
      "@id": "Person:Dave",
      "@type": "Person",
      "rdfs:label": "Dave",
      "worksFor": "Company:SmallCo",
      "reportsTo": "Person:Ann",
      "dateStarted": "2021-01-16"
    },
    {
      "@id": "Person:Elsa",
      "@type": "Person",
      "rdfs:label": "Elsa",
      "worksFor": "Company:SmallCo",
      "reportsTo": "Person:Dave",
      "dateStarted": "2023-04-12"
    }
  ]
}
This JSON structure includes a @context to define the terms used within the graph, making it easier to understand the relationships between entities. The @graph array then lists each entity with its type, properties, and relationships to other entities. This format keeps the fundamental RDF nature, with each entity represented as a resource identified by a URI (@id) and having a set of properties and relations with other resources.
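To see what the @context is doing, here is a deliberately simplified Python sketch of how a short term resolves to a full identifier. It covers only the flat prefix and term definitions used in this example; the real JSON-LD expansion algorithm handles far more, and the function name `resolve` is hypothetical:

```python
context = {
    "Person": "https://www.example.com/ns/Person#",
    "Company": "https://www.example.com/ns/Company#",
    "worksFor": {"@id": "Person:worksFor", "@type": "@id"},
}


def resolve(term, context):
    """Expand a term or prefixed name to a full IRI using a flat @context."""
    definition = context.get(term)
    if isinstance(definition, dict):      # expanded term definition
        return resolve(definition["@id"], context)
    if isinstance(definition, str):       # simple term -> IRI mapping
        return definition
    prefix, sep, local = term.partition(":")
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + local    # prefix:local form
    return term                           # already absolute, or unknown


print(resolve("worksFor", context))
# https://www.example.com/ns/Person#worksFor
```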
The JSON, by the way, is also RDF, specifically using the JSON-LD (linked data) format. It is similarly lossless: by utilizing the RDF nature of the data, you are much more likely to retain semantic meaning regardless of the format. You can even write a parser that consistently converts such Mermaid markdown to JSON directly. I will leave that as fodder for another article.
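As a small taste of what such a parser might involve, the sketch below handles only the edge lines, collecting them into JSON-LD-style node dictionaries. It is an assumption-laden illustration rather than a full converter: the regular expression, the camel-casing of edge labels, and the function name `mermaid_edges_to_nodes` are all choices made for this example, and a repeated predicate on the same node would overwrite the earlier value.

```python
import re

# Matches edge lines such as:  Person:Bill -->|reports to| Person:Ann
EDGE = re.compile(r"^(?P<s>\S+)\s*-->\|(?P<p>[^|]+)\|\s*(?P<o>[^\s(\[]+)")


def mermaid_edges_to_nodes(mermaid_source):
    """Collect 'A -->|label| B' lines into JSON-LD-style node dictionaries."""
    nodes = {}
    for line in mermaid_source.splitlines():
        match = EDGE.match(line.strip())
        if not match:
            continue  # comments, declarations, classDef lines, and so on
        node = nodes.setdefault(match["s"], {"@id": match["s"]})
        # Illustrative convention: "works for" becomes the key "worksFor".
        first, *rest = match["p"].split()
        key = first + "".join(word.capitalize() for word in rest)
        node[key] = match["o"]
    return list(nodes.values())


source = r"""
flowchart LR
Person:Bill(<b>Person</b>\nBill):::Person
Person:Bill -->|works for| Company:SmallCo
Person:Bill -->|reports to| Person:Ann
Person:Bill -->|date started| 2018-03-12[[<b>Date</b>\n2018-03-12]]:::literal
"""
print(mermaid_edges_to_nodes(source))
```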
Note that the Python script given below contains no real domain-specific code: because the information is kept in RDF, the context of that information is now separate from its presentation, even between two wildly different presentations. This reverse translation, from JSON-LD back to Mermaid, is similarly context-free, including generating the colors for styling the elements:
# Python
import json
import hashlib

# Sample JSON-LD input (for demonstration; replace with actual file input as needed).
# The "..." below stands for the remaining nodes, which are elided here; supply them
# before running, since json.loads() cannot parse the placeholder.
json_ld = """
{
  "@context": {
    "rdf": "https://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "Company": "https://www.example.com/ns/Company#",
    "Person": "https://www.example.com/ns/Person#",
    "worksFor": {"@id": "Company:worksFor", "@type": "@id"},
    "reportsTo": {"@id": "Person:reportsTo", "@type": "@id"},
    "dateStarted": "https://www.example.com/ns/dateStarted"
  },
  "@graph": [
    {
      "@id": "Company:SmallCo",
      "@type": "Company",
      "name": "Small Co"
    },
    ...
  ]
}
"""


def generate_color(name):
    """Generate a hex color code based on the hash of the name."""
    hash_code = hashlib.md5(name.encode()).hexdigest()
    # Use the first 6 digits of the hash for the color
    return f'#{hash_code[:6]}'


def jsonld_to_mermaid(jsonld_str):
    """Convert a JSON-LD document (as a string) into Mermaid flowchart markdown."""
    jsonld = json.loads(jsonld_str)
    context = jsonld["@context"]
    graph = jsonld["@graph"]

    # Start constructing the Mermaid markdown
    mermaid_md = "%% Mermaid Markdown\n%% Namespaces\n"
    for prefix, uri in context.items():
        # Only plain string entries are namespace (or simple term) mappings
        if isinstance(uri, str):
            mermaid_md += f"%% {prefix}: <{uri}>\n"
    mermaid_md += "flowchart LR\n"

    class_defs = {}
    # Adding entities to Mermaid markdown
    for node in graph:
        node_id = node["@id"]
        node_type = node["@type"]
        node_name = node.get("name", "")
        mermaid_md += f" {node_id}(<b>{node_type}</b>\\n{node_name}):::{node_type.lower()}\n"
        # Generate class definitions dynamically
        if node_type not in class_defs:
            class_defs[node_type] = generate_color(node_type)

    # Adding relationships to Mermaid markdown
    for node in graph:
        node_id = node["@id"]
        for predicate, value in node.items():
            if predicate in ["@id", "@type", "name"]:
                continue
            # Map a predicate @id back to its short context term, if there is one.
            # String-valued context entries are skipped; indexing them with "@id"
            # would raise a TypeError.
            predicate_label = next(
                (key for key, val in context.items()
                 if isinstance(val, dict) and val.get("@id") == predicate),
                predicate,
            )
            mermaid_md += f" {node_id} -->|{predicate_label}| {value}\n"

    # Append dynamically generated class definitions
    for class_name, color in class_defs.items():
        mermaid_md += f"classDef {class_name.lower()} fill:{color},stroke:black;\n"
    return mermaid_md


# Convert JSON-LD to Mermaid markdown
mermaid_markdown = jsonld_to_mermaid(json_ld)
print(mermaid_markdown)
This generates the following output:
%% Mermaid Markdown
%% Namespaces
%% rdf: <https://www.w3.org/1999/02/22-rdf-syntax-ns#>
%% Company: <https://www.example.com/ns/Company#>
%% Person: <https://www.example.com/ns/Person#>
%% dateStarted: <https://www.example.com/ns/dateStarted>
flowchart LR
Company:SmallCo(<b>Company</b>\nSmall Co):::company
Person:Ann(<b>Person</b>\nAnn):::person
Person:Bill(<b>Person</b>\nBill):::person
Person:Clara(<b>Person</b>\nClara):::person
Person:Dave(<b>Person</b>\nDave):::person
Person:Ann -->|worksFor| Company:SmallCo
Person:Ann -->|dateStarted| 2016-01-25
Person:Bill -->|worksFor| Company:SmallCo
Person:Bill -->|reportsTo| Person:Ann
Person:Bill -->|dateStarted| 2018-03-12
Person:Clara -->|worksFor| Company:SmallCo
Person:Clara -->|reportsTo| Person:Ann
Person:Clara -->|dateStarted| 2018-10-17
Person:Dave -->|worksFor| Company:SmallCo
Person:Dave -->|reportsTo| Person:Ann
Person:Dave -->|dateStarted| 2021-01-16
classDef company fill:#1c76cb,stroke:black;
classDef person fill:#40bed7,stroke:black;
which can then be generated as the following diagram:
Other than the colour change (which was a deliberate feature, as there may be more classes than the two you started with), the fidelity of the process is maintained.
Conclusion
What this has done is pretty remarkable: a given JSON-LD (RDF) document will generate a corresponding diagram output. In other words, because you've used RDF, you have created a Domain Specific Language (DSL) for this particular schema while needing to know nothing about the context - and given a particular diagram, you can retrieve the data that built that diagram consistently.
Thus, RDF goes hand-in-hand with DSLs, making them especially potent when dealing with AI-generated content. The value of the RDF comes not in the specific format but in the idea that you are maintaining the important data from one format to another.
In Media Res,
Kurt Cagle
Editor, The Cagle Report
Such a great article and thank you for covering the basics of RDF graph encoding. That said, RDF has been around for a while and the elephant in the room seems to be how low the adoption has been? Granted, life science, healthcare, and other industries use RDF and ontologies extensively, but was the Semantic Web vision just too far ahead of its time?
A top follow for AI, ontology, and smart new strategies.
12 months ago: What a succinct encapsulation. Thanks Kurt! Found this really helpful as it puts some scaffolding up around a thing I naturally find myself doing with language models - creating ontologies of the topic or work I’m focused on to recycle in as injected context in new sessions focused on the same topic. After reading this, I guess what I’m essentially doing is creating DSL packets. I find this process oddly therapeutic, like by creating ontologies with AI I’m bringing order to my mind’s chaos haha. I personally would love some more reading on this if anyone has some good bread crumbs to follow
12 months ago: Kurt Cagle, at Signatu we are exploring RDF to make graphs for representing personal data processing: https://www.dhirubhai.net/pulse/knowledge-graphs-personal-data-used-processing-georg-philip-krog-uehxf?utm_source=share&utm_medium=member_ios&utm_campaign=share_via
12 months ago: Kurt, I don't know if you've been made aware of this, but in emails of your newsletter that LinkedIn sends out, a lot of content, such as the code blocks and some things in the text, gets dropped on the floor.
12 months ago: Great article - sorry you missed today's DC meetup - RDF nuances were the theme of the meeting :-) Upcoming, we will welcome a speaker who will be telling us about and demonstrating an entire budgeting system coded in RDF using SHACL.