RDF, Graphs and Mermaid Diagrams
What few people realize is that mermaids excel at both C and Shell programming.

There is an old saying that Democracy is the worst form of government known to mankind, except for all others. The same can be said for the Resource Description Framework or RDF.

Most people need help understanding RDF. This shouldn't be that surprising. RDF is a way of encoding semantic graphs. Chances are pretty good that if you are a programmer, you are only marginally aware of what a semantic graph is, and it more than likely seems like a lot of work for something you can store in JSON.

This is part of the reason why I'm increasingly inclined to express RDF as pictures when I discuss it. For example, suppose I had a small company with one manager and three employees. This creates a graph:

Try this out live.

There's a lot of information packed into this particular graph. Each rounded box is a thing (what we in RDF land call a resource) and each edge is a given relationship (either reports to or works for), which is known as a predicate. A statement then indicates that one thing (the subject) has a relationship (the predicate) with another thing (the object).

This graph can then be decomposed into a set of statements, specifically:
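Read off the diagram, there are seven such statements. They are sketched here as Python tuples of (subject, predicate, object); the tuple layout and the helper name `subjects_of` are illustrative assumptions, not part of RDF or Mermaid:

```python
# Each statement is a (subject, predicate, object) triple.
# The plain strings stand in for the unique identifiers behind each node and edge.
statements = [
    ("Ann", "works for", "Small Co"),
    ("Bill", "works for", "Small Co"),
    ("Clara", "works for", "Small Co"),
    ("Dave", "works for", "Small Co"),
    ("Bill", "reports to", "Ann"),
    ("Clara", "reports to", "Ann"),
    ("Dave", "reports to", "Ann"),
]

def subjects_of(predicate, obj):
    """Return every subject that has `predicate` pointing at `obj`."""
    return [s for s, p, o in statements if p == predicate and o == obj]

print(subjects_of("reports to", "Ann"))  # ['Bill', 'Clara', 'Dave']
```

Asking "who reports to Ann?" is then just a filter over the statements, which is essentially what a graph query does.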

Note that there is a fundamental notion implied here. The graph is not connecting strings together, it's connecting things or concepts together. Somewhere in the back end of all of this, there's a unique identifier for each resource (and each edge), and it is actually these identifiers that are pointing to one another, but from the perspective of the picture, those identifiers are just implementation details. The semantics are in the relationships.

This is actually what RDF is all about. At its core, it is simply a way of describing semantic graphs. One can argue that the diagram above is RDF, if you assume that each node (box) and each distinctly labeled edge (arrow) has a unique identifier. This relationship can be seen in the Mermaid Markdown file that was used to create the above graph:

flowchart LR
%% Namespaces
%%    Company: <https://www.example.com/ns/Company#>
%%    Person:  <https://www.example.com/ns/Person#>

%% Declarations
    Company:SmallCo(<b>Company</b>\nSmall Co)
    Person:Ann(<b>Person</b>\nAnn)
    Person:Bill(<b>Person</b>\nBill) 
    Person:Clara(<b>Person</b>\nClara)
    Person:Dave(<b>Person</b>\nDave)
%% Works For Relationships
    Person:Ann -->|works for| Company:SmallCo
    Person:Bill -->|works for| Company:SmallCo
    Person:Clara -->|works for| Company:SmallCo
    Person:Dave -->|works for| Company:SmallCo
%% Reports To Relationships
    Person:Bill -->|reports to| Person:Ann
    Person:Clara -->|reports to| Person:Ann
    Person:Dave -->|reports to| Person:Ann        

Each expression, such as Company:SmallCo, is an identifier, and can be expanded to a full, globally unique uniform resource identifier (URI, or IRI in its internationalized form):

<https://www.example.com/ns/Company#SmallCo>        

In and of themselves, these URIs have no real significance, beyond their uniqueness within a specific context. Their value comes, however, in the fact that if you know the URI, then you can identify the node or edge in the graph, and from there can extract information.
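The expansion from a prefixed name to a full URI is purely mechanical. A minimal sketch, assuming the two namespaces declared in the Mermaid comments above (the function names `expand` and `compact` are hypothetical, chosen for illustration):

```python
# Prefix table copied from the namespace comments in the Mermaid source.
PREFIXES = {
    "Company": "https://www.example.com/ns/Company#",
    "Person": "https://www.example.com/ns/Person#",
}

def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'Person:Bill' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local

def compact(uri, prefixes=PREFIXES):
    """Shorten a full URI back to its prefixed form when a prefix matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri

print(expand("Company:SmallCo"))  # https://www.example.com/ns/Company#SmallCo
```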

Namespaces of this kind also exist in XML, and while JSON has no direct equivalent, there are specific JSON standards (such as JSON-LD) that support the idea of namespace contexts. You can see the value of this above: the namespace prefixes (here, Company: and Person:) make it easy to see what particular "class" of entity a given resource is.

In the example given above, this is actually added into the descriptive label for each resource. For instance, the person Bill is "declared" as follows:

    Person:Bill(<b>Person</b>\nBill)        

where the type is contained in the first line as a bold label, and the actual label of the resource is given on the second line, after the \n (newline) character.

The declaration above, by the way, can be parsed as four statements in Turtle, another RDF language:

# Turtle

Person:Bill a Person: .
Person:Bill rdfs:label "Bill" .
Person: a rdfs:Class .
Person: rdfs:label "Person" .
        

This says that Bill is of class Person:, that Bill has the label "Bill" (a string), that Person: is itself a class in the RDF system, and that this class has the label "Person".
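That mapping from one Mermaid declaration to four Turtle statements can be sketched with a regular expression. This is a rough illustration that assumes declarations always follow the exact `Prefix:Name(<b>Type</b>\nLabel)` shape used in this article; the function name is hypothetical:

```python
import re

# Matches declarations of the form  Person:Bill(<b>Person</b>\nBill)
# where \n is the literal backslash-n written in the Mermaid source.
DECLARATION = re.compile(
    r"^\s*(?P<prefix>\w+):(?P<local>\w+)"
    r"\(<b>(?P<type_label>[^<]+)</b>\\n(?P<label>[^)]+)\)"
)

def declaration_to_turtle(line):
    """Turn one Mermaid node declaration into four Turtle statements."""
    m = DECLARATION.match(line)
    if m is None:
        raise ValueError(f"not a node declaration: {line!r}")
    prefix, local = m["prefix"], m["local"]
    type_label, label = m["type_label"], m["label"]
    node, cls = f"{prefix}:{local}", f"{prefix}:"
    return [
        f"{node} a {cls} .",
        f'{node} rdfs:label "{label}" .',
        f"{cls} a rdfs:Class .",
        f'{cls} rdfs:label "{type_label}" .',
    ]

for statement in declaration_to_turtle(r"    Person:Bill(<b>Person</b>\nBill)"):
    print(statement)
```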

Note that in Mermaid notation, the %% symbol is the start of a comment line. Mermaid by itself does not understand the notion of namespace, but by including the namespace declarations as comments, it makes it easier to retain RDF semantic information:

%% Namespaces
%%    Company: <https://www.example.com/ns/Company#>
%%    Person:  <https://www.example.com/ns/Person#>        

Let's say that we add four more statements into the graph, in this case adding the literal data (dates) showing when each person joined:

Try this out live.

This also includes a bit of styling information to more clearly differentiate literals from people from companies.

%% Mermaid Markdown
%% Namespaces
%%    Company: <https://www.example.com/ns/Company#>
%%    Person:  <https://www.example.com/ns/Person#>

flowchart LR
%% Declarations
    SmallCo(<b>Company</b>\nSmall Co):::company
    Ann(<b>Person</b>\nAnn):::person
    Bill(<b>Person</b>\nBill):::person
    Clara(<b>Person</b>\nClara):::person
    Dave(<b>Person</b>\nDave):::person
%% Works For Relationships
    Ann -->|works for| SmallCo
    Bill -->|works for| SmallCo
    Clara -->|works for| SmallCo
    Dave -->|works for| SmallCo
%% Reports To Relationships
    Bill -->|reports to| Ann
    Clara -->|reports to| Ann
    Dave -->|reports to| Ann
%% Date Started Literal
    Ann -->|date started| 2016-01-25[[<b>Date</b>\n2016-01-25]]:::date
    Bill -->|date started| 2018-03-12[[<b>Date</b>\n2018-03-12]]:::date
    Clara -->|date started| 2018-10-17[[<b>Date</b>\n2018-10-17]]:::date
    Dave -->|date started| 2021-01-16[[<b>Date</b>\n2021-01-16]]:::date

classDef company fill:lightGreen,stroke:black;
classDef person fill:lightBlue,stroke:black;
classDef date fill:#FFE,stroke:black;        

This is still RDF. RDF is an abstraction, a way of defining statements. At its core it's a very simple language, or more properly, meta-language, because so long as you preserve links and namespaces through some conventional means, you can describe a graph with it. This is important.

Triple stores are built around RDF. Other graph databases, such as Neo4j (which uses openCypher) or TigerGraph, use their own query languages and an internal representation, but that internal representation has long been able to support mapping to and from RDF. RDF is simply an abstraction used to define graphs: it is independent of any particular language. This even holds true for the use of graphs within graphs (named graphs). The following is a Mermaid diagram (along with source code) showing the same data, plus a new employee named Elsa, but this time with each entity in its own graph:

Try this out live.

The source code makes use of the subgraph structures, as well as a bit of CSS styling:

%% Mermaid Markdown
%% Namespaces
%%    Company: <https://www.example.com/ns/Company#>
%%    Person:  <https://www.example.com/ns/Person#>

flowchart LR
    subgraph Graph:SmallCo
        Company:SmallCo(<b>Company</b>\nSmall Co):::company
    end
    subgraph Graph:Ann
        Person:Ann(<b>Person</b>\nAnn):::person
        Person:Ann -->|works for| Company:SmallCo
        Person:Ann -->|date started| 2016-01-25[[<b>Date</b>\n2016-01-25]]:::date
    end
    subgraph Graph:Bill
        Person:Bill(<b>Person</b>\nBill):::person
        Person:Bill -->|works for| Company:SmallCo
        Person:Bill -->|reports to| Person:Ann
        Person:Bill -->|date started| 2018-03-12[[<b>Date</b>\n2018-03-12]]:::date
    end
    subgraph Graph:Clara
        Person:Clara(<b>Person</b>\nClara):::person
        Person:Clara -->|works for| Company:SmallCo
        Person:Clara -->|reports to| Person:Ann
        Person:Clara -->|date started| 2018-10-17[[<b>Date</b>\n2018-10-17]]:::date
    end
    subgraph Graph:Dave
        Person:Dave(<b>Person</b>\nDave):::person
        Person:Dave -->|works for| Company:SmallCo
        Person:Dave -->|reports to| Person:Ann
        Person:Dave -->|date started| 2021-01-16[[<b>Date</b>\n2021-01-16]]:::date
    end
    subgraph Graph:Elsa
        Person:Elsa(<b>Person</b>\nElsa):::person
        Person:Elsa -->|works for| Company:SmallCo
        Person:Elsa -->|reports to| Person:Dave
        Person:Elsa -->|date started| 2023-04-12[[<b>Date</b>\n2023-04-12]]:::date
    end

classDef company fill:lightGreen,stroke:black;
classDef person fill:lightBlue,stroke:black;
classDef date fill:#FFE,stroke:black;
style Graph:SmallCo stroke:#000,stroke-width:1px;
style Graph:Ann stroke:#000,stroke-width:1px;
style Graph:Bill stroke:#000,stroke-width:1px;
style Graph:Clara stroke:#000,stroke-width:1px;
style Graph:Dave stroke:#000,stroke-width:1px;
style Graph:Elsa stroke:#000,stroke-width:1px;
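In data terms, a named graph just adds a fourth element to each statement, saying which graph the statement belongs to. A small sketch of that idea, using two of the graphs from the diagram; the quad layout, the camel-cased predicate names, and the helper `group_by_graph` are assumptions made for illustration:

```python
from collections import defaultdict

# Quads are (subject, predicate, object, graph). The graph names mirror the
# subgraph ids in the Mermaid source; only Ann's and Bill's graphs are shown.
quads = [
    ("Person:Ann", "worksFor", "Company:SmallCo", "Graph:Ann"),
    ("Person:Ann", "dateStarted", "2016-01-25", "Graph:Ann"),
    ("Person:Bill", "worksFor", "Company:SmallCo", "Graph:Bill"),
    ("Person:Bill", "reportsTo", "Person:Ann", "Graph:Bill"),
    ("Person:Bill", "dateStarted", "2018-03-12", "Graph:Bill"),
]

def group_by_graph(quads):
    """Collect the triples belonging to each named graph."""
    graphs = defaultdict(list)
    for subject, predicate, obj, graph in quads:
        graphs[graph].append((subject, predicate, obj))
    return dict(graphs)

named = group_by_graph(quads)
print(sorted(named))             # ['Graph:Ann', 'Graph:Bill']
print(len(named["Graph:Bill"]))  # 3
```

Note that a statement in one graph (Bill reports to Ann) can freely point at a resource described in another.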

The Significance of RDF

Why does this matter? There are actually several benefits that derive from working with RDF as an abstraction rather than simply as an internal storage format:

  • RDF is a powerful tool for design and data modelling. You can conceptualize data structures through exemplars, then use those to create schemas.
  • RDF is essential for data interoperability across different formats. By providing a consistent mechanism for modelling, you retain the underlying structure between translations.
  • RDF also makes visualization possible. All too often, visualization is hampered because there are no consistent hooks in data, so you have to customize your visualizations with each ontology. By using RDF, you make it easier to query and find those hooks.
  • Language models (LLMs and SLMs) reason better when they have RDF guidance. Tools such as RDF2Vec can encode RDF graphs as vector embeddings for use with graph neural networks and similar models, which reason better and hallucinate less than unaugmented models.
  • RDF can similarly be used with prompt engineering as patterns for LLM prompts and responses.

For instance, taking the next-to-last example in the previous section and adding one more employee (Elsa), I can paste this in as a prompt (this works with ChatGPT running GPT-4):



%% Mermaid Markdown
%% Namespaces
%%    Company: <https://www.example.com/ns/Company#>
%%    Person:  <https://www.example.com/ns/Person#>

flowchart LR
        Company:SmallCo(<b>Company</b>\nSmall Co):::Company
        Person:Ann(<b>Person</b>\nAnn):::Person
        Person:Ann -->|works for| Company:SmallCo
        Person:Ann -->|date started| 2016-01-25[[<b>Date</b>\n2016-01-25]]:::literal
        Person:Bill(<b>Person</b>\nBill):::Person
        Person:Bill -->|works for| Company:SmallCo
        Person:Bill -->|reports to| Person:Ann
        Person:Bill -->|date started| 2018-03-12[[<b>Date</b>\n2018-03-12]]:::literal
        Person:Clara(<b>Person</b>\nClara):::Person
        Person:Clara -->|works for| Company:SmallCo
        Person:Clara -->|reports to| Person:Ann
        Person:Clara -->|date started| 2018-10-17[[<b>Date</b>\n2018-10-17]]:::literal
        Person:Dave(<b>Person</b>\nDave):::Person
        Person:Dave -->|works for| Company:SmallCo
        Person:Dave -->|reports to| Person:Ann
        Person:Dave -->|date started| 2021-01-16[[<b>Date</b>\n2021-01-16]]:::literal
        Person:Elsa(<b>Person</b>\nElsa):::Person
        Person:Elsa -->|works for| Company:SmallCo
        Person:Elsa -->|reports to| Person:Dave
        Person:Elsa -->|date started| 2023-04-12[[<b>Date</b>\n2023-04-12]]:::literal
classDef Company fill:lightGreen,stroke:black;
classDef Person fill:lightBlue,stroke:black;
classDef literal fill:#FFE,stroke:black;
style Graph:Company:SmallCo stroke:#000,stroke-width:1px;
style Graph:Person:Ann stroke:#000,stroke-width:1px;
style Graph:Person:Bill stroke:#000,stroke-width:1px;
style Graph:Person:Clara stroke:#000,stroke-width:1px;
style Graph:Person:Dave stroke:#000,stroke-width:1px;
style Graph:Person:Elsa stroke:#000,stroke-width:1px;
        

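The response from the chatbot (note that the standard rdf, rdfs and xsd namespace IRIs begin with http://, not the https:// that appears in this response):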

{
    "@context": {
        "rdf": "https://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "https://www.w3.org/2000/01/rdf-schema#",
        "xsd": "https://www.w3.org/2001/XMLSchema#",
        "Person": "https://www.example.com/ns/Person#",
        "Company": "https://www.example.com/ns/Company#",
        "worksFor": {"@id": "Person:worksFor", "@type": "@id"},
        "reportsTo": {"@id": "Person:reportsTo", "@type": "@id"},
        "dateStarted": {"@id": "Person:dateStarted", "@type": "xsd:date"}
    },
    "@graph": [
        {
            "@id": "Company:SmallCo",
            "@type": "Company",
            "rdfs:label": "Small Co"
        },
        {
            "@id": "Person:Ann",
            "@type": "Person",
            "rdfs:label": "Ann",
            "worksFor": "Company:SmallCo",
            "dateStarted": "2016-01-25"
        },
        {
            "@id": "Person:Bill",
            "@type": "Person",
            "rdfs:label": "Bill",
            "worksFor": "Company:SmallCo",
            "reportsTo": "Person:Ann",
            "dateStarted": "2018-03-12"
        },
        {
            "@id": "Person:Clara",
            "@type": "Person",
            "rdfs:label": "Clara",
            "worksFor": "Company:SmallCo",
            "reportsTo": "Person:Ann",
            "dateStarted": "2018-10-17"
        },
        {
            "@id": "Person:Dave",
            "@type": "Person",
            "rdfs:label": "Dave",
            "worksFor": "Company:SmallCo",
            "reportsTo": "Person:Ann",
            "dateStarted": "2021-01-16"
        },
        {
            "@id": "Person:Elsa",
            "@type": "Person",
            "rdfs:label": "Elsa",
            "worksFor": "Company:SmallCo",
            "reportsTo": "Person:Dave",
            "dateStarted": "2023-04-12"
        }
    ]
}        

This JSON structure includes a @context to define the terms used within the graph, making it easier to understand the relationships between entities. The @graph array then lists each entity with its type, properties, and relationships to other entities. This format keeps the fundamental RDF nature, with each entity represented as a resource identified by a URI (@id) and having a set of properties and relations with other resources.

The JSON, by the way, is also RDF, specifically using the JSON-LD (linked data) format. It is similarly lossless: by utilizing the RDF nature of the data, you are much more likely to retain semantic meaning regardless of the format. You can even consistently write a parser for converting such Mermaid markdown to JSON directly. I will leave that as fodder for another article.
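Still, a rough sketch gives the flavour of such a parser. It assumes the exact conventions used in this article (namespace comments, `<b>Type</b>\nLabel` declarations, `-->|label|` edges, and `[[...]]` boxes for literals); the function names are hypothetical. It does not add property terms such as worksFor to the @context, which a complete converter would need to do:

```python
import json
import re

NAMESPACE = re.compile(r"^%%\s+(\w+):\s+<([^>]+)>")
NODE = re.compile(r"^([\w:]+)\(<b>([^<]+)</b>\\n([^)]+)\)")
EDGE = re.compile(
    r"^([\w:]+)\s+-->\|([^|]+)\|\s+([\w:-]+)"
    r"(?:\[\[<b>[^<]+</b>\\n([^\]]+)\]\])?"
)

def camel_case(label):
    """Turn an edge label such as 'works for' into 'worksFor'."""
    first, *rest = label.split()
    return first + "".join(word.capitalize() for word in rest)

def mermaid_to_jsonld(source):
    """Convert Mermaid source written in this article's style to a JSON-LD dict."""
    context, nodes = {}, {}
    for raw in source.splitlines():
        line = raw.strip()
        if m := NAMESPACE.match(line):
            context[m[1]] = m[2]
        elif m := NODE.match(line):
            node = nodes.setdefault(m[1], {"@id": m[1]})
            node["@type"] = m[2]
            node["rdfs:label"] = m[3]
        elif m := EDGE.match(line):
            subject, label, target, literal = m.groups()
            node = nodes.setdefault(subject, {"@id": subject})
            # A [[...]] target is treated as a literal; otherwise link by id.
            node[camel_case(label)] = literal if literal else {"@id": target}
    return {"@context": context, "@graph": list(nodes.values())}

sample = r"""
%% Namespaces
%%    Company: <https://www.example.com/ns/Company#>
%%    Person:  <https://www.example.com/ns/Person#>
flowchart LR
    Company:SmallCo(<b>Company</b>\nSmall Co):::Company
    Person:Ann(<b>Person</b>\nAnn):::Person
    Person:Ann -->|works for| Company:SmallCo
    Person:Ann -->|date started| 2016-01-25[[<b>Date</b>\n2016-01-25]]:::literal
"""
print(json.dumps(mermaid_to_jsonld(sample), indent=2))
```

Lines that match none of the three patterns (flowchart, classDef, style, plain comments) are simply skipped, so styling never leaks into the data.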


Note that a script for this kind of translation needs no real domain-specific code: because the information is kept in RDF, the context of that information is separate from its presentation, even between two wildly different presentations. The reverse translation, from JSON-LD back to Mermaid, is similarly context-free, including generating the colors for styling the elements:

# Python

import json
import hashlib

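# Sample JSON-LD input (for demonstration; replace with actual file input as needed).
# The "..." in the @graph array stands for the remaining nodes; json.loads will
# reject the string until it is replaced with real entries.
json_ld = """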
{
  "@context": {
    "rdf": "https://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "Company": "https://www.example.com/ns/Company#",
    "Person": "https://www.example.com/ns/Person#",
    "worksFor": {"@id": "Company:worksFor", "@type": "@id"},
    "reportsTo": {"@id": "Person:reportsTo", "@type": "@id"},
    "dateStarted": "https://www.example.com/ns/dateStarted"
  },
  "@graph": [
    {
      "@id": "Company:SmallCo",
      "@type": "Company",
      "name": "Small Co"
    },
    ...
  ]
}
"""

def generate_color(name):
    """Generate a hex color code based on the hash of the name."""
    hash_code = hashlib.md5(name.encode()).hexdigest()
    # Use the first 6 digits of the hash for the color
    return f'#{hash_code[:6]}'

def jsonld_to_mermaid(jsonld_str):
    jsonld = json.loads(jsonld_str)
    context = jsonld["@context"]
    graph = jsonld["@graph"]

    # Start constructing the Mermaid markdown
    mermaid_md = "%% Mermaid Markdown\n%% Namespaces\n"
    for prefix, uri in context.items():
        if isinstance(uri, str):
            mermaid_md += f"%%    {prefix}: <{uri}>\n"
    
    mermaid_md += "flowchart LR\n"
    
    class_defs = {}
    
    # Adding entities to Mermaid markdown
    for node in graph:
        node_id = node["@id"]
        node_type = node["@type"]
        node_name = node.get("name", "")
        mermaid_md += f"    {node_id}(<b>{node_type}</b>\\n{node_name}):::{node_type.lower()}\n"
        # Generate class definitions dynamically
        if node_type not in class_defs:
            class_defs[node_type] = generate_color(node_type)
    
    # Adding relationships to Mermaid markdown
    for node in graph:
        node_id = node["@id"]
        for predicate, value in node.items():
            if predicate in ["@id", "@type", "name"]:
                continue
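            # Context values may be plain strings or dicts; only dicts carry an "@id".
            predicate_label = next((key for key, val in context.items() if isinstance(val, dict) and val.get("@id") == predicate), predicate)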
            mermaid_md += f"    {node_id} -->|{predicate_label}| {value}\n"
    
    # Append dynamically generated class definitions
    for class_name, color in class_defs.items():
        mermaid_md += f"classDef {class_name.lower()} fill:{color},stroke:black;\n"

    return mermaid_md

# Convert JSON-LD to Mermaid markdown
mermaid_markdown = jsonld_to_mermaid(json_ld)
print(mermaid_markdown)
        

This generates the following output:

%% Mermaid Markdown
%% Namespaces
%%    rdf: <https://www.w3.org/1999/02/22-rdf-syntax-ns#>
%%    Company: <https://www.example.com/ns/Company#>
%%    Person: <https://www.example.com/ns/Person#>
%%    dateStarted: <https://www.example.com/ns/dateStarted>
flowchart LR
    Company:SmallCo(<b>Company</b>\nSmall Co):::company
    Person:Ann(<b>Person</b>\nAnn):::person
    Person:Bill(<b>Person</b>\nBill):::person
    Person:Clara(<b>Person</b>\nClara):::person
    Person:Dave(<b>Person</b>\nDave):::person
    Person:Ann -->|worksFor| Company:SmallCo
    Person:Ann -->|dateStarted| 2016-01-25
    Person:Bill -->|worksFor| Company:SmallCo
    Person:Bill -->|reportsTo| Person:Ann
    Person:Bill -->|dateStarted| 2018-03-12
    Person:Clara -->|worksFor| Company:SmallCo
    Person:Clara -->|reportsTo| Person:Ann
    Person:Clara -->|dateStarted| 2018-10-17
    Person:Dave -->|worksFor| Company:SmallCo
    Person:Dave -->|reportsTo| Person:Ann
    Person:Dave -->|dateStarted| 2021-01-16
    classDef company fill:#1c76cb,stroke:black;
    classDef person fill:#40bed7,stroke:black;        

which can then be rendered as the following diagram:

Other than the colour change (which was a deliberate feature, as there may be more classes than the two you started with), the fidelity of the process is maintained.

Conclusion

What this has done is pretty remarkable: a given JSON-LD (RDF) document will generate a corresponding diagram output. In other words, because you've used RDF, you have created a Domain Specific Language (DSL) for this particular schema while needing to know nothing about the context - and given a particular diagram, you can retrieve the data that built that diagram consistently.

Thus, RDF goes hand-in-hand with DSLs, making them especially potent when dealing with AI-generated content. The value of the RDF comes not in the specific format but in the idea that you are maintaining the important data from one format to another.

In Media Res,


Mermaid fishing. Yup.

Kurt Cagle

Editor, The Cagle Report
