LLM Ontology-prompting for Knowledge Graph Extraction
Prompting an LLM with an ontology to drive Knowledge Graph extraction from unstructured documents
I make no apology for saying that a graph is the best organization of structured data. However, the vast majority of data is unstructured text. Therefore, data needs to be transformed from its original format, via an Extract-Transform-Load (ETL) or Extract-Load-Transform (ELT) process, into a Knowledge Graph format. There is no problem when the original format is structured, such as SQL tables, spreadsheets, etc., or at least semi-structured, such as tweets. However, when the source data is unstructured text, the task of ETL/ELT to a graph is far more challenging.
This article shows how an LLM can be prompted with an unstructured document and asked to extract a graph corresponding to a specific ontology/schema. This is demonstrated with a Kennedy ontology in conjunction with a publicly available description of the Kennedy family tree.
What is the Knowledge Graph I Want?
There have been several articles recently in which an LLM is used to generate a graph from a supplied document. However, I am unaware of any example in which the schema of the resultant graph conforms to a specified and supplied ontology.
The example I have chosen is that of the Kennedy family tree. There are many unstructured documents on the web describing various aspects of the Kennedy family tree which can be used as sample data. Also, TopQuadrant has a very useful kennedys.ttl RDF graph, with an embedded ontology, which they use extensively for training. For this example, I separated the ontology into a separate graph file.
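One way to make such a split, sketched here with rdflib (the file names are assumptions, and this is not necessarily the exact procedure used; note that controlled-vocabulary individuals such as kennedys:Lt-governor, which appear in the schema excerpt in the next section, would also need to be carried across):
from rdflib import Graph, RDF, OWL
# Load TopQuadrant's combined file (embedded ontology plus instance data).
combined = Graph()
combined.parse("kennedys.ttl", format="turtle")
# Collect every term declared as the ontology itself, a class, or a property.
schema_terms = set()
for kind in (OWL.Ontology, OWL.Class, OWL.ObjectProperty, OWL.DatatypeProperty):
    schema_terms.update(combined.subjects(RDF.type, kind))
# Copy all triples whose subject is one of those schema terms into a new graph.
schema = Graph()
for s, p, o in combined:
    if s in schema_terms:
        schema.add((s, p, o))
schema.serialize(destination="kennedys-schema.ttl", format="turtle")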
Ontology Prompting
Using the latest ChatGPT, I started by prompting the conversation with the Kennedys ontology:
Here is the complete definition of the kennedys family as an RDF ontology in TTL format, https://topbraid.org/examples/kennedys/schema:
# baseURI: https://topbraid.org/examples/kennedys/schema
@prefix kennedys: <https://topbraid.org/examples/kennedys#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
kennedys:College
a owl:Class ;
rdfs:label "College" ;
rdfs:subClassOf owl:Thing ;
.
kennedys:Gender
a owl:Class ;
rdfs:label "Gender" ;
rdfs:subClassOf owl:Thing ;
.
kennedys:Lt-governor
a kennedys:Profession ;
.
kennedys:Matriarch
a owl:Class ;
rdfs:label "Matriarch" ;
rdfs:subClassOf kennedys:Person ;
.
… and so on
ChatGPT usefully responds with an accurate summary of the RDF ontology:
The provided TTL file represents an RDF ontology that defines the schema
for the Kennedys family. The ontology includes classes, properties, and
their relationships. Here's a breakdown of the ontology:
Classes:
kennedys:College: Represents a college.
kennedys:Gender: Represents a gender.
kennedys:Matriarch: Represents a matriarch figure in the Kennedys family.
kennedys:Patriarch: Represents a patriarch figure in the Kennedys family.
kennedys:Person: Represents a person in the Kennedys family.
kennedys:Profession: Represents a profession.
Properties:
kennedys:almaMater: Relates a person to their alma mater (college).
kennedys:child: Relates a person to their child.
kennedys:deathYear: Specifies the year of death for a person.
kennedys:firstName: Specifies the first name of a person.
kennedys:gender: Relates a person to their gender.
kennedys:lastName: Specifies the last name of a person.
… and so on until
kennedys:volunteer
kennedys:writer
Please note that the provided ontology defines the schema for representing
the Kennedys family and their professions, but it doesn't include any
specific instances or data about individuals within the family.
Next, I want to provide a document and request that ChatGPT convert what it can into an RDF graph conforming to the provided ontology. The example document I used is here.
Pedantic Prompting
I experimented with various prompts for ChatGPT. The first prompt simply specified the ontology, together with a request to create the RDF graph:
Please create the RDF data using the https://topbraid.org/examples/kennedys/schema
ontology of the following text:
… <document text>…
ChatGPT’s response was a syntactically accurate graph, but ChatGPT had invented its own properties, which I did not want. I therefore tweaked the prompt as follows:
Using this Kennedys https://topbraid.org/examples/kennedys/schema ontology,
please create the RDF graph data from the following text:
… <document text>…
Again, the response was a syntactically accurate graph, but with some invented properties of its own. Frustration was setting in :-) But then I looked again at how ChatGPT had responded to the ontology prompt. It used the following expression:
… but it doesn’t include any specific instances or data about individuals within the family.
Given that I am having a two-way conversation, I should use the same form of expression as used in its response. So I modified my prompt as follows:
Using this provided ontology exclusively, please create specific instances
and data about individuals within the family from the following text.
Also, create the RDF graph.
… <document text>…
A Successful Conversation!
Now ChatGPT follows my instructions precisely to produce a syntactically and semantically accurate graph:
Based on the provided text, I have created specific instances and data
about individuals within the Kennedy family using the provided ontology.
Here is the RDF graph representing the created instances:
@prefix kennedys: <https://example.com/ontologies/kennedys#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Individuals
kennedys:PJ_Kennedy rdf:type kennedys:Patriarch ;
kennedys:firstName "Patrick Joseph" ;
kennedys:lastName "Kennedy" ;
kennedys:profession kennedys:politician, kennedys:businessman .
kennedys:Rose_Kennedy rdf:type kennedys:Matriarch ;
kennedys:firstName "Rose Elizabeth Fitzgerald" ;
kennedys:lastName "Kennedy" ;
kennedys:profession kennedys:philanthropist, kennedys:socialite .
kennedys:Joe_Kennedy rdf:type kennedys:Person ;
kennedys:firstName "Joseph Patrick" ;
kennedys:lastName "Kennedy Jr." ;
kennedys:profession kennedys:naval-aviator .
kennedys:John_Kennedy rdf:type kennedys:Person ;
kennedys:firstName "John Fitzgerald" ;
kennedys:lastName "Kennedy" ;
kennedys:profession kennedys:politician ;
kennedys:child kennedys:Joe_Kennedy ;
kennedys:spouse kennedys:Jacqueline_Bouvier .
… and so on …
Lesson learned: pedantic prompting is preferred!
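The whole exchange can also be scripted rather than typed into the ChatGPT UI. Here is a minimal sketch using the OpenAI Python client; the model name, file names, and message packaging are assumptions, since the conversation above was carried out interactively:
from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
ontology_ttl = open("kennedys-schema.ttl").read()
document_text = open("kennedy-family-tree.txt").read()
messages = [
    # Turn 1: prime the conversation with the ontology, as above.
    {"role": "user",
     "content": "Here is the complete definition of the kennedys family as an "
                "RDF ontology in TTL format, "
                "https://topbraid.org/examples/kennedys/schema:\n\n" + ontology_ttl},
    # Turn 2: the pedantic prompt that worked, plus the source document.
    {"role": "user",
     "content": "Using this provided ontology exclusively, please create "
                "specific instances and data about individuals within the "
                "family from the following text. Also, create the RDF graph.\n\n"
                + document_text},
]
response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)  # Turtle for the extracted graph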
What Have We Learnt?
Conversations are with a pedantic LLM.
LLMs can be instructed to adhere to a specific ontology.
LLMs can successfully ETL/ELT unstructured data.
What Have We To Learn?
Syntactically or semantically correct?
Trust with verification? (a first automated check is sketched below)
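The syntactic half of the first question is easy to answer: if the returned Turtle parses, it is syntactically correct. A crude start on the semantic half, and on verification, is to check that every class and property the LLM used is actually declared in the ontology. A minimal sketch with rdflib, assuming the ontology and the LLM output have been saved to local files:
from rdflib import Graph, RDF
# Parsing the LLM's Turtle is itself the syntax check.
ontology = Graph().parse("kennedys-schema.ttl", format="turtle")
extracted = Graph().parse("chatgpt-output.ttl", format="turtle")
# Every term the ontology says anything about (classes, properties, professions).
declared = set(ontology.subjects())
STANDARD_NS = ("http://www.w3.org/1999/02/22-rdf-syntax-ns#",
               "http://www.w3.org/2000/01/rdf-schema#")
for s, p, o in extracted:
    # Flag predicates that are neither standard vocabulary nor declared in the ontology.
    if p not in declared and not str(p).startswith(STANDARD_NS):
        print("undeclared property:", p)
    # Flag rdf:type assertions that point at classes the ontology never declares.
    if p == RDF.type and o not in declared:
        print("undeclared class:", o)
A check like this catches invented properties, and it would also catch the namespace drift visible above, where ChatGPT minted its own https://example.com/ontologies/kennedys# prefix instead of reusing TopQuadrant's. What it cannot tell you is whether the extracted facts are true; that still needs human review or reconciliation against other sources.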
Where to Next?
Knowledge Graphs, IMHO, are the perfect model of data. LLMs prompted with the relevant ontology can transform unstructured text into a well-formed Knowledge Graph.
The resultant Knowledge Graph can be used to populate a graph store. Alternatively, the Knowledge Graph, now that its content can be verified, could be used to train an LLM as described here.
If you are interested in how Knowledge Graphs and Generative AI are complementary, then contact me, Peter Lawrence.