LLM Ontology-prompting for Knowledge Graph Extraction

Prompting an LLM with an ontology to drive Knowledge Graph extraction from unstructured documents

I make no apology for saying that a graph is the best organization of structured data. However, the vast majority of data is unstructured text. Therefore, data needs to be transformed from its original format into a Knowledge Graph using an Extract-Transform-Load (ETL) or Extract-Load-Transform (ELT) process. There is no problem when the original format is structured, such as SQL tables, spreadsheets, etc., or at least semi-structured, such as tweets. However, when the source data is unstructured text, the task of ETL/ELT to a graph is far more challenging.

This article shows how an LLM can be prompted with an unstructured document and asked to extract a graph corresponding to a specific ontology/schema. This is demonstrated with a Kennedy ontology in conjunction with a publicly available description of the Kennedy family tree.

What is the Knowledge Graph I Want?

There have been several articles recently in which an LLM is used to generate a graph from a supplied document. However, I am unaware of any example in which the schema of the resultant graph conforms to a specified and supplied ontology.

The example I have chosen is the Kennedy family tree. There are many unstructured documents on the web describing various aspects of the Kennedy family tree, which can be used as sample data. Also, TopQuadrant has a very useful kennedys.ttl RDF graph, with an embedded ontology, which they use extensively for training. For this example, I separated the ontology into its own graph file.
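
As an aside, that split is easy to script. Below is a minimal sketch, using Python and rdflib, of one way to peel the embedded ontology out of a combined kennedys.ttl file; the file names and the heuristic (anything explicitly typed as a class or property counts as schema) are my assumptions, not TopQuadrant's packaging.

# A rough sketch (not necessarily the author's actual steps) of splitting
# kennedys.ttl into a schema-only file and an instance-data file with rdflib.
from rdflib import Graph, RDF, OWL

full = Graph()
full.parse("kennedys.ttl", format="turtle")   # the combined TopQuadrant graph

# Treat any subject explicitly typed as a class or property as schema.
schema_types = {OWL.Class, OWL.ObjectProperty, OWL.DatatypeProperty,
                OWL.AnnotationProperty, RDF.Property}
schema_subjects = {s for s, _, o in full.triples((None, RDF.type, None))
                   if o in schema_types}

schema, data = Graph(), Graph()
for s, p, o in full:
    (schema if s in schema_subjects else data).add((s, p, o))

schema.serialize(destination="kennedys-schema.ttl", format="turtle")
data.serialize(destination="kennedys-data.ttl", format="turtle")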

Ontology Prompting

Using the latest ChatGPT, I started by prompting the conversation with the Kennedys ontology:

Here is the complete definition of the kennedys family as an RDF ontology in TTL format, https://topbraid.org/examples/kennedys/schema:
# baseURI: https://topbraid.org/examples/kennedys/schema
@prefix kennedys: <https://topbraid.org/examples/kennedys#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
kennedys:College
  a owl:Class ;
  rdfs:label "College" ;
  rdfs:subClassOf owl:Thing ;
.
kennedys:Gender
  a owl:Class ;
  rdfs:label "Gender" ;
  rdfs:subClassOf owl:Thing ;
.
kennedys:Lt-governor
  a kennedys:Profession ;
.
kennedys:Matriarch
  a owl:Class ;
  rdfs:label "Matriarch" ;
  rdfs:subClassOf kennedys:Person ;
.
… and so on        

ChatGPT usefully responds with an accurate summary of the RDF ontology:

The provided TTL file represents an RDF ontology that defines the schema 
for the Kennedys family. The ontology includes classes, properties, and 
their relationships. Here's a breakdown of the ontology:
Classes:
  kennedys:College: Represents a college.
  kennedys:Gender: Represents a gender.
  kennedys:Matriarch: Represents a matriarch figure in the Kennedys family.
  kennedys:Patriarch: Represents a patriarch figure in the Kennedys family.
  kennedys:Person: Represents a person in the Kennedys family.
  kennedys:Profession: Represents a profession.

Properties:
  kennedys:almaMater: Relates a person to their alma mater (college).
  kennedys:child: Relates a person to their child.
  kennedys:deathYear: Specifies the year of death for a person.
  kennedys:firstName: Specifies the first name of a person.
  kennedys:gender: Relates a person to their gender.
  kennedys:lastName: Specifies the last name of a person.

… and so on until
  kennedys:volunteer
  kennedys:writer

Please note that the provided ontology defines the schema for representing 
the Kennedys family and their professions, but it doesn't include any 
specific instances or data about individuals within the family.        
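
For readers who would rather script this than use the ChatGPT UI, here is a minimal sketch of the same opening turn through the OpenAI chat API. The model name, file name, and client version are assumptions; the experiment in this article was run interactively.

# A hedged sketch of the ontology-priming turn via the OpenAI Python client.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ontology_ttl = open("kennedys-schema.ttl").read()

messages = [{
    "role": "user",
    "content": "Here is the complete definition of the kennedys family "
               "as an RDF ontology in TTL format:\n\n" + ontology_ttl
}]

reply = client.chat.completions.create(model="gpt-3.5-turbo",
                                       messages=messages)
print(reply.choices[0].message.content)   # the ontology summary

# Keep the assistant's reply in the message list so later prompts
# remain part of the same conversation.
messages.append({"role": "assistant",
                 "content": reply.choices[0].message.content})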

Next, I want to provide a document and request that ChatGPT convert what it can to an RDF graph conforming to the provided ontology. The example document I used is here.

Pedantic Prompting

I experimented with various prompts for ChatGPT. The first prompt simply specified the ontology and a request to create the RDF graph:

Please create the RDF data using the https://topbraid.org/examples/kennedys/schema 
ontology of the following text:

… <document text>…        

ChatGPT’s response was a syntactically accurate graph, but ChatGPT had invented its own properties, which I did not want. I therefore tweaked the prompt as follows:

Using this Kennedys https://topbraid.org/examples/kennedys/schema ontology, 
please create the RDF graph data from the following text:

… <document text>…        

Again, the response was a syntactically accurate graph, but ChatGPT had once more invented some of its own properties. Frustration was setting in :-) But then I looked again at how ChatGPT had responded to the ontology prompt. It used the following expression:

… but it doesn’t include any specific instances or data about individuals within the family.

Given that I am having a two-way conversation, I should use the same form of expression that it used in its response. So I modified my prompt as follows:

Using this provided ontology exclusively, please create specific instances 
and data about individuals within the family from the following text. 
Also, create the RDF graph.

… <document text>…        
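
In API terms, this pedantic prompt is simply a further user turn appended to the conversation begun in the earlier sketch. A continuation of that sketch, with document_text standing in for the elided document, might look like this:

# Continuation of the earlier API sketch: client and messages are the objects
# created there. The file name is an assumption standing in for "<document text>".
document_text = open("kennedy-family.txt").read()

messages.append({
    "role": "user",
    "content": "Using this provided ontology exclusively, please create "
               "specific instances and data about individuals within the "
               "family from the following text. Also, create the RDF graph.\n\n"
               + document_text
})
reply = client.chat.completions.create(model="gpt-3.5-turbo",
                                       messages=messages)
rdf_text = reply.choices[0].message.content   # should contain the Turtle graph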

A Successful Conversation!

Now ChatGPT follows my instructions precisely to produce a syntactically and semantically accurate graph:

Based on the provided text, I have created specific instances and data 
about individuals within the Kennedy family using the provided ontology. 
Here is the RDF graph representing the created instances:

@prefix kennedys: <https://example.com/ontologies/kennedys#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Individuals
kennedys:PJ_Kennedy rdf:type kennedys:Patriarch ;
  kennedys:firstName "Patrick Joseph" ;
  kennedys:lastName "Kennedy" ;
  kennedys:profession kennedys:politician, kennedys:businessman .

kennedys:Rose_Kennedy rdf:type kennedys:Matriarch ;
  kennedys:firstName "Rose Elizabeth Fitzgerald" ;
  kennedys:lastName "Kennedy" ;
  kennedys:profession kennedys:philanthropist, kennedys:socialite .

kennedys:Joe_Kennedy rdf:type kennedys:Person ;
  kennedys:firstName "Joseph Patrick" ;
  kennedys:lastName "Kennedy Jr." ;
  kennedys:profession kennedys:naval-aviator .

kennedys:John_Kennedy rdf:type kennedys:Person ;
  kennedys:firstName "John Fitzgerald" ;
  kennedys:lastName "Kennedy" ;
  kennedys:profession kennedys:politician ;
  kennedys:child kennedys:Joe_Kennedy ;
  kennedys:spouse kennedys:Jacqueline_Bouvier .

… and so on …        

Lesson learned: pedantic prompting is preferred!
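
One quick, scriptable check that the response really does stick to the ontology's vocabulary (the failure mode of the first two prompts) is to parse it with rdflib and compare its predicates against the schema. A rough sketch, with assumed file names:

# Flag any predicate in the returned graph that is not declared in the ontology.
from rdflib import Graph, RDF

ontology = Graph().parse("kennedys-schema.ttl", format="turtle")
result = Graph().parse("chatgpt-response.ttl", format="turtle")

declared = set(ontology.subjects(RDF.type, None))   # classes, properties, etc.
for predicate in set(result.predicates()):
    if predicate != RDF.type and predicate not in declared:
        print("Not in the ontology:", predicate)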

What Have We Learnt?

Conversations are with a pedantic LLM.

  • You get what you ask for! The quality of the response is greatly influenced by the quality of the question.
  • Questions should use the same terminology as the LLM uses in prior responses.

LLMs can be instructed to adhere to a specific ontology

  • Providing an appropriate ontology prompts the LLM to transform the unstructured data into the required structure.

LLMs successfully ETL/ELT unstructured data.

  • LLMs successfully extract entities and identify properties and relationships.

What Have We To Learn?

Syntactically or semantically correct?

  • Even though the response graph is syntactically correct, we should semantically verify the content.

Trust with verification?

  • RDF graphs are far easier to semantically verify, using OWL and/or SHACL, than the original unstructured document.
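
As a sketch of what that verification could look like in practice, the snippet below runs the returned graph through pySHACL. It assumes a shapes file has been written for the Kennedy ontology (no such file accompanies this article) and reuses the assumed file names from the earlier sketches.

# Hedged sketch of SHACL validation of the LLM-produced graph with pySHACL.
from rdflib import Graph
from pyshacl import validate

data = Graph().parse("chatgpt-response.ttl", format="turtle")
shapes = Graph().parse("kennedys-shapes.ttl", format="turtle")
ontology = Graph().parse("kennedys-schema.ttl", format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes,
                               ont_graph=ontology, inference="rdfs")
print("Conforms:", conforms)
print(report)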

Where to Next?

Knowledge Graphs, IMHO, are the perfect model of data. LLMs trained with the relevant ontology can transform unstructured text into a well-formed Knowledge Graph.

The resultant Knowledge Graph can be used to populate a graph store. Alternatively, the Knowledge Graph, now that its content can be verified, could be used to train an LLM, as described here.
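
Populating the store can be as simple as an HTTP POST of the Turtle to a SPARQL 1.1 Graph Store Protocol endpoint. A minimal sketch, in which the endpoint URL and named graph are placeholders for whatever store is in use:

# Push the verified Turtle into a triple store via the Graph Store HTTP Protocol.
import requests

with open("chatgpt-response.ttl", "rb") as f:
    turtle = f.read()

resp = requests.post(
    "http://localhost:3030/kennedys/data",              # e.g. a Fuseki dataset
    params={"graph": "https://example.com/graphs/kennedys"},
    data=turtle,
    headers={"Content-Type": "text/turtle"},
)
resp.raise_for_status()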

If you are interested in how Knowledge Graphs and Generative AI are complementary, then contact me, Peter Lawrence.

Backup Material

  • The full ChatGPT dialog is available here.
  • The Kennedy Knowledge Graph is here.
  • Online descriptions of the Kennedy family tree are here and here.

Comments

Prabhu G

Developer | Learning Science enthusiast | full stack, DS, MLOps | IoT, CAD & FEA modelling, fiction are my hobbies | mental health advocate.

2 days ago

Thank you, this was most informative. I am trying to get LLMs to generate Cypher queries based on the unstructured data present in PDF/text. My hypothesis was that LLMs are good at conjuring up seemingly meaningful sentences in a grammatically sound format; there needs to be a way to direct them into considering the content and context of the information and generating responses in a constrained, "pedantic" fashion, especially since, left unconstrained, they can produce responses with untestable and not easily rectifiable sources of error. This test gives me confidence in my approach, and it stands as an easily accessible and citable work to explain it. P.S. Great work by Neo4j on holding my hand through this process: https://graphacademy.neo4j.com/courses/llm-knowledge-graph-construction/

Phil Blackwood

Enterprise Ontologist

5 months ago

Very cool. I think a useful addition would be to have a prompt with the triple patterns the LLM should look for, e.g. gist:Event gist:hasParticipant gist:Person. Triple patterns like this describe how the ontology is to be used, and provide the LLM with more specific guidance than just the ontology. They also provide a simple way to validate the result.

Roy Roebuck

Holistic Management Analysis and Knowledge Representation (Ontology, Taxonomy, Knowledge Graph, Thesaurus/Translator) for Enterprise Architecture, Business Architecture, Zero Trust, Supply Chain, and ML/AI foundation.

7 months ago

I'd like to try this with my holistic upper general ontology. It would enable broad knowledge integration. Who wants to work with me on this?

Gourav Sengupta

Head - Data Engineering, Quality, Operations, and Knowledge

1 year ago

Love this

Sebastien Samson

Game Design Director - (Ex EA, LEGO) 15+ years in the games industry, leading teams of all sizes across various platforms, genres and business models.

1 year ago

Thanks for sharing, Peter Lawrence. I have been looking to build myself a knowledge database system to generate documentation on demand, and I was delighted to find out that I'm not the only one exploring the possibilities of LLMs for semantic data mining. Which model did you use in this experiment? I have seen that you used 3.5 turbo in another article. In my experiments, using 3.5 to abstract entity-relation model data is still limited and falls short compared to GPT-4. I am now looking into breaking down the GPT-3.5 extraction process into multiple micro-steps with a "chain of thought" logic, which has worked for other cases.
