Rethinking Hypergraphs

Kurt Cagle

Editor In Chief @ The Cagle Report | Ontologist | Author | Iconoclast

Published: February 21, 2024

When you look at your company, you likely see things - people, clients or customers, products, processes, roles, revenue, etc. Those things are connected in various ways, and those connections define your business. In mathematical terms, these things and their connections collectively form a graph. A very simplistic graph of a business might end up looking like this.

A Simple Business Model -- graphs of graphs

Such a visual representation can be applied to anything, but it should be understood as exactly that: a representation. You can think of the above as a map, just as an organization chart is a map. There are several domain-specific languages (DSLs) that can encode this map in a form a computer can understand, just as a blueprint is a graphic representation (a map) of a house, a piece of machinery, or an organization that can also be encoded in DSLs (think CAD-CAM systems, for instance).

The problem with blueprints is that they are generally good at describing the state of things at a single point in time. They are, in essence, snapshots. However, in the real world, things change. In an organization, people join, leave, or get promoted to new positions. People are born, go to school, change schools, graduate, start jobs, have children, switch jobs, retire, and eventually die.

Factoring in time also means factoring in changes in relationships. Some of these changes are abrupt - a person joins or leaves a company, for instance - while others are associated with sampling, such as the close-of-day price of a publicly traded company's stock. In many cases, such changes can also represent an assertion's truthiness or similar attributes ("How certain are you that this relationship is true?" as one example).

Mathematicians have a precise meaning for the word hypergraph: a generalization of a graph in which a single edge (a hyperedge) can connect any number of vertices rather than just two. However, a secondary meaning is emerging: a hypergraph is a graph designed to change over time. Put another way, in a hypergraph, objects of various types produce histories of events, and those histories, in turn, reflect changes to the model itself as well as initiate other actions.

Events, ultimately, bind information together. Individually, they convey relatively little information; in the aggregate, however, they become history. Events are the foundation of time series, for instance. Events not only provide connections to the entities that participate in them, but they can also indicate which attribute or set of attributes changed. That change can, in turn, propagate through dependency graphs, where other objects and their associated properties change in response to changes in the original attributes.

As an example, if you only wanted to show the information about Jane at the current time, this graph collapses to:

When taken in isolation, what can look fairly simple becomes considerably more complex when viewed as a sequence of events. For instance, our protagonist, Jane Doe, has worked at BigCo and InvestCo for twelve years.

The first graph represents the level of most knowledge graphs -- a snapshot of a particular set of facts at a specific time. Because there can be any number of kinds of events, temporal hypergraphs can get very complex. This complexity increases even more when dealing with projected time - events that have not yet happened, such as a scheduled conference. In hypergraphs, event instances often make up the overwhelming majority of all data in the system.

However, as the requirement for digital twins increases, these event-driven graphs will become far more common. Events are signals which tell the system something has occurred (or may occur). In the above example, the Role Event indicated that Jane Doe joined a new company (InvestCo) as a Partner.

The dashed relationships are particularly interesting. The current company and current role properties applied to Jane are calculated: any previous links for those two properties are removed, and new ones are then added to the focus of the event (the entity the event applies to, in this case, Jane). This approach means that you can ask the knowledge graph for Jane's current job and company without having to do complex calculations on the fly.
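To make the calculated-property idea concrete, here is a minimal Python sketch, assuming a toy in-memory store rather than any particular graph product. The names RoleEvent, currentCompany, and currentRole, as well as the dates and Jane's earlier "Analyst" role, are hypothetical illustrations. Each new event is logged, the two derived links are dropped, and they are recomputed from the most recent event for that focus, so an older event arriving late does not overwrite newer state.

```python
from dataclasses import dataclass, field

# Hypothetical names for the two derived ("dashed") properties.
CURRENT_PROPS = ("currentCompany", "currentRole")


@dataclass
class RoleEvent:
    """A hypothetical Role Event: its focus, when it happened, and the new values."""
    applies_to: str   # the focus of the event, e.g. "Jane"
    timestamp: str    # ISO-8601 date, so string order matches time order
    company: str
    role: str


@dataclass
class Graph:
    # (subject, predicate) -> object: a deliberately tiny property store
    triples: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def apply(self, event: RoleEvent) -> None:
        """Log the event, then recompute the derived 'current' links for its focus."""
        self.events.append(event)
        focus = event.applies_to
        # Remove any previous links for the two derived properties...
        for prop in CURRENT_PROPS:
            self.triples.pop((focus, prop), None)
        # ...then add new ones from the latest event for this focus.
        latest = max(
            (e for e in self.events if e.applies_to == focus),
            key=lambda e: e.timestamp,
        )
        self.triples[(focus, "currentCompany")] = latest.company
        self.triples[(focus, "currentRole")] = latest.role


g = Graph()
g.apply(RoleEvent("Jane", "2012-03-01", "BigCo", "Analyst"))
g.apply(RoleEvent("Jane", "2020-06-15", "InvestCo", "Partner"))
print(g.triples[("Jane", "currentCompany")], g.triples[("Jane", "currentRole")])  # InvestCo Partner
```

Reading Jane's current company is then a single lookup rather than a scan over her event history.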

Events can also act as envelopes containing metadata for processing. This is a well-established pattern in which metadata provides enough context to tell a hypergraph what to do with the data.

It’s also worth pointing out that events not only tell when something occurred, but they can also tell where something occurred. This is critical in tracking changes in locality (someone changes their current address, for example) or in movement (a tracked drone changes direction or otherwise accelerates or decelerates). The latter case is important, especially with digital twins of dynamic systems.
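A rough sketch of the envelope pattern follows, assuming a simple in-process dispatch table. Envelope, handles, dispatch, and the AddressChange event type are hypothetical names. The metadata (event type, timestamp, location) tells the system what to do, while the payload carries the domain data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Envelope:
    """Hypothetical event envelope: processing metadata wrapped around a payload."""
    event_type: str                                 # selects the handler
    timestamp: str                                  # when it occurred (ISO-8601)
    payload: dict                                   # the domain data itself
    location: Optional[Tuple[float, float]] = None  # where it occurred (lat, lon)


HANDLERS: Dict[str, Callable[[Envelope], str]] = {}


def handles(event_type: str):
    """Decorator registering a handler for one event type."""
    def register(fn):
        HANDLERS[event_type] = fn
        return fn
    return register


@handles("AddressChange")
def on_address_change(env: Envelope) -> str:
    return f"{env.payload['person']} moved to {env.location} at {env.timestamp}"


def dispatch(env: Envelope) -> str:
    """Route an envelope using its metadata; the payload is left to the handler."""
    handler = HANDLERS.get(env.event_type)
    if handler is None:
        raise ValueError(f"no handler for event type {env.event_type!r}")
    return handler(env)


print(dispatch(Envelope("AddressChange", "2024-02-21T10:00:00Z",
                        {"person": "Jane"}, (47.6, -122.3))))
```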

A Hypergraph is a Graph of Graphs (of Graphs of Graphs of ...)

A knowledge graph is a database; a hypergraph is a platform. A hypergraph can initiate actions, provide a consistent API ontology, and evolve. It can be thought of (and represented as) a graph containing other graphs, each of which exists for a particular reason and some of which contain operational semantics rather than domain semantics.

In a hypergraph, a graph is a somewhat more generalized concept than it is in a knowledge graph. It can be a container that holds sentence assertions (often known as tuples), the most basic level of information in a graph. In other cases, a graph can be a port or destination, a place in the outside world where information comes from or goes to. These can be web services, files, or LLMs. Others may be ports for reporting and intermediate file generation. These ports can also be configured from within the hypergraph to specify how such graphs handle inbound or outbound content.

While specific semantics may vary between implementations, most hypergraphs have the structure given in the diagram above, with the following specific functionality:

Input Graphs

Data comes in a wide variety of forms and formats. The input graphs are generally transient, with data to be transformed and validated. Input graphs generally aren’t part of the canonical model.

The Canonical Graph

This is an umbrella graph holding the core or canonical model. In general, content contained within the canonical graph is immutable – once written, it cannot be changed, only superseded by newer versions of the data. The canonical graph is, in turn, broken down into the following subgraphs:

Taxonomy Graph. The taxonomy represents the classification structure of the model. It consists of abstract classes (ones that define interfaces but can’t have instances) and concrete classes (ones that can have instances and that typically inherit from abstract classes).
Operational Graph. These graphs contain configuration information, schemas and constraints, transformations, function definitions, and rules. Most of the content in the operational graph is independent of the topical domain of the graph.
Entities Graph. This is where instance data, such as specific people, places, things, intellectual property, etc., is kept. These utilize the classes identified in the taxonomy.
Events Graph. While technically part of the Entities Graph, the Events Graph differs from the other entities primarily in overall volume, as events tend to be rich in connections (in a broad sense, they are the things that bind entities together).
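One way to picture this layout is as a set of named graphs. The sketch below is a hypothetical in-memory quad store, not an RDF library, and the graph names (canonical/taxonomy, input/hr-feed, and so on) are illustrative assumptions. It shows a transient input graph being validated against the taxonomy, promoted into the entities graph, and then discarded.

```python
from collections import defaultdict


class QuadStore:
    """A minimal store of (graph, subject, predicate, object) quads."""

    def __init__(self):
        self._graphs = defaultdict(set)  # graph name -> set of (s, p, o)

    def add(self, graph, s, p, o):
        self._graphs[graph].add((s, p, o))

    def triples(self, graph):
        # .get() does not trigger the defaultdict factory, so unknown graphs stay absent
        return set(self._graphs.get(graph, set()))

    def promote(self, source, target, is_valid):
        """Copy the triples of one graph that pass validation into another graph."""
        for t in self.triples(source):
            if is_valid(t):
                self._graphs[target].add(t)

    def drop(self, graph):
        """Discard a whole graph, e.g. a transient input graph."""
        self._graphs.pop(graph, None)


store = QuadStore()
store.add("canonical/taxonomy", "Person", "subClassOf", "Agent")
store.add("input/hr-feed", "Jane", "type", "Person")
store.add("input/hr-feed", "Jane", "type", "Unicorn")   # not in the taxonomy

known_classes = {s for s, _, _ in store.triples("canonical/taxonomy")}
store.promote("input/hr-feed", "canonical/entities",
              lambda t: t[1] != "type" or t[2] in known_classes)
store.drop("input/hr-feed")
print(store.triples("canonical/entities"))   # {('Jane', 'type', 'Person')}
```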

Output Graphs

As with input graphs, output graphs can hold containers for reports, output from processes or similar data, but can also be “ports” to external processes such as web services, files, prompts, and similar data.

In some cases, input and output graphs may be the same port endpoint – a request is made via the port, processed through the graph, and the corresponding response is returned asynchronously to the same port, which then acts as an output port. This is especially important in the case of Large Language Models (LLMs), where the hypergraph can hold a continuous conversation with the LLM.
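The asynchronous round trip through a single port might be sketched as follows, assuming Python's asyncio. Port and fake_llm are hypothetical stand-ins; a real LLM client would replace fake_llm. Requests are scheduled without blocking, and replies arrive later in the port's inbox, tagged with their request IDs.

```python
import asyncio


class Port:
    """A hypothetical bidirectional port: requests go out through it, and
    responses come back through the same port later (asynchronously)."""

    def __init__(self, external_service):
        self.external_service = external_service  # stand-in for an LLM or web service
        self.inbox = asyncio.Queue()               # responses arriving back at the port
        self._pending = set()                      # keep task references alive

    async def send(self, request_id, text):
        async def call():
            reply = await self.external_service(text)
            await self.inbox.put((request_id, reply))

        # Schedule the call and return immediately; the graph is not blocked.
        task = asyncio.create_task(call())
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)

    async def receive(self):
        """Wait for the next (request_id, reply) pair to arrive at the port."""
        return await self.inbox.get()


async def fake_llm(prompt):
    """Hypothetical stand-in for a real LLM client."""
    await asyncio.sleep(0)
    return prompt.upper()


async def conversation():
    port = Port(fake_llm)
    await port.send("q1", "hello")
    await port.send("q2", "what changed?")
    return [await port.receive(), await port.receive()]


if __name__ == "__main__":
    print(asyncio.run(conversation()))
```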

From Assertions to DOMs (and Back)

One of the more intriguing possibilities of hypergraphs is their ability to generate (and consume) document object models (DOMs). An object model is an in-memory representation of an entity such as a business, department, product, or person that Python, Java, or JavaScript code can manipulate in either a disconnected or a connected mode.

In a disconnected mode, the hypergraph can return a JSON object that can be rehydrated as a DOM model with its application interfaces. Users can work with this model, which is made up of various DOM submodels. This can be handy for data analytics or simulations, and while it can accept changes, most of the time, disconnected content is intended primarily for transmitting the state of the graph to other processes. (In essence, this creates a copy of the state of the graph.)

In a connected mode, on the other hand, any change (mutation) to the DOM model will cause a corresponding change to the underlying hypergraph representation. This mode is typically used when you need to make continuous changes to the graph and is especially useful for managing digital twins. In this case, you’re creating a reference to the state of the graph itself, which also means that such changes typically involve orchestration in multi-user scenarios.
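The difference between the two modes can be sketched in a few lines of Python. TinyGraph, disconnected, and Connected are hypothetical names for illustration only: the disconnected form round-trips through JSON and so yields an independent copy, while the connected form forwards every attribute write back to the graph.

```python
import json


class TinyGraph:
    """Hypothetical backing store: subject -> {property: value}."""

    def __init__(self):
        self.data = {}
        self.log = []  # every mutation, e.g. for multi-user orchestration

    def set(self, subject, prop, value):
        self.data.setdefault(subject, {})[prop] = value
        self.log.append((subject, prop, value))


def disconnected(graph, subject):
    """Round-trip through JSON: the result is a detached copy of the state."""
    return json.loads(json.dumps({"id": subject, **graph.data.get(subject, {})}))


class Connected:
    """A live reference: attribute reads and writes go straight to the graph."""

    def __init__(self, graph, subject):
        # Bypass our own __setattr__ so these two fields stay local.
        object.__setattr__(self, "_graph", graph)
        object.__setattr__(self, "_subject", subject)

    def __getattr__(self, prop):  # only called for names not found normally
        try:
            return self._graph.data[self._subject][prop]
        except KeyError:
            raise AttributeError(prop) from None

    def __setattr__(self, prop, value):
        self._graph.set(self._subject, prop, value)


g = TinyGraph()
g.set("Jane", "role", "Partner")

snapshot = disconnected(g, "Jane")
snapshot["role"] = "CEO"              # changes the copy only
live = Connected(g, "Jane")
live.role = "Managing Partner"        # changes the graph
print(g.data["Jane"]["role"])         # Managing Partner
```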

One advantage of such a DOM approach is that it can bind functions and methods to the underlying properties (typically via a shape constraint language such as SHACL). Functions can encapsulate, and thus hide, access to core data structures, provide getters and setters, and validate, and hence reject, attempts to pass invalid data. This creates an abstraction layer between the hypergraph and the users of the DOM, something that has long been one of the more problematic areas of languages such as RDF.
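As an illustration of binding validation to properties, the sketch below uses a Python descriptor whose checks are loosely modeled on SHACL's sh:datatype, sh:minInclusive, and sh:maxInclusive constraints. It is not a SHACL processor, and ShapedProperty and the Person wrapper are hypothetical. Invalid assignments raise an error and leave the stored value untouched.

```python
class ShapedProperty:
    """Descriptor enforcing a SHACL-inspired constraint: a datatype plus optional
    inclusive bounds. Illustrative only; this is not a SHACL processor."""

    def __init__(self, datatype, min_inclusive=None, max_inclusive=None):
        self.datatype = datatype
        self.min_inclusive = min_inclusive
        self.max_inclusive = max_inclusive

    def __set_name__(self, owner, name):
        self.name = name
        self.slot = "_" + name  # hidden storage for the validated value

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.slot)

    def __set__(self, obj, value):
        if not isinstance(value, self.datatype):
            raise TypeError(f"{self.name} must be {self.datatype.__name__}")
        if self.min_inclusive is not None and value < self.min_inclusive:
            raise ValueError(f"{self.name} must be >= {self.min_inclusive}")
        if self.max_inclusive is not None and value > self.max_inclusive:
            raise ValueError(f"{self.name} must be <= {self.max_inclusive}")
        setattr(obj, self.slot, value)  # only reached if every check passed


class Person:
    """A DOM-style wrapper whose properties are guarded by shapes."""
    name = ShapedProperty(str)
    age = ShapedProperty(int, min_inclusive=0, max_inclusive=150)

    def __init__(self, name, age):
        self.name = name
        self.age = age


p = Person("Jane Doe", 42)
try:
    p.age = -5
except ValueError as err:
    print("rejected:", err)  # rejected: age must be >= 0
print(p.age)                 # 42
```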

From Hypergraphs to LLMs (and Back)

There are several pathways by which hypergraphs can interact with Large Language Models (LLMs). These fall largely into writing and reading functions.


Writing to the LLM

Generating Training Data. This involves creating data structures with intrinsic linking and Uniform Resource Identifiers (URIs) embedded both as IDs (identifying the resources) and as links (referencing those resources). This captures the structural linking of data. These are then fed into the LLM as part of its training corpus.
Generating Schemas (Ontologies). This captures the relationship of classes (taxonomies) referenced in the training data and consequently reinforces these relationships. Again, the exact format is less important than the URIs themselves, as these get embedded into the meaning space of the LLM, providing a conceptual framework.
Providing RAG Content. In this case, the LLM queries the hypergraph to provide additional information outside of the initial model in a retrieval augmented generation (RAG) process. This is a hybrid case where the input and output channels/graphs are the same (i.e., the responses are synchronous to the request).
ID Lookup. By embedding the URIs of the resources in the LLM, detailed (and likely less hallucinatory) information can be retrieved from the hypergraph during RAG, reducing the size of the LLM in the process.
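The ID Lookup pattern can be sketched as follows, assuming the LLM's draft answer mentions resource URIs that the hypergraph can resolve. The HYPERGRAPH dictionary, the example.org URIs, and build_rag_prompt are hypothetical; a real system would query the graph store instead of a dictionary.

```python
import re

# Hypothetical hypergraph content keyed by URI (example.org is a placeholder).
HYPERGRAPH = {
    "http://example.org/person/JaneDoe": {
        "name": "Jane Doe",
        "currentRole": "Partner",
        "currentCompany": "InvestCo",
    },
}

URI_PATTERN = re.compile(r"https?://[^\s<>()]+")


def build_rag_prompt(question, llm_draft):
    """Resolve every URI mentioned in the LLM's draft against the hypergraph and
    prepend the retrieved facts as grounding context."""
    facts = []
    for uri in URI_PATTERN.findall(llm_draft):
        uri = uri.rstrip(".,;:")  # drop sentence punctuation after the URI
        record = HYPERGRAPH.get(uri)
        if record is not None:
            props = "; ".join(f"{k}={v}" for k, v in record.items())
            facts.append(f"{uri}: {props}")
    context = "\n".join(facts) if facts else "(no matching resources)"
    return f"Context:\n{context}\n\nQuestion: {question}"


print(build_rag_prompt("Where does Jane work?",
                       "Jane is described at http://example.org/person/JaneDoe."))
```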

Reading from the LLM

Metadata Augmentation. This is used during ingestion processes where description fields, labels, and even some relationships are pulled from a query to the LLM.
Schematic Transformations. This uses LLMs to generate transformations (such as XSLT or SPARQL Update) that can better map data structures into output content.
Analytics. In general, hypergraph environments will have better analytics tools (especially for graph and network analysis), and data consistency within the hypergraph helps support the FAIR (Findable, Accessible, Interoperable, Reusable) data principles.
Query processing. The hypergraph can pass natural language queries to the LLM to translate into more appropriate internal queries within the hypergraph itself. This becomes especially significant when working with defined ontologies and data standards, which make it much easier for an LLM to generate such queries consistently.
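For query processing, one plausible safeguard, assumed here rather than taken from the article, is to hand the LLM the ontology's vocabulary and reject any generated query that strays outside it. The llm argument is any function from prompt to text; KNOWN_TERMS and the ex: prefix are hypothetical, and the regex check is a crude stand-in for real SPARQL parsing.

```python
import re
from typing import Callable

# Hypothetical ontology terms the hypergraph publishes to the LLM.
KNOWN_TERMS = {"ex:JaneDoe", "ex:currentCompany", "ex:currentRole", "ex:name"}


def translate_query(question: str, llm: Callable[[str], str]) -> str:
    """Ask an LLM for SPARQL, then reject queries using terms outside the ontology."""
    prompt = (
        "Translate the question into SPARQL, using only these terms: "
        + ", ".join(sorted(KNOWN_TERMS))
        + "\nQuestion: " + question
    )
    sparql = llm(prompt)
    used = set(re.findall(r"\bex:\w+", sparql))  # every ex:-prefixed name
    unknown = used - KNOWN_TERMS
    if unknown:
        raise ValueError(f"query uses terms outside the ontology: {sorted(unknown)}")
    return sparql


def fake_llm(prompt):
    """Hypothetical stand-in for a real LLM call."""
    return "SELECT ?c WHERE { ex:JaneDoe ex:currentCompany ?c }"


print(translate_query("Where does Jane work?", fake_llm))
```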

Generally, a hypergraph and its corresponding LLM should be considered a fairly tightly coupled system.

Hypergraphs and Vector Search / Embeddings

One of the more significant recent innovations in the graph world is coupling knowledge graphs (and hence hypergraphs) with vector stores. A vector store can be considered a thematic index – it converts a document into a sequence of tokens, which an embedding model then maps to a point (a vector) in a high-dimensional information space.

For instance, the following shows a three-dimensional example of this process, mapping various fictional captains to different attributes. Each vector can be translated into a set of coordinates (such as (0.8, 0.2, 0.9) for Captain Picard of the USS Enterprise NCC-1701-D, from the Star Trek universe). If you had tens of thousands of such attributes, the same mapping would apply, but with that many dimensions rather than just three.

Three Features in Our Captains Courageous Vector Database

An LLM can then be thought of as a zipped collection of such vectors, coupled with a way of creating new vectors and determining how closely they match existing vectors. The higher the dimensionality of the vectors, the more likely it is that what comes back is genuinely related to the test vector, and the more computationally expensive the operation becomes.

Vector search then works by encoding a prompt as a vector and seeing what the neighbourhood of that prompt looks like (the circle in the above diagram). For instance, if the prompt looks something like "Identify fictional captains that helm their own ships, while at the same time are depicted primarily as real characters." then the prompt would be encoded as the black vector in the above diagram. A similarity threshold sets the radius of the search hypersphere (here projected onto a circle), which would capture Captains Sparrow, Picard, Janeway, Kirk, and Ahab but exclude Captain America and Captain Kangaroo. It also picks up the topic of naval captains, not Army or Air Force captains.
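The neighbourhood search can be sketched with plain Euclidean distance in three dimensions. Only Picard's coordinates come from the example above; the other vectors, the prompt vector, and the 0.3 radius are invented for illustration. Many production vector stores rank by cosine similarity or inner product instead, but the idea of a cut-off radius around the query is the same.

```python
import math

# Three hypothetical features; only Picard's coordinates come from the text.
CAPTAINS = {
    "Picard":   (0.8, 0.2, 0.9),
    "Janeway":  (0.7, 0.3, 0.9),
    "Kirk":     (0.9, 0.3, 0.8),
    "Sparrow":  (0.9, 0.1, 0.6),
    "Ahab":     (0.7, 0.1, 0.7),
    "America":  (0.2, 0.9, 0.3),
    "Kangaroo": (0.1, 0.1, 0.2),
}


def radius_search(query, vectors, radius):
    """Return (name, distance) for every vector inside the hypersphere of the
    given radius around the query, nearest first (ties broken by name)."""
    hits = []
    for name, vec in vectors.items():
        d = math.dist(query, vec)  # Euclidean distance
        if d <= radius:
            hits.append((name, d))
    return sorted(hits, key=lambda h: (h[1], h[0]))


prompt_vector = (0.8, 0.2, 0.8)  # hypothetical encoding of the prompt
for name, d in radius_search(prompt_vector, CAPTAINS, radius=0.3):
    print(f"{name:8s} {d:.3f}")
```

With these numbers, the five ship captains fall inside the radius while Captain America and Captain Kangaroo fall outside it.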

This is one reason a hypergraph is a natural complement to such LLMs. When you have 30,000 features (axes), the chances are high that many (if not most) of those features will be irrelevant to what you’re looking for. A hypergraph pares down the number of features. For instance, the feature “total amount of rainfall received per square kilometer compared to maximum rainfall received per square kilometer” is unlikely to be relevant to “person” in all but the most specialized of contexts (e.g., how wet is a given person likely to be in San Antonio, TX, USA vs Seattle, WA, USA?). This can make calculations far faster, as the number of relevant dimensions drops considerably, likely by a couple of orders of magnitude.
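Feature pruning can be sketched as projecting vectors onto the axes that a hypergraph marks as relevant for a class before comparing them. The FEATURES list and the RELEVANT mapping below are hypothetical. Two people who differ only on rainfall and elevation are identical once the irrelevant axes are dropped, even though their full-space distance is large.

```python
import math

# Hypothetical feature axes and a hypothetical class -> relevant-axes mapping
# of the kind a hypergraph could maintain.
FEATURES = ["age", "income", "rainfall_ratio", "elevation", "years_employed"]
RELEVANT = {
    "Person": ["age", "income", "years_employed"],
    "Region": ["rainfall_ratio", "elevation"],
}


def project(vec, cls):
    """Keep only the coordinates marked as relevant to the class."""
    return tuple(vec[FEATURES.index(f)] for f in RELEVANT[cls])


def pruned_distance(a, b, cls):
    """Euclidean distance computed over the relevant axes only."""
    return math.dist(project(a, cls), project(b, cls))


jane = (0.4, 0.7, 0.9, 0.1, 0.5)
john = (0.4, 0.7, 0.1, 0.8, 0.5)
print(math.dist(jane, john))                  # large: dominated by irrelevant axes
print(pruned_distance(jane, john, "Person"))  # 0.0
```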

If the hypergraph is feeding the LLM, this also assumes that some form of vector encoding is part of the functionality of the hypergraph, both for ad hoc similarity analysis within the hypergraph and for training the LLM directly. The similarity analysis is fairly critical, as most queries against hypergraphs involve finding similarity relationships. Most knowledge graph platforms before 2022 did not have such features (or had very limited versions of them), which meant that such queries had to be crafted by hand. That’s changing as knowledge graph platforms evolve into hypergraphs. In general, similarity searches can significantly reduce the space of what needs to be queried, even within a hypergraph itself; additional methods can then be used to refine the searches accordingly.
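A common way to combine the two, shown here as an illustrative sketch with hypothetical records, is a two-stage search: similarity narrows the candidates to the top k, and exact structured filters (the kind a graph query would express) then refine the shortlist.

```python
import heapq
import math

# Hypothetical records: an embedding plus structured properties from the graph.
RECORDS = {
    "doc1": {"vec": (0.9, 0.1),   "type": "Contract", "year": 2023},
    "doc2": {"vec": (0.8, 0.2),   "type": "Invoice",  "year": 2023},
    "doc3": {"vec": (0.85, 0.15), "type": "Contract", "year": 2019},
    "doc4": {"vec": (0.1, 0.9),   "type": "Contract", "year": 2023},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def two_stage(query_vec, k, predicate):
    """Stage 1: keep the k most similar records. Stage 2: apply exact filters."""
    shortlist = heapq.nlargest(
        k, RECORDS, key=lambda rid: cosine(query_vec, RECORDS[rid]["vec"])
    )
    return [rid for rid in shortlist if predicate(RECORDS[rid])]


recent_contracts = lambda r: r["type"] == "Contract" and r["year"] >= 2022
print(two_stage((1.0, 0.0), k=3, predicate=recent_contracts))  # ['doc1']
```

Note that doc4 satisfies the filter but never reaches stage two because it is not similar to the query; choosing k is a trade-off between recall and cost.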

Hypergraphs: State of the Art

Are there hypergraphs on the market today? Sort of. If you look at hypergraphs in the purely mathematical sense, most RDF-based graphs as well as several labeled property graphs (such as Neo4j) have some hypergraph-like properties, especially the ability to abstract out sets of graphs, provide service endpoints, and represent both enterprise and programmatic structures internally. A few have even managed to retrofit some limited generative AI capabilities in the last year, though for the most part, these are fairly simplistic.

At the same time, the trend is moving in that direction, and Hypergraphs as a product group will likely begin to differentiate themselves from knowledge graphs in the 2024 to 2025 timeframe.

The Wardley diagram for hypergraphs below illustrates the dependencies involved and indicates the maturity levels that specific supporting technologies need to reach to make hypergraphs feasible. (Note: the author is not an expert on Wardley Maps, so don’t necessarily read too much into this from a business standpoint.)

To summarize, there are several key technologies that a Hypergraph should have that differentiate it from a knowledge graph:

Time. Hypergraphs are temporal constructs, with events providing an outsized influence on the overall environment. This also makes a hypergraph more of a platform than a knowledge graph.
Abstraction of structures. A knowledge graph is manipulated primarily at the assertion level, typically by inserting assertions through file uploads. Hypergraphs are at a higher order of abstraction – they may be set on top of a knowledge graph structure, but entities can be manipulated at multiple levels simultaneously (they are Holonic).
Graphs of Graphs. Named graphs play a much bigger role in hypergraphs, and graphs can be conceptually treated as ports rather than simply collections of assemblies.
Interchangeability. It becomes possible to readily represent ordered lists, sets as distinct entities, methods, and filters. This means that a stronger operational ontology needs to be at work with hypergraphs than with knowledge graphs.
Integration with LLMs. Hypergraphs can be considered part of a larger assembly of LLMs and Vector Stores. The hypergraph primarily emphasises concepts that are most foundational to a learning model. This also means better natural language query management.
Metaschematics and Shapes. Shapes are just beginning to take hold in the knowledge graph arena, and they play a much larger role in hypergraphs, as they often encapsulate internal data structures into dynamic DOM-like structures that provide better protection to the internal graph while simplifying data access.
Functions, Transformations and Validation. These become first-class citizens, tied into graph management and event handling. In knowledge graphs, these are much weaker and more complex to implement, often requiring very specialized knowledge about the underlying product. In hypergraphs, these are virtualized.
Better List Handling and APIs. Knowledge graphs are built around assertions and usually pass list management off to external languages. Hypergraphs usually maintain a much more comprehensive higher-order API, evaluated either through an internal language (Orchestra on the Wardley Map) or external languages such as Python or Javascript working with Orchestra-like iterators and similar API elements.
Distributed. Graphs can be configured so that they are distributed across multiple servers. The configuration becomes part of the operational language (also part of Orchestra). This ability to orchestrate exists in a very limited fashion with knowledge graphs today, but it is mostly an add-on feature that is poorly integrated. With hypergraphs, it is much more seamless.
Immutability. The core of the hypergraph – the Canonical Graph – is immutable. Once an object is written to the core, it is permanent, with most state management moving to the event stream instead. This makes hypergraphs much more attractive in areas such as financial management, as the immutability requirements mean that should transactions need to be rolled back, what changes is primarily the event log.
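The immutability point can be sketched as an append-only store in which versions are never rewritten and a rollback is recorded as a new event rather than a deletion. CanonicalStore and its methods are hypothetical; a production system would persist versions and events durably.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Version:
    """One immutable version of a resource in the canonical graph."""
    uri: str
    number: int
    data: MappingProxyType  # read-only view, so content cannot be mutated


class CanonicalStore:
    def __init__(self):
        self._versions = {}   # uri -> list of Version; entries are never rewritten
        self.event_log = []   # state changes are recorded here instead

    def write(self, uri, data):
        history = self._versions.setdefault(uri, [])
        version = Version(uri, len(history) + 1, MappingProxyType(dict(data)))
        history.append(version)
        self.event_log.append(("write", uri, version.number))
        return version

    def rollback(self, uri, number):
        """Undo a version by logging an event; the version itself stays in place."""
        self.event_log.append(("rollback", uri, number))

    def current(self, uri):
        """Latest version of uri that has not been rolled back (None if none)."""
        undone = {(u, n) for kind, u, n in self.event_log if kind == "rollback"}
        for version in reversed(self._versions.get(uri, [])):
            if (uri, version.number) not in undone:
                return version
        return None


store = CanonicalStore()
store.write("ex:account1", {"balance": 100})
store.write("ex:account1", {"balance": 250})
store.rollback("ex:account1", 2)
print(store.current("ex:account1").data["balance"])  # 100
print(store.event_log)
```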

There are other factors involved that make hypergraphs a natural evolution from knowledge graphs, but they mostly have to do with technical issues that likely require both significant adaptation and the maturing of several core pieces of secondary technology.

The Business of Hypergraphs

This paper has focused largely on architectural issues, albeit at a fairly high level. It’s worth, however, exploring why hypergraphs should be on every CTO’s radar screen.

Generative AI Needs Hypergraphs. Generative AI is a generator, not a database. If it has the information that it needs, it can be reasonably accurate, but if it doesn’t, it will “make things up”, potentially making companies liable for misinformation produced by their own systems. Hypergraphs go a long way towards keeping large language models truthful, by providing logical and inferential structures that make reasoning possible and by reducing spurious (and expensive) calculations that can lead to hallucinations.
Knowledge Graphs Are Not Enough. Knowledge graphs work well for static modeling, but they are increasingly inadequate for modeling dynamic systems, and most truly useful systems are dynamic in nature. Hypergraphs address the inadequacies of knowledge graphs while retaining their core strengths as the holders of methodical, curated, and properly governed data.
Standards Are Evolving. A hypergraph can be thought of as the manifestation of a standard, and as organizations and their needs change, the standards that represent them need to evolve as well. Unfortunately, in the current model with knowledge graphs, changing standards is slow, cumbersome, and frequently political in nature, meaning that all too often, knowledge graphs become obsolete even before they are fully implemented. That has to change, and hypergraphs are emerging to respond to that need.
Hypergraphs Make Master Data Management Redundant. We have to recognize that data is distributed, and distributed data can only be managed well when a scheme exists for distributing keys. If two organizations talk about the same resource but use different keys, too much time and money is spent harmonizing the data between the two, unless both organizations have ways of distributing and managing multiple keys (resource name synonyms). This can be done with current knowledge graph capabilities, but it is quite cumbersome; it will be intrinsic to hypergraphs.
Hypergraphs Abstract Other Data Sources. Hypergraphs deal with abstractions. Everything is a graph, but not all graphs are the same. The hypergraph acts as a clearing house, a way to do master data management, and an integration tool between various data systems. It is, among other things, the evolution of the data catalog and content management systems.
Hypergraphs Are For Integration. System integration is easily one of the biggest IT expenditures for most organizations, especially when dealing with either reorganization within an enterprise or integrating two different companies’ IT systems. This isn't easy to do with a static knowledge graph. It is considerably easier to do with a dynamic one, as resources can be tracked and traced without shutting down one or both potentially live working systems.
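Managing key synonyms is a classic union-find problem, which is the technique swapped in here for illustration; the article does not prescribe an algorithm. KeyRegistry and the sample identifiers are hypothetical. declare_same plays a role similar to owl:sameAs, and find returns one representative key for every set of synonyms.

```python
class KeyRegistry:
    """Union-find over resource identifiers: each set of synonymous keys
    resolves to a single representative key."""

    def __init__(self):
        self.parent = {}

    def find(self, key):
        self.parent.setdefault(key, key)
        root = key
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every key on the path straight at the root.
        while self.parent[key] != root:
            self.parent[key], key = root, self.parent[key]
        return root

    def declare_same(self, a, b):
        """Record that a and b name the same resource (similar to owl:sameAs)."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


reg = KeyRegistry()
reg.declare_same("crm:C-1001", "erp:000457")
reg.declare_same("erp:000457", "https://example.org/customer/acme")
print(reg.find("https://example.org/customer/acme"))  # crm:C-1001
```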

Overall, hypergraphs will save money from reduced integration costs, better utilization of Generative AI systems, and integrating multiple systems’ data catalogs and master data management systems. They will make querying the canonical graph much easier (especially when tied into code generators), and provide better tools for integration with external applications.

Conclusion

There are a few products advertising themselves as hypergraphs, largely because it makes a great marketing term, but overall this is a field that will likely see significant evolution in the coming months and years, for the simple reason that it's needed. Hypergraphs provide a necessary counterbalance to LLMs. They offer a way of keeping graphs (and LLMs) more or less current while ensuring a degree of provenance and governance that LLMs are just not well set up to manage, and they do so while opening up accessibility to the graphs (and subsequent coding) in a way that knowledge graphs right now are just not well positioned to do.

In media res,

Kurt Cagle

Editor, The Cagle Report

Type type type hypertypetypetypetypetypetype .... type.

My Newsletters:

The Cagle Report (LinkedIn), known as TCR,
The Ontologist,
Generation AI,
The Cagle Report (Substack), and now
The Logician.


