登录查看更多内容

Hamiltonian Paths

Jeff Tallman

Principal Sales Engineer at Neo4j

发布日期: 2022年11月14日

Last time I posted – admittedly a while ago…..but as much necessity is the mother of invention, work is often the frustration of expressing one’s thoughts on paper….the post was on relationship isomorphism with the title “Friend of a friend”

Fast forward a few months and a colleague came to me with a problem.??How to prevent non-Hamiltonian paths in Neo4j.??Thankfully, he included the Wikipedia link so I could translate his request into common English.?

To speed things along, generally, it states:

In the?mathematical?field of?graph theory, a?Hamiltonian path?(or?traceable path) is a?path?in an undirected or directed graph that visits each?vertex?exactly once. A?Hamiltonian cycle?(or?Hamiltonian circuit) is a?cycle?that visits each vertex exactly once. A Hamiltonian path that starts and ends at adjacent vertices can be completed by adding one more edge to form a Hamiltonian cycle, and removing any edge from a Hamiltonian cycle produces a Hamiltonian path. Determining whether such paths and cycles exist in graphs (the?Hamiltonian path problem?and?Hamiltonian cycle problem) are?NP-complete.

Hamiltonian paths and cycles are named after?William Rowan Hamilton?who invented the?icosian game, now also known as?Hamilton's puzzle, which involves finding a Hamiltonian cycle in the edge graph of the?dodecahedron. Hamilton solved this problem using the?icosian calculus, an?algebraic structure?based on?roots of unity?with many similarities to the?quaternions?(also invented by Hamilton). This solution does not generalize to arbitrary graphs.

Okay – so Hamilton musta been really bored and lived a really long time ago.??I mean, what’s wrong with good old fashion chess game that you have to come up with a game about geometry?

But I digress.??The problem as my colleague postulated, concerned something that looked like the following, but a tad simpler

In simple terms, he wants to traverse from 1 to 6 or 2 to 6 without doing the loop of 2->3->4 as node #2 gets traversed twice which is a non-Hamilton Path.?

Most graph DBMS’s subscribe to graph theory in which a generic path can have one of three core traversal types.?Consider the below as a common example I sometimes use when introducing Neo4j to people as the Friend of a Friend type query performance issue is a common problem:

Assume all the edges are one direction pointing to the higher node number except the second relationship of 6->2.??We’ll get to that.

The first and most common graph traversal technique is called “Walk”.??Maybe it should be called “Wander” instead.?In a “Walk”, any node or relationship can be repeated in a path.??So, to traverse from 1->3 the following paths are legal, assuming directionality in the traversal:

1->2->3
1->2->6->2->3
1->2->6->2->6->2->3
....
1->2->3->5->6->2->3
1->2->3->4->5->6->2->3
....

If we ignore direction, there are a lot more possibilities.?However, the path 1->2->6->2->3 (and many others) is non-Hamilton Path.?Note that although node #3 is the destination, it can be traversed through on a longer path in theory.??But let’s ignore that an assume the DBMS is smart enough to terminate the path at the target the first time. Non-Hamilton paths right from the start are never in the shortest path – but they are extremely common and lead to extensive path explosion.?Unfortunately, the GQL standard seems to suggest that “Walk” has to be supported at a minimum.?For those new to Graph DBMS, GQL is the ISO standard query language for Graph DBMS – similar to SQL as the ISO standard for RDBMS. ?GQL is based extensively on OpenCypher a common language supported by 20+ Graph DBMS’s, which of course was contributed to heavily by Neo4j with Cypher.??Ugh ugh ugh – this is the problem with standards committees – I mean as obviously can seen from the above, one could get stuck in an infinite loop in the 2->6 and 6->2 cycle with Walk – totally defies common sense.??Well….maybe.??My first introduction to cyclical paths was implementing a job processing trace in which some jobs “looped” until a condition was met.??

As discussed in my last post, Neo4j implements “Trail”.? In a “Trail”, relationships cannot be repeated but nodes can.??So, using our example, the following is permissible traversals from 1->3 in our example:

1->2->3
1->2->6->2->3

This eliminates the infinite loop of 2->6->2->6 …..but it still allows non-Hamiltonian paths.?We’ll come back to that in a bit.

The third (and arguably the most desired) traversal is called a “Path” – perhaps short for “Hamiltonian Path”.?In a “Path”, neither nodes nor relationships are repeated, so we have our desired answer of:

Now, if we were traversing 1->6, we would have at least 3 Hamilton paths:

Just because 1->2->6 is the shortest in terms of hops, doesn’t mean it is the cheapest as the cost of the hop from 2->6 may be more expensive than the other routes.??We’ve all seen this with airline reservations in which you are often offered 2-3 layover routes that are cheaper (price wise)….but take you all day to get there.

So, why did Neo4j implement “Trail”?

Is there a way to implement “Hamiltonian Paths” with Neo4j?

For those with short attention spans, the answer to the last question is “Yes” – but we will get there in a minute.??First, with why Neo4j implemented Trail, I suspect it is because Trail represents the complete set of paths that do not involve traversing the same relationship twice.?In my last post, I used the example of:

领英推荐

The Riemann Hypothesis (by Deepseek R1)

Alexander Granderath 1 个月前

Graham’s Number Meets the Singularity: BC-MBWI and the…

Martin Ciupa 4 个月前

Nature Is a Lazy Mathematician, Part 3

Mario Schlosser 1 年前

I noted that we could then have the following paths:

A->B->C->D->E->Z
A->F->G->H->I->Z
A->J->K->L->M->Z
A->B->C->N->O->P->C->D->E->Z
A->B->C->D->E->Q->R->S->E->Z
A->B->C->N->O->P->C->D->E->Q->R->S->E->Z
A->B->G->H->I->Z
A->B->G->H->M->Z
A->F->G->H->M->Z
A->J->K->H->I->Z
A->J->K->H->M->Z

This allows us to model situations such as the job processing in which there are conditions for traversing.?

Okay, so that is great in theory – but how does one implement Hamilton Paths in Cypher.??Welllllllll….it all depends.??We could start with something like:

MATCH (source:Process {id: 1})
MATCH (target:Process {id: 3})
MATCH p=(source)-[:NEXT*]->(target)
WHERE none(x in nodes(p) WHERE exists((x)-[:NEXT]->(:Process)-[:NEXT]->(x)))
RETURN p

This uses one of the more useful functions in Neo4j called a predicate function, which I use regularly when implementing versioning.?In this case, the query translates to:

Find all the paths from the source to the target for any number of hops using the relationship types of NEXT or IS_TRANSPORTED_TO
Filter out any paths in which a node has a NEXT relationship to any other Process that also has a next relationship back to the same node

Unfortunately, this yields no results.??The problem is 2 always has the NEXT relationship to 6 and 6 to 2 since our check is not limited to just the nodes in the path.??If you want to try it for yourself, here is a quick creation script:

CREATE (n1:Process {id: 1})
CREATE (n2:Process {id: 2})
CREATE (n3:Process {id: 3})
CREATE (n4:Process {id: 4})
CREATE (n5:Process {id: 5})
CREATE (n6:Process {id: 6})
?

MERGE (n1)-[:NEXT]->(n2)
MERGE (n2)-[:NEXT]->(n6)
MERGE (n6)-[:NEXT]->(n2)
MERGE (n2)-[:NEXT]->(n3)
MERGE (n3)-[:NEXT]->(n4)
MERGE (n3)-[:NEXT]->(n5)
MERGE (n4)-[:NEXT]->(n5)
MERGE (n5)-[:NEXT]->(n6)
;

But had it worked, it would have eliminated the 2->6 and 6->2 loop as below…

…but it would not have resolved our original problem:

Knowing that Neo4j supports a shortest_path() function and knowing that the shortest path will never contain a non-Hamiltonian Path, we might be tempted to use:

MATCH (source:Process {id: 1})
MATCH (target:Process {id: 3})
MATCH p=shortestPath((source)-[:NEXT*]->(target))
RETURN p

But, if we want a list of all Hamiltonian paths, this won’t work as it only returns the shortest path (singular) wherein reality, there are 3 paths if we are instead looking at paths from 1->6 vs. just 1->3.

This is where our friend APOC comes in.??We could change the query to:

MATCH (source:Process {id: 1})
MATCH (target:Process {id: 6})
MATCH p=(source)-[:NEXT*]->(target)
WHERE NOT apoc.coll.containsDuplicates(nodes(p))
RETURN p

..and presto – it all works!!!??Gradually, the Neo4j APOC library is being incorporated into the base product and apoc.coll is one set of functions that I can hardly wait for.??Until then – all we have to do is download APOC or use APOC core from ./labs folder in installation.??

This is probably the most flexible approach and allows additional predicates including (if necessary) predicate functions on the relationships for versioning.??However, it likely isn’t the fastest approach.??A slightly faster approach, since we are using APOC, is to simply use the APOC path utilities ala:

MATCH (source:Process {id: 1})
MATCH (target:Process {id: 3})
CALL apoc.path.expandConfig(source, {relationshipFilter: "NEXT>"
??????????????? , uniqueness: "NODE_PATH"
??????????????? , terminatorNodes: target
??????????????? }) yield path
RETURN path

On our simple example, it might not mean much, but on highly complicated graphs, the performance difference could be extremely substantial.

However, the absolute fastest approach would be to use Neo4j’s GDS library and specifically the shortest paths functions.??While Neo4j does have a “shortest_path()” function, it doesn’t support weighted paths/path cost and other aspects that are very common in such problems.??Using GDS, the code would be:

call gds.graph.project('blog','Process',
??? {
??? NEXT: {
??????? type: 'NEXT',
??????? orientation: 'NATURAL'
??? }}
)
;
?

MATCH (source:Process {id: 1})
MATCH (target:Process {id: 6})
CALL gds.shortestPath.yens.stream('blog',
{
??????? sourceNode: source,
??????? targetNode: target,
??????? k: 3
})YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
RETURN index, path
;

…which yields the following:

Voila – and no non-Hamiltonian Paths….and typically 10x faster than APOC on larger networks.

要查看或添加评论，请登录

Jeff Tallman的更多文章

Time Series Vectors in Neo4j

2024年6月19日

Time Series Vectors in Neo4j

What really is time series data? When people of time series data, the context is not always the same. In some cases, it…

2 条评论
Hands on with GenAI & Neo4j Apr 25th in NYC @ AWS

2024年4月4日

Hands on with GenAI & Neo4j Apr 25th in NYC @ AWS

For those of you near to the "Big Apple" and willing to brave the arduous trek into the city..

3 条评论
The Challenges of Virtual Knowledge Graphs

2023年12月21日

The Challenges of Virtual Knowledge Graphs

Nothing is new under the sun. The idea of virtual access to data in remote (and often heterogeneous) data stores is…
Vector Indexing plus Knowledge Graphs with Neo4j

2023年11月21日

Vector Indexing plus Knowledge Graphs with Neo4j

TL;DR: Normally vector indexing is thought of as a common implementation for Generative AI. While this is true at a…

4 条评论
Modeling Longitudinal/Time Series/Sequential Data in Neo4j

2023年7月31日

Modeling Longitudinal/Time Series/Sequential Data in Neo4j

…and Why Analyzing It Is a Graph Problem One of the most common modeling questions we get asked is how to model time…
Modeling Discrete Relationships in Neo4j

2023年7月13日

Modeling Discrete Relationships in Neo4j

Graph Modeling One of the comments on my earlier blog was about graph modeling – it is an area that has tremendous…

1 条评论
Graph Feature Engineering for Longitudinal Data (aka Time Series)

2023年6月24日

Graph Feature Engineering for Longitudinal Data (aka Time Series)

TL;DR: Graph feature engineering in Neo4j for longitudinal (aka time-series) data allows path analysis and other…

2 条评论
DBMS for Data Science: Why Neo4j vs. your tRusty ol’ RDBMS

2023年6月12日

DBMS for Data Science: Why Neo4j vs. your tRusty ol’ RDBMS

A prospective customer asked me a question the other day – why use Neo4j for data science instead of my [existing]…

5 条评论
Buzzword Salad: Ontologies, Digital Twins, Knowledge Graphs….

2022年12月15日

Buzzword Salad: Ontologies, Digital Twins, Knowledge Graphs….

TL;DR: Why RDF databases don’t scale well for Knowledge Graphs Let’s face it, technologists are the worst purveyors of…

4 条评论
Friend of a friend (aka Relationship Isomorphism …..and how to saturate a CPU in spite of it)

2022年4月4日

Friend of a friend (aka Relationship Isomorphism …..and how to saturate a CPU in spite of it)

If you are a student of DBMS technologies, then you know that there are four basic join types: Nested Loop (NLJ) – Uses…

1 条评论

See all articles

Hamiltonian Paths

Jeff Tallman

Principal Sales Engineer at Neo4j

领英推荐

Jeff Tallman的更多文章

社区洞察

其他会员也浏览了

How I use Pi, the most important number in the universe, every day

Gentle Introduction to Linear Algebra, with Spectacular Applications

Mandelbrot Set- the ultimate shape of math and nature

Fundamentals of Quantization - Quantization of LLMs, Part-3

Variography: Post II - The Nugget Effect

Behold: The Thompson Postulate

Timeline of computational mathematics

Riemannian Geometry Cheat sheet

What are the complexities of an algorithm, and what are some examples of cases?

领英推荐

Jeff Tallman的更多文章

Time Series Vectors in Neo4j

Hands on with GenAI & Neo4j Apr 25th in NYC @ AWS

The Challenges of Virtual Knowledge Graphs

Vector Indexing plus Knowledge Graphs with Neo4j

Modeling Longitudinal/Time Series/Sequential Data in Neo4j

Modeling Discrete Relationships in Neo4j

Graph Feature Engineering for Longitudinal Data (aka Time Series)

DBMS for Data Science: Why Neo4j vs. your tRusty ol’ RDBMS

Buzzword Salad: Ontologies, Digital Twins, Knowledge Graphs….

Friend of a friend (aka Relationship Isomorphism …..and how to saturate a CPU in spite of it)

社区洞察

其他会员也浏览了

How I use Pi, the most important number in the universe, every day

Gentle Introduction to Linear Algebra, with Spectacular Applications

Mandelbrot Set- the ultimate shape of math and nature

Fundamentals of Quantization - Quantization of LLMs, Part-3

Variography: Post II - The Nugget Effect

Behold: The Thompson Postulate

Timeline of computational mathematics

Riemannian Geometry Cheat sheet

What are the complexities of an algorithm, and what are some examples of cases?