Hamiltonian Paths
Last time I posted – admittedly a while ago…..but as much necessity is the mother of invention, work is often the frustration of expressing one’s thoughts on paper….the post was on relationship isomorphism with the title “Friend of a friend”
Fast forward a few months and a colleague came to me with a problem.??How to prevent non-Hamiltonian paths in Neo4j.??Thankfully, he included the Wikipedia link so I could translate his request into common English.?
To speed things along, generally, it states:
In the?mathematical?field of?graph theory, a?Hamiltonian path?(or?traceable path) is a?path?in an undirected or directed graph that visits each?vertex?exactly once. A?Hamiltonian cycle?(or?Hamiltonian circuit) is a?cycle?that visits each vertex exactly once. A Hamiltonian path that starts and ends at adjacent vertices can be completed by adding one more edge to form a Hamiltonian cycle, and removing any edge from a Hamiltonian cycle produces a Hamiltonian path. Determining whether such paths and cycles exist in graphs (the?Hamiltonian path problem?and?Hamiltonian cycle problem) are?NP-complete.
Hamiltonian paths and cycles are named after?William Rowan Hamilton?who invented the?icosian game, now also known as?Hamilton's puzzle, which involves finding a Hamiltonian cycle in the edge graph of the?dodecahedron. Hamilton solved this problem using the?icosian calculus, an?algebraic structure?based on?roots of unity?with many similarities to the?quaternions?(also invented by Hamilton). This solution does not generalize to arbitrary graphs.
Okay – so Hamilton musta been really bored and lived a really long time ago.??I mean, what’s wrong with good old fashion chess game that you have to come up with a game about geometry?
But I digress.??The problem as my colleague postulated, concerned something that looked like the following, but a tad simpler
In simple terms, he wants to traverse from 1 to 6 or 2 to 6 without doing the loop of 2->3->4 as node #2 gets traversed twice which is a non-Hamilton Path.?
Most graph DBMS’s subscribe to graph theory in which a generic path can have one of three core traversal types.?Consider the below as a common example I sometimes use when introducing Neo4j to people as the Friend of a Friend type query performance issue is a common problem:
Assume all the edges are one direction pointing to the higher node number except the second relationship of 6->2.??We’ll get to that.
The first and most common graph traversal technique is called “Walk”.??Maybe it should be called “Wander” instead.?In a “Walk”, any node or relationship can be repeated in a path.??So, to traverse from 1->3 the following paths are legal, assuming directionality in the traversal:
1->2->3
1->2->6->2->3
1->2->6->2->6->2->3
....
1->2->3->5->6->2->3
1->2->3->4->5->6->2->3
....
If we ignore direction, there are a lot more possibilities.?However, the path 1->2->6->2->3 (and many others) is non-Hamilton Path.?Note that although node #3 is the destination, it can be traversed through on a longer path in theory.??But let’s ignore that an assume the DBMS is smart enough to terminate the path at the target the first time. Non-Hamilton paths right from the start are never in the shortest path – but they are extremely common and lead to extensive path explosion.?Unfortunately, the GQL standard seems to suggest that “Walk” has to be supported at a minimum.?For those new to Graph DBMS, GQL is the ISO standard query language for Graph DBMS – similar to SQL as the ISO standard for RDBMS. ?GQL is based extensively on OpenCypher a common language supported by 20+ Graph DBMS’s, which of course was contributed to heavily by Neo4j with Cypher.??Ugh ugh ugh – this is the problem with standards committees – I mean as obviously can seen from the above, one could get stuck in an infinite loop in the 2->6 and 6->2 cycle with Walk – totally defies common sense.??Well….maybe.??My first introduction to cyclical paths was implementing a job processing trace in which some jobs “looped” until a condition was met.??
As discussed in my last post, Neo4j implements “Trail”.? In a “Trail”, relationships cannot be repeated but nodes can.??So, using our example, the following is permissible traversals from 1->3 in our example:
1->2->3
1->2->6->2->3
This eliminates the infinite loop of 2->6->2->6 …..but it still allows non-Hamiltonian paths.?We’ll come back to that in a bit.
The third (and arguably the most desired) traversal is called a “Path” – perhaps short for “Hamiltonian Path”.?In a “Path”, neither nodes nor relationships are repeated, so we have our desired answer of:
Now, if we were traversing 1->6, we would have at least 3 Hamilton paths:
Just because 1->2->6 is the shortest in terms of hops, doesn’t mean it is the cheapest as the cost of the hop from 2->6 may be more expensive than the other routes.??We’ve all seen this with airline reservations in which you are often offered 2-3 layover routes that are cheaper (price wise)….but take you all day to get there.
So, why did Neo4j implement “Trail”?
Is there a way to implement “Hamiltonian Paths” with Neo4j?
For those with short attention spans, the answer to the last question is “Yes” – but we will get there in a minute.??First, with why Neo4j implemented Trail, I suspect it is because Trail represents the complete set of paths that do not involve traversing the same relationship twice.?In my last post, I used the example of:
领英推荐
I noted that we could then have the following paths:
A->B->C->D->E->Z
A->F->G->H->I->Z
A->J->K->L->M->Z
A->B->C->N->O->P->C->D->E->Z
A->B->C->D->E->Q->R->S->E->Z
A->B->C->N->O->P->C->D->E->Q->R->S->E->Z
A->B->G->H->I->Z
A->B->G->H->M->Z
A->F->G->H->M->Z
A->J->K->H->I->Z
A->J->K->H->M->Z
This allows us to model situations such as the job processing in which there are conditions for traversing.?
Okay, so that is great in theory – but how does one implement Hamilton Paths in Cypher.??Welllllllll….it all depends.??We could start with something like:
MATCH (source:Process {id: 1})
MATCH (target:Process {id: 3})
MATCH p=(source)-[:NEXT*]->(target)
WHERE none(x in nodes(p) WHERE exists((x)-[:NEXT]->(:Process)-[:NEXT]->(x)))
RETURN p
This uses one of the more useful functions in Neo4j called a predicate function, which I use regularly when implementing versioning.?In this case, the query translates to:
Unfortunately, this yields no results.??The problem is 2 always has the NEXT relationship to 6 and 6 to 2 since our check is not limited to just the nodes in the path.??If you want to try it for yourself, here is a quick creation script:
CREATE (n1:Process {id: 1})
CREATE (n2:Process {id: 2})
CREATE (n3:Process {id: 3})
CREATE (n4:Process {id: 4})
CREATE (n5:Process {id: 5})
CREATE (n6:Process {id: 6})
?
MERGE (n1)-[:NEXT]->(n2)
MERGE (n2)-[:NEXT]->(n6)
MERGE (n6)-[:NEXT]->(n2)
MERGE (n2)-[:NEXT]->(n3)
MERGE (n3)-[:NEXT]->(n4)
MERGE (n3)-[:NEXT]->(n5)
MERGE (n4)-[:NEXT]->(n5)
MERGE (n5)-[:NEXT]->(n6)
;
But had it worked, it would have eliminated the 2->6 and 6->2 loop as below…
…but it would not have resolved our original problem:
Knowing that Neo4j supports a shortest_path() function and knowing that the shortest path will never contain a non-Hamiltonian Path, we might be tempted to use:
MATCH (source:Process {id: 1})
MATCH (target:Process {id: 3})
MATCH p=shortestPath((source)-[:NEXT*]->(target))
RETURN p
But, if we want a list of all Hamiltonian paths, this won’t work as it only returns the shortest path (singular) wherein reality, there are 3 paths if we are instead looking at paths from 1->6 vs. just 1->3.
This is where our friend APOC comes in.??We could change the query to:
MATCH (source:Process {id: 1})
MATCH (target:Process {id: 6})
MATCH p=(source)-[:NEXT*]->(target)
WHERE NOT apoc.coll.containsDuplicates(nodes(p))
RETURN p
..and presto – it all works!!!??Gradually, the Neo4j APOC library is being incorporated into the base product and apoc.coll is one set of functions that I can hardly wait for.??Until then – all we have to do is download APOC or use APOC core from ./labs folder in installation.??
This is probably the most flexible approach and allows additional predicates including (if necessary) predicate functions on the relationships for versioning.??However, it likely isn’t the fastest approach.??A slightly faster approach, since we are using APOC, is to simply use the APOC path utilities ala:
MATCH (source:Process {id: 1})
MATCH (target:Process {id: 3})
CALL apoc.path.expandConfig(source, {relationshipFilter: "NEXT>"
??????????????? , uniqueness: "NODE_PATH"
??????????????? , terminatorNodes: target
??????????????? }) yield path
RETURN path
On our simple example, it might not mean much, but on highly complicated graphs, the performance difference could be extremely substantial.
However, the absolute fastest approach would be to use Neo4j’s GDS library and specifically the shortest paths functions.??While Neo4j does have a “shortest_path()” function, it doesn’t support weighted paths/path cost and other aspects that are very common in such problems.??Using GDS, the code would be:
call gds.graph.project('blog','Process',
??? {
??? NEXT: {
??????? type: 'NEXT',
??????? orientation: 'NATURAL'
??? }}
)
;
?
MATCH (source:Process {id: 1})
MATCH (target:Process {id: 6})
CALL gds.shortestPath.yens.stream('blog',
{
??????? sourceNode: source,
??????? targetNode: target,
??????? k: 3
})YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
RETURN index, path
;
…which yields the following:
Voila – and no non-Hamiltonian Paths….and typically 10x faster than APOC on larger networks.