Sydney Trains - Most critical station connection (Part 2)
The Question:
If we could connect any two #sydneytrains stations together, which two would be best?
This is my plan of attack:
Recap:
Candidates
I used Neo4J's query language #cypher in identifying unconnected stations.
So how many candidate pairs are there?
Let's do some math to have a sanity check on the numbers.
The train network has 177 stations. To calculate the number of possible pairs in the network, we can use the “choose” function we learned in high school maths when dealing with combinations (nCr).
177 C 2 = 177 * 176 / 2 = 15,576
If we remove the existing connections (191), then we have:
15,576 - 191 = 15,385
possible candidate pairs to explore.
That's a lot!
Scripting
I continue to build on my script.
I created a function that looks through the knowledge graph for any stations that are not yet connected.
I built up this query by searching for pairs of stops.
MATCH (s1:Stop), (s2:Stop)
RETURN s1.name AS station1, s2.name AS station2
This gives me the Cartesian product of all the stops.
The problem with this statement is that it matches all stops that still have a connection.
We would then have to filter out all the pairs that already have a connection.
We do this by adding a WHERE clause.
MATCH (s1:Stop), (s2:Stop)
WHERE NOT (s1)-[:NEXT_STOP]-(s2)
RETURN s1.name AS station1, s2.name AS station2
Because relationships in the knowledge graphs are directed, we need to filter out all the pairs that have this NEXT_STOP relationship in both direction. So we leave the arrow character out of the relationship syntax to check for a match in either direction.
The last thing we need to do is to filter out all the duplicate pairs because we don't need to consider the reverse direction. We can do this by forcing the pairs to be in alphabetical order. In the same stroke, eliminate the pairs with two of the same stop since the same name cannot be before itself.
MATCH (s1:Stop), (s2:Stop)
WHERE s1.name < s2.name AND NOT (s1)-[:NEXT_STOP]-(s2)
RETURN s1.name AS station1, s2.name AS station2
Now that I have the query finished I can add that to the script.
I then transform the results of the query into a list of tuples.
def get_candidate_pairs(session):
query = """
MATCH (s1:Stop), (s2:Stop)
WHERE s1.name < s2.name AND NOT (s1)-[:NEXT_STOP]-(s2)
RETURN s1.name AS station1, s2.name AS station2
"""
result = session.run(query)
candidates = [(record["station1"], record["station2"]) for record in result]
return candidates