Sydney Trains - Most Critical Station Connection (Part 1)

Question

If we could connect any two #sydneytrains stations together, which two would be best?


Context

Yesterday, I calculated the importance of each station in the network using a metric called the Betweenness Centrality (BC) score. The score indicates how badly the system would suffer if a station went down, whether because of a maintenance issue, an electrical fault or a medical emergency. The higher the BC score, the more disruption an outage would cause. My biggest takeaway was identifying three stations that are highly central to the network: Central, Chatswood and Sydenham.
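
To make the metric concrete, here is a minimal pure-Python sketch of betweenness centrality (Brandes' algorithm) on a toy line of five stops. The stop names A to E and the `betweenness` helper are made up for illustration; this is not the code that produced the real scores, which came from the graph database.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalised betweenness centrality for an undirected, unweighted graph."""
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from the source, counting shortest paths.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    path_count[neighbour] += path_count[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # Each undirected pair was counted once from each end, so halve the totals.
    return {node: score / 2 for node, score in scores.items()}


# Toy line of five hypothetical stops: A - B - C - D - E
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
scores = betweenness(line)
print(scores)  # {'A': 0.0, 'B': 3.0, 'C': 4.0, 'D': 3.0, 'E': 0.0}
```

The middle stop C scores highest because every trip between the two halves of the line has to pass through it.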


The Plan

The question I'm asking is: which single new link between two stations would do the most to reduce centrality across the whole system?

This is my plan of attack:

  • Create a baseline of the BC scores of the existing network
  • Generate candidate train lines linking stations that don’t yet have a connection.
  • Recalculate the BC scores with the new connection
  • Compare against the baseline and identify which one is best.
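
The candidate generation step in the plan above can be sketched in plain Python. This is only an illustration under my own assumptions: the `candidate_links` helper and the four-stop toy network are hypothetical, and links are treated as undirected.

```python
from itertools import combinations


def candidate_links(stations, existing_links):
    """Return every pair of stations that is not already directly linked."""
    # frozenset makes ("A", "B") and ("B", "A") count as the same link.
    linked = {frozenset(link) for link in existing_links}
    return [
        pair
        for pair in combinations(sorted(stations), 2)
        if frozenset(pair) not in linked
    ]


# Hypothetical toy network: A - B - C - D
stations = ["A", "B", "C", "D"]
existing = [("A", "B"), ("B", "C"), ("C", "D")]
candidates = candidate_links(stations, existing)
print(candidates)  # [('A', 'C'), ('A', 'D'), ('B', 'D')]
```

With n stations there are n(n-1)/2 possible pairs, and each candidate needs its own BC recalculation, so the number of runs grows quickly with the size of the network.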


Achievement

Today's exercise was to get more practice using the Python Driver to interact with the knowledge graph.

Here's what I achieved:

  1. I connected to the graph.
  2. I created a new projection.
  3. I ran a graph algorithm on the projection
  4. I queried the projection
  5. I transformed the results into a Pandas Series for later processing


The Baseline

Yesterday, I loaded all the train stations as nodes into a Neo4j knowledge graph and calculated the betweenness centrality score for each of them. Now that I have this data, I can use it as a baseline.


Scripting

Now that I have an idea of where to start, I open up my trusty PyCharm.

I’ll need to interact with the Neo4j database using the Python driver.

The package I used is the neo4j library.

Install it and import the graph database driver.

from neo4j import GraphDatabase        

Using this driver, I connect to the database with the credentials I've set up for my Neo4j instance.

NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "password"
NEO4J_DATABASE = "trains"

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD), database=NEO4J_DATABASE)        

Next, I open a session from the driver and use it to run a query.

This first query creates the projection for the baseline.

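# projection_name is the name chosen for the in-memory graph;
# session is a session opened from the driver above.
query = f"""
    MATCH (s1:Stop)-[r:NEXT_STOP]->(s2:Stop)
    WITH gds.graph.project('{projection_name}', s1, s2) AS g
    RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS relationships
"""
session.run(query)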
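
The snippet above uses a `session` without showing where it comes from. Here is a minimal sketch of how that step might be wrapped in a function, assuming the `driver` created earlier. The `create_projection` name, its `database` default and the dictionary it returns are my own illustration rather than anything provided by the neo4j library.

```python
def create_projection(driver, projection_name, database="trains"):
    """Project Stop nodes and NEXT_STOP relationships into a named in-memory graph."""
    query = f"""
        MATCH (s1:Stop)-[r:NEXT_STOP]->(s2:Stop)
        WITH gds.graph.project('{projection_name}', s1, s2) AS g
        RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS relationships
    """
    # The session is closed automatically when the with-block exits.
    with driver.session(database=database) as session:
        # The query returns a single summary row describing the projection.
        record = session.run(query).single()
        return {
            "graph": record["graph"],
            "nodes": record["nodes"],
            "relationships": record["relationships"],
        }
```

A call such as `create_projection(driver, "baseline")` (where "baseline" is just an example name) would then return the node and relationship counts, which is a quick sanity check that the projection picked up the whole network.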

I then use this projection to calculate the baseline BC scores.

import pandas as pd

# Calculate the betweenness centrality on the baseline projection
query = f"""
    CALL gds.betweenness.stream('{projection_name}')
    YIELD nodeId, score
    RETURN gds.util.asNode(nodeId).name AS station, score
"""
result = session.run(query)
# Key each score by station name so later comparisons line up by station
baseline = pd.Series({record['station']: record['score'] for record in result})

A little bit of forward thinking here: since I'll be using the pandas library to run statistical calculations on these scores, I build a pandas Series from the results and return it from my function.
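
As a rough sketch of the kind of comparison a Series makes easy, the example below lines up a baseline against one candidate. The `compare_to_baseline` helper is hypothetical, and the numbers are not from the Sydney network: they are the BC scores of a toy five-stop line A-B-C-D-E, before and after adding an A-E link that turns it into a loop. Using the highest score as the summary is also just one possible choice of metric.

```python
import pandas as pd


def compare_to_baseline(baseline, candidate):
    """Line up two score Series by station and show the per-station change."""
    table = pd.DataFrame({"baseline": baseline, "candidate": candidate})
    table["change"] = table["candidate"] - table["baseline"]
    # Largest reductions first.
    return table.sort_values("change")


# Toy line A-B-C-D-E, then the same stops with an extra A-E link (a five-stop loop).
baseline = pd.Series({"A": 0.0, "B": 3.0, "C": 4.0, "D": 3.0, "E": 0.0})
candidate = pd.Series({"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0, "E": 1.0})

table = compare_to_baseline(baseline, candidate)
print(table)
print("highest score before:", baseline.max(), "after:", candidate.max())
```

Because both Series are keyed by station name, pandas aligns them automatically, so the order in which the database returns the stations does not matter.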


Gotchas

Here are some of the gotchas that I ran into whilst developing the script.

The older way of creating Cypher projections appears to be deprecated.

The current approach is to call gds.graph.project inside the Cypher query.

Its documentation is here:

https://neo4j.com/docs/graph-data-science/current/management-ops/graph-creation/graph-project-cypher-projection/

Creating a projection fails with an error if one with the same name already exists.

An error also occurs if you try to drop a projection that does not exist.

This means that whenever I create or drop a projection, I need to wrap the call in an existence check.

CALL gds.graph.exists('{projection_name}') YIELD exists
WHERE exists
CALL gds.graph.drop('{projection_name}') YIELD graphName
RETURN graphName;        
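
The same check can also be done in two steps from Python. This is a sketch under my own assumptions: the `drop_projection_if_exists` helper is hypothetical, it expects an already open session, and it passes the projection name as a query parameter instead of formatting it into the string.

```python
def drop_projection_if_exists(session, projection_name):
    """Drop the named projection if present; return True if something was dropped."""
    params = {"name": projection_name}
    record = session.run(
        "CALL gds.graph.exists($name) YIELD exists RETURN exists",
        params,
    ).single()
    if not record["exists"]:
        # Nothing to drop, so skip the call that would raise an error.
        return False
    session.run(
        "CALL gds.graph.drop($name) YIELD graphName RETURN graphName",
        params,
    ).consume()
    return True
```

Calling this before creating a projection covers both gotchas: the create never collides with an old projection, and the drop never runs against a missing one. As far as I can tell, gds.graph.drop also takes an optional second argument (failIfMissing) that can be set to false to achieve something similar in a single call.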



The Code

You can see all my code in the GitHub Gist here:

https://gist.github.com/Taresin/2f150e4daa652d0eb5d58f15727f3df6

