Sydney Trains - Most Critical Station Connection (Part 1)
Question
If we could connect any two #sydneytrains stations together, which two would be best?
Context
Yesterday, I calculated the importance of each station within the network using a metric called the Betweenness Centrality (BC) score. It is a measure of how important each train station is to the network: the score tells us how badly the system would suffer if that station went down for some reason, whether a maintenance issue, an electrical fault or a medical emergency. The higher the BC score, the more disruption an outage would cause. My biggest takeaway was identifying three train stations that were highly central to the network: Central, Chatswood and Sydenham.
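As a toy illustration of the metric (separate from the Neo4j pipeline), here is a from-scratch betweenness computation using Brandes' algorithm on a five-stop line. The middle stop sits on the most shortest paths, so it scores highest:

```python
from collections import deque

def betweenness(adj):
    """Unnormalised betweenness centrality for an unweighted,
    undirected graph (Brandes' algorithm).
    adj: dict mapping node -> list of neighbours."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, q = [], deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the far end back towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: every pair was counted from both ends.
    return {v: score / 2 for v, score in bc.items()}

# A five-stop line: a - b - c - d - e
line = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
        "d": ["c", "e"], "e": ["d"]}
scores = betweenness(line)   # the middle stop "c" scores highest
```

Knocking out "c" severs the line, and its score of 4 reflects that: it sits on the only shortest path for four of the ten station pairs.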
The Plan
The question I'm asking comes down to this: what is the best link I could build between two stations to reduce the centrality of the entire system?
This is my plan of attack:
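In code terms, the search I have in mind looks roughly like this. It's only a sketch: `peak_score` is a hypothetical callback that, given a set of links, recomputes the network's highest BC score (in my setup it would re-project the graph in GDS and stream the scores):

```python
import itertools

def best_new_link(stations, existing_links, peak_score):
    """Try every station pair not already linked and keep the one
    whose addition lowers the network's peak BC score the most.

    peak_score(links) -> float recomputes the highest centrality
    score for a given set of links."""
    best_pair, best_peak = None, peak_score(existing_links)
    for pair in itertools.combinations(stations, 2):
        if pair in existing_links or pair[::-1] in existing_links:
            continue
        peak = peak_score(existing_links | {pair})
        if peak < best_peak:
            best_pair, best_peak = pair, peak
    return best_pair, best_peak

# Toy check with made-up scores: linking A - C lowers the peak most.
stations = ["A", "B", "C"]
existing = {("A", "B")}
def toy_peak(links):
    if ("A", "C") in links:
        return 5.0
    if ("B", "C") in links:
        return 7.0
    return 8.0   # baseline with only A - B linked
best = best_new_link(stations, existing, toy_peak)
```

An exhaustive search recomputes centrality once per candidate pair, which grows quadratically with the number of stations, so on the full network the candidate pairs would likely need pruning first.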
Achievement
Today's exercise was to get more practice using the Python Driver to interact with the knowledge graph.
Here's what I achieved:
The Baseline
Yesterday, I imported all the train stations as nodes in a Neo4j knowledge graph and calculated the betweenness centrality score for each of them. Now that I have this data, I can use it as a baseline.
Scripting
Now that I have an idea of where to start, I open up my trusty PyCharm.
I'll need to interact with the Neo4j database using the Python driver.
The package that I used was the neo4j library.
Install that and import the graph database driver. I also import pandas up front, since I'll use it later to hold the scores.
import pandas as pd
from neo4j import GraphDatabase
Using this driver, I connect to the database with the credentials that I've set up for my Neo4j instance. One catch: the database name is a session-level setting, not a driver-level one, so I open a session against it rather than passing it to GraphDatabase.driver.
NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "password"
NEO4J_DATABASE = "trains"
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
session = driver.session(database=NEO4J_DATABASE)
Now I use the session to execute a query. In this particular query, I create a projection for the baseline. The query is an f-string, so the projection's name needs to be set first (the name itself is my choice).
projection_name = "trains-baseline"
query = f"""
MATCH (s1:Stop)-[r:NEXT_STOP]->(s2:Stop)
WITH gds.graph.project('{projection_name}', s1, s2) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS relationships
"""
session.run(query)
I then use this projection to calculate the baseline BC scores.
# Calculate the betweenness centrality on the baseline projection
query = f"""
CALL gds.betweenness.stream('{projection_name}')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS station, score
"""
result = session.run(query)
baseline = pd.Series({record['station']: record['score'] for record in result})
A little bit of forward thinking: since I'll be using the pandas library to make statistical calculations on these scores, I need to create a DataFrame or Series and return it as part of my function.
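Since the scores come back as a Series keyed by station name, comparisons against a later what-if run become one-liners. The numbers below are made up purely to show the shape of the comparison:

```python
import pandas as pd

# Hypothetical baseline BC scores (illustrative values only,
# not the real numbers from the graph).
baseline = pd.Series({"Central": 0.42, "Chatswood": 0.31,
                      "Sydenham": 0.28, "Hornsby": 0.11})

# Hypothetical scores after adding a new link somewhere.
after = pd.Series({"Central": 0.35, "Chatswood": 0.30,
                   "Sydenham": 0.27, "Hornsby": 0.12})

top3 = baseline.nlargest(3)                    # most central stations
change = (after - baseline) / baseline * 100   # % change per station
```

Because pandas aligns the two Series by index, the subtraction pairs up each station's before and after scores automatically.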
Gotchas
Here are some of the gotchas that I ran into whilst developing the script.
It turns out that the old procedure for creating Cypher projections is deprecated.
The replacement to call is: gds.graph.project
The documentation of it is here:
The queries raise an error whenever you try to create a projection with the same name as one that already exists.
An error also occurs when you try to drop a projection that does not exist.
This means that when I create or drop a projection, I need to wrap it in an existence check.
CALL gds.graph.exists('{projection_name}') YIELD exists
WHERE exists
CALL gds.graph.drop('{projection_name}') YIELD graphName
RETURN graphName;
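On the Python side I wrap that guarded drop in a small helper so the script can be re-run cleanly (the function names here are my own). Note that gds.graph.drop also accepts a second argument, failIfMissing; passing false should make the drop a silent no-op, which avoids the existence check entirely.

```python
def drop_if_exists_query(projection_name: str) -> str:
    """Build the Cypher that drops a GDS projection only if it exists."""
    return (
        f"CALL gds.graph.exists('{projection_name}') YIELD exists\n"
        f"WITH exists WHERE exists\n"
        f"CALL gds.graph.drop('{projection_name}') YIELD graphName\n"
        f"RETURN graphName"
    )

def safe_drop(session, projection_name: str) -> None:
    """Run the guarded drop so re-creating the projection never errors."""
    session.run(drop_if_exists_query(projection_name))
```

Calling safe_drop(session, projection_name) before each create keeps the script idempotent.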
The Code
You can see all my code in the GitHub Gist here: