登录查看更多内容

Sydney Trains - Most critical station connection (Part 2)

Dougy Lee

Android Developer

发布日期: 2025年2月26日

+ 关注

The Question:

If we could connect any two #sydneytrains stations together, which two would be best?

This is my plan of attack:

Create a baseline of the BC scores of the existing network
Generate candidate train lines linking stations that don’t yet have a connection.
Recalculate the BC scores with the new connection
Compare against the baseline and identify which one is best.

Recap:

Set up my python script to interact with the Neo4J Python driver.
Learnt how to use the GDS functions to create projections.
Determined a format to store the BC scores as a Pandas Series.

Candidates

I used Neo4J's query language #cypher in identifying unconnected stations.

So how many candidate pairs are there?

Let's do some math to have a sanity check on the numbers.

The train network has 177 stations. To calculate the number of possible pairs in the network, we can use the “choose” function we learned in high school maths when dealing with combinations (nCr).

177 C 2 = 177 * 176 / 2 = 15,576

If we remove the existing connections (191), then we have:

15,576 - 191 = 15,385

possible candidate pairs to explore.

That's a lot!

Scripting

I continue to build on my script.

I created a function that looks through the knowledge graph for any stations that are not yet connected.

I built up this query by searching for pairs of stops.

MATCH (s1:Stop), (s2:Stop)
RETURN s1.name AS station1, s2.name AS station2

This gives me the Cartesian product of all the stops.

The problem with this statement is that it matches all stops that still have a connection.

We would then have to filter out all the pairs that already have a connection.

We do this by adding a WHERE clause.

MATCH (s1:Stop), (s2:Stop)
WHERE NOT (s1)-[:NEXT_STOP]-(s2)
RETURN s1.name AS station1, s2.name AS station2

Because relationships in the knowledge graphs are directed, we need to filter out all the pairs that have this NEXT_STOP relationship in both direction. So we leave the arrow character out of the relationship syntax to check for a match in either direction.

The last thing we need to do is to filter out all the duplicate pairs because we don't need to consider the reverse direction. We can do this by forcing the pairs to be in alphabetical order. In the same stroke, eliminate the pairs with two of the same stop since the same name cannot be before itself.

MATCH (s1:Stop), (s2:Stop)
WHERE s1.name < s2.name AND NOT (s1)-[:NEXT_STOP]-(s2)
RETURN s1.name AS station1, s2.name AS station2

Now that I have the query finished I can add that to the script.

I then transform the results of the query into a list of tuples.

def get_candidate_pairs(session):

    query = """
        MATCH (s1:Stop), (s2:Stop)
        WHERE s1.name < s2.name AND NOT (s1)-[:NEXT_STOP]-(s2)
        RETURN s1.name AS station1, s2.name AS station2
    """
    result = session.run(query)
    candidates = [(record["station1"], record["station2"]) for record in result]
    return candidates

要查看或添加评论，请登录

Dougy Lee的更多文章

Sydney Trains - Most critical station connection (Part 3)

2025年3月5日

Sydney Trains - Most critical station connection (Part 3)

The Question: If we could connect any two #sydneytrains stations together, which two would be best? This was my plan of…
Sydney Trains - Most Critical Station Connection (Part 1)

2025年2月18日

Sydney Trains - Most Critical Station Connection (Part 1)

Question If we could connect any two #sydneytrains stations together, which two would be best? Context Yesterday, I…
Sydney Trains - Which stations are the most critical?

2025年2月11日

Sydney Trains - Which stations are the most critical?

Today, I calculated the importance of each station on the #sydneytrains network using #neo4j. Betweenness Centrality…
Google Hacking: How hackers pan for Cyber Gold in Directory Listings

2023年1月14日

Google Hacking: How hackers pan for Cyber Gold in Directory Listings

TL;DR; The directory listing type of web page is sometimes an unintended result of a badly configured web server. All…
Google Hacking: How to surf a website without leaving any logs

2023年1月9日

Google Hacking: How to surf a website without leaving any logs

TL;DR; Using Google cache can mask your browsing of websites. Might not be available for specific sites but a good tool…

1 条评论
My first steps into learning Cloud Security

2022年12月25日

My first steps into learning Cloud Security

Intro As a developer, it will be relatively small amount of time before you interact with cloud technology. This could…

See all articles

The Question:

This is my plan of attack:

Recap:

Candidates

Scripting

Dougy Lee的更多文章

Sydney Trains - Most critical station connection (Part 3)

Sydney Trains - Most Critical Station Connection (Part 1)

Sydney Trains - Which stations are the most critical?

Google Hacking: How hackers pan for Cyber Gold in Directory Listings

Google Hacking: How to surf a website without leaving any logs

My first steps into learning Cloud Security