Sydney Trains - Which stations are the most critical?

Today, I calculated the importance of each station on the #sydneytrains network using #neo4j.


Betweenness Centrality

"Betweenness centrality" is a graph theory metric that measures how often a node (here, a train station) lies on the shortest path between any two other nodes.

I'm still pretty new to it, but I think of it like this: it tells you how many people would be affected if there was an issue at a train station. For example, an issue at Cronulla would impact everyone going to and from Cronulla, but that's the extent of it. In contrast, an issue at Wynyard would have a lot more impact since many more services run through it. The whole North Shore line connects to it via Milsons Point.
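To make the definition concrete, here is a minimal Python sketch of the standard Brandes algorithm on a made-up five-station network. The station names and layout are hypothetical (not the real Sydney network), the graph is treated as undirected and unweighted, and the scores are unnormalized, so they are not expected to match the numbers Neo4j produces.

```python
from collections import deque


def betweenness_centrality(graph):
    """Unnormalized betweenness for an undirected, unweighted graph.

    graph: dict mapping each node to a list of its neighbours.
    """
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths.
        distance = {source: 0}
        num_paths = {node: 0 for node in graph}
        num_paths[source] = 1
        predecessors = {node: [] for node in graph}
        visit_order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in graph[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    num_paths[w] += num_paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating how much each
        # node is depended on for shortest paths that start at source.
        dependency = {node: 0.0 for node in graph}
        for w in reversed(visit_order):
            for v in predecessors[w]:
                dependency[v] += num_paths[v] / num_paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was counted once from each end, so halve.
    return {node: score / 2 for node, score in centrality.items()}


# Hypothetical toy network: Terminus - Mid - Hub, with Hub branching
# to North and West.
network = {
    "Terminus": ["Mid"],
    "Mid": ["Terminus", "Hub"],
    "Hub": ["Mid", "North", "West"],
    "North": ["Hub"],
    "West": ["Hub"],
}

scores = betweenness_centrality(network)
for station, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(station, score)
```

In this toy network the end-of-line stations score 0 because no shortest path between two other stations passes through them, while Hub scores highest because five of the station pairs can only reach each other through it.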


Neo4j has a Graph Data Science (GDS) library that can compute graph metrics like this for each node. I built on yesterday's work, where I reproduced the train station network as a knowledge graph. The only addition here was calculating the betweenness centrality property. The code is short and easy to run.

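// Project the Stop nodes and NEXT_STOP relationships into an in-memory GDS graph.
CALL gds.graph.project.cypher(
    'projection_trains',
    'MATCH (s:Stop) RETURN id(s) AS id',
    'MATCH (s:Stop)-[:NEXT_STOP]->(t:Stop) RETURN id(s) AS source, id(t) AS target, "NEXT_STOP" AS type'
);

// Compute betweenness centrality and write it back to each Stop node as a property.
CALL gds.betweenness.write('projection_trains', {
  writeProperty: 'betweennessCentrality'
});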

Neo4j Bloom

I spent some time getting familiar with Neo4j Bloom, which can set node color and size based on node properties. The node sizes in my chart are based on the betweenness centrality values.



The values seem to range from 0 to about 2800. I added an image of the histogram to show the spread of values.


Most stations had a low betweenness value, but the top three seem to be outliers. I colored them in red: Central, Chatswood and Sydenham. This makes sense, since all lines pass through Central station, Chatswood is the gateway connecting the Northern line and the North Shore line, and Sydenham spans the Metro line and the South line.


Assumptions

This is all based on the assumption that every station has the same number of passengers entering the system, so each station's score depends purely on its position in the network.


To Explore

Two questions that would be interesting to explore:

  • How does the value change based on the population density of the local areas?
  • How does the value change over time? (I imagine that peak loads have a different profile, since many more people travel to and from work on weekdays.)
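
One possible way to start on the first question, sketched in Python with entirely made-up numbers: weight every trip by how many passengers enter at its origin station, as a rough stand-in for local population density. The network, the `entries` values and the function name `origin_weighted_betweenness` are all hypothetical, and this is a simple variation on the usual algorithm rather than anything built into Neo4j.

```python
from collections import deque


def origin_weighted_betweenness(graph, entries):
    """Betweenness where trips from each origin count entries[origin] times.

    graph: dict mapping each node to a list of its neighbours.
    entries: dict mapping each node to a (hypothetical) passenger weight.
    Scores are summed over ordered origin -> destination pairs.
    """
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths.
        distance = {source: 0}
        num_paths = {node: 0 for node in graph}
        num_paths[source] = 1
        predecessors = {node: [] for node in graph}
        visit_order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in graph[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    num_paths[w] += num_paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, then scale by the origin's weight.
        dependency = {node: 0.0 for node in graph}
        for w in reversed(visit_order):
            for v in predecessors[w]:
                dependency[v] += num_paths[v] / num_paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += entries[source] * dependency[w]
    return centrality


# Hypothetical toy network, not the real Sydney layout.
network = {
    "Terminus": ["Mid"],
    "Mid": ["Terminus", "Hub"],
    "Hub": ["Mid", "North", "West"],
    "North": ["Hub"],
    "West": ["Hub"],
}

# Made-up weights: every station equal, versus a very busy Terminus.
uniform = {station: 1 for station in network}
busy_terminus = dict(uniform, Terminus=10)

equal_scores = origin_weighted_betweenness(network, uniform)
weighted_scores = origin_weighted_betweenness(network, busy_terminus)
print(equal_scores["Mid"], equal_scores["Hub"])
print(weighted_scores["Mid"], weighted_scores["Hub"])
```

With equal weights Hub outranks Mid, but once Terminus is given ten times the passengers, Mid overtakes Hub, which suggests the ranking could shift noticeably once real passenger data is included.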
