City-Brain-Challenge
Traffic congestion is one of the most significant problems in large cities, especially during peak hours. According to the Global Traffic Scoreboard (2019) published by the data analytics company INRIX, Americans lose on average 99 hours a year to traffic, equivalent to a loss of $1,377 per driver.
As an academic project, our team participated in one of the annual Data Mining and Knowledge Discovery competitions, the City Brain Challenge, which targets exactly this problem: with smart traffic light management using Reinforcement Learning (RL), such congestion can be reduced.
The setting is a four-way intersection with three lanes per approach: a car queues in the lane matching its intended movement, i.e. turning right, turning left, or driving straight ahead.
One traffic light corresponds to one agent. It observes the environment (the intersection, its vehicles, and information from neighboring agents) by taking this data as input. The agent then chooses one of eight traffic signal phases as its action (output), determining which vehicles may move. The environment changes in response, for instance vehicle queues shrink or new vehicles arrive, and the agent takes the new observation as input to decide its next action.
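The observe-act loop described above can be sketched as a minimal toy (class and field names such as `TrafficSignalAgent` and `"queues"` are hypothetical; a trained agent would query its Q-network where the random placeholder policy stands):

```python
import random

NUM_PHASES = 8  # traffic signal phases 1-8

class TrafficSignalAgent:
    """One agent per intersection: observes lane queues, picks a phase."""

    def observe(self, intersection):
        # State: queue length per incoming lane (neighbor information
        # would be appended here as well; omitted for simplicity).
        return tuple(intersection["queues"])

    def act(self, state):
        # Placeholder policy: a trained agent would query its Q-network.
        return random.randrange(1, NUM_PHASES + 1)

def step(agent, intersection):
    state = agent.observe(intersection)
    phase = agent.act(state)
    # The simulator would now move the vehicles permitted by `phase`
    # and return the changed intersection as the next observation.
    return phase

intersection = {"queues": [4, 0, 2, 1]}
phase = step(TrafficSignalAgent(), intersection)
assert 1 <= phase <= NUM_PHASES
```

In the real loop this `step` would be called once per simulation tick for every intersection, feeding each new observation back into the agent.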
Below, a simulation of an intersection is shown. In an intersection with North, East, South, and West approaches, the moving dots represent the vehicles, and the switching number in the center indicates the current traffic signal phase (1-8).
Zooming out of the visualization, several intersections, and thus the vehicle flow across the network, can be seen in the simulation.
As this project dealt with a real-world problem, data at large scale was provided. The given data set contained all the necessary information, such as the road network and vehicle flows, to realize these simulations of real city maps with up to 92,000 intersections. In each step, the agent used this data to calculate the best choice of traffic signal phase as its action.
Algorithms and Reward Function
The reward function is essential for the agent to decide which action is best in a given situation: by maximizing the reward, it chooses the action that yields the most benefit. In this project, the multi-agent approaches Presslight, QMIX, and Colight were implemented.
Presslight
Presslight utilizes pressure as its reward signal, defined as the difference between the number of incoming and outgoing vehicles on a road. In the example Case A below, four vehicles are queueing and one vehicle is leaving the road, hence the pressure is 4 - 1 = 3.
A lower pressure yields a higher reward: the reward is defined as the negative pressure, so the agent learns to minimize pressure and keep queues balanced. In our project, a DQN (Deep Q-Network), which approximates the action-value function of Q-learning with a neural network, was added to the architecture.
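The pressure quantity itself is a one-line computation; this sketch reproduces Case A from the text (the function name `pressure` is illustrative, and in Presslight the agent's reward is derived from this value):

```python
def pressure(incoming, outgoing):
    """Pressure of a road: queued incoming minus outgoing vehicles."""
    return incoming - outgoing

# Case A from the text: four vehicles queueing, one leaving the road.
assert pressure(incoming=4, outgoing=1) == 3
```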
QMIX
In many settings, agents act in a decentralized manner using only local information from neighboring agents. QMIX enables centralized learning with global state information while the resulting policies remain decentralized, so each agent still acts on its local observations.
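The core idea of QMIX, combining per-agent Q-values into a joint Q_tot through a mixing network with non-negative weights, can be illustrated with a toy sketch (random weights stand in for the state-conditioned hypernetworks of the actual method; this is not our competition code):

```python
import numpy as np

def mix(agent_qs, w1, b1, w2, b2):
    """QMIX-style monotonic mixing of per-agent Q-values into Q_tot.

    In QMIX, w1 and w2 are produced by hypernetworks conditioned on
    the global state; taking their absolute value keeps them
    non-negative, which makes Q_tot monotonic in every agent's
    Q-value. Decentralized per-agent argmax actions are then
    consistent with the centralized argmax over Q_tot.
    """
    q = np.asarray(agent_qs)
    h = np.maximum(q @ np.abs(w1) + b1, 0.0)  # ReLU hidden layer
    return float(h @ np.abs(w2) + b2)

rng = np.random.default_rng(0)
n_agents, hidden = 3, 4
w1 = rng.standard_normal((n_agents, hidden))
b1 = rng.standard_normal(hidden)
w2 = rng.standard_normal(hidden)
b2 = 0.0

# Monotonicity check: raising any single agent's Q never lowers Q_tot.
qs = np.array([1.0, 2.0, 3.0])
base = mix(qs, w1, b1, w2, b2)
for i in range(n_agents):
    bumped = qs.copy()
    bumped[i] += 1.0
    assert mix(bumped, w1, b1, w2, b2) >= base
```

The monotonicity constraint is exactly what lets training be centralized while execution stays decentralized.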
Colight
Colight uses graph attention networks (GATs), a neural network architecture that operates on graph-structured data. By stacking layers in which nodes attend over their neighborhoods' features, a GAT implicitly assigns different weights to different nodes in a neighborhood, enhancing communication between adjacent agents.
The agents learn the traffic trend and are thus able to distinguish between side streets and main roads.
Due to the GAT architecture, the influence of adjacent neighbors gradually decreases with distance.
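A single-head attention aggregation of the kind a GAT layer performs could look like the following sketch (the random parameters `W` and `a` stand in for learned ones; this is illustrative, not Colight's actual implementation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_aggregate(h_self, h_neighbors, W, a):
    """Single-head graph-attention aggregation for one intersection.

    Each neighbor j receives a weight alpha_j from attention scores
    over the concatenated projected features [W h_self, W h_j]; the
    new representation is the alpha-weighted sum of the projected
    neighbor features (self-loop included).
    """
    hs = W @ h_self
    neigh = [W @ h for h in [h_self] + h_neighbors]  # include self-loop
    scores = np.array([a @ np.concatenate([hs, hj]) for hj in neigh])
    # LeakyReLU on the scores, then softmax -> attention coefficients.
    alpha = softmax(np.where(scores > 0, scores, 0.2 * scores))
    return sum(a_j * hj for a_j, hj in zip(alpha, neigh))

rng = np.random.default_rng(1)
d_in, d_out = 5, 3
W = rng.standard_normal((d_out, d_in))
a = rng.standard_normal(2 * d_out)
h_self = rng.standard_normal(d_in)
neighbors = [rng.standard_normal(d_in) for _ in range(4)]
out = gat_aggregate(h_self, neighbors, W, a)
assert out.shape == (d_out,)
```

Because the coefficients alpha are learned per neighbor, an intersection on a main road can weight its upstream neighbor more heavily than a side street.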
Results
In the qualification round, we ranked #112 out of more than 1,000 participating teams. The top 20 teams qualified for the final round.
The submitted algorithms were evaluated by the total number of vehicles served, i.e. how many vehicles reached their destination, and by the trip delay, the ratio of actual travel time to travel time at free-flow speed.
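With these two metrics, a toy evaluation might look as follows (the trip data and its layout are purely illustrative, not the official evaluation format):

```python
def trip_delay(actual, free_flow):
    """Delay of one trip: actual travel time / free-flow travel time."""
    return actual / free_flow

# Illustrative trips: (actual seconds, free-flow seconds, reached destination?)
trips = [(330.0, 200.0, True), (450.0, 300.0, True), (600.0, 250.0, False)]

served = sum(1 for *_, done in trips if done)
avg_delay = sum(trip_delay(a, f) for a, f, _ in trips) / len(trips)

assert served == 2
assert round(avg_delay, 2) == 1.85  # (1.65 + 1.5 + 2.4) / 3
```

A delay of 1.0 would mean every vehicle travels as if the roads were empty; higher values indicate congestion.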
Here are the top three rankings of the final round:
Our team achieved its best results with Presslight, serving a total of 70,026 vehicles with a delay of 1.65.
The competition provided three data sets with 36, 2,048, and 92,344 intersections for training and simulation. Due to time constraints, limited resources, and the computational complexity, only the data set with 36 intersections was used for QMIX and Colight training.
Here are the training results comparing all three approaches under identical conditions, in order to determine which algorithm performed best regarding computation time, resources, and results. The x-axis of the plot shows the number of training iterations, while the y-axis shows the negative number of spawned vehicles that have not yet reached their destination.
Presslight learned very quickly and reached the optimum after approximately 1,000 iterations. The constant trend around zero implies that all newly spawned vehicles are served.
Colight found its (local) optimum around -20,000, meaning that in each iteration 20,000 more vehicles are spawned than reach their destination in the simulation. Its small number of iterations needs to be put in context, however: Colight required tremendous training time due to its complex computations. Whereas each episode took a few seconds for Presslight, it took more than 20 minutes for Colight, equivalent to a factor of roughly 1,000.
The training of QMIX likewise only reached 600 iterations. The plot shows a slight upward trend, but due to time restrictions the training could not be continued.
Under better conditions, additional training time for both Colight and QMIX would have helped to investigate their learning behavior further. For now, we conclude that a simpler algorithm like Presslight seems to be the most time-efficient while also giving the best results in comparison to the more sophisticated approaches.
Further Notes
Implementing such sophisticated and complex approaches required a tremendous amount of patience, discipline, and perseverance. A harmonious team dynamic and strong communication were essential for progress in this project, and both were fully present with Max Gawlick and Faheem Zunjani. Very special thanks to both of you; this project would have been half as much fun without you guys!
I would also like to thank the "Lehrstuhl für Datenbanksysteme und Data Mining" and Prof. Dr. Thomas Seidl for allowing us to work on such an interesting topic in the practical course "Project Big Data Science" during the semester. Big thanks to the supervisors for the overall organization and for providing us with the necessary resources.
Feel free to visit our GitHub repository for the implementations, training and additional evaluation results.
Further details can be found in the challenge documentation: KDDCup 2021 City-Brain-Challenge.