Analyzing Data to Support Traffic Management
Analyzing trajectory data is an important part of supporting traffic management. This article shows how data mining techniques can be applied to trajectory analysis to give a clear insight into population movement. I used R to analyze a dataset of trips made in a city. The dataset is a TXT file with a raw size of 1.5 GB and contains information on all trips completed over several months. I started with a spatial-temporal analysis and visualized the results. Spatial-temporal analysis applies when data is collected across both space and time: each record describes a phenomenon at a certain location and time. As a first step, I computed the volume of trips on weekdays and weekends.
In addition, I computed the volume of trips that took place in each month.
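The weekday, weekend, and monthly counts can be sketched in a few lines. The original analysis was done in R; the Python below is only an illustration, and the timestamp format, the sample records, and the function name trip_volumes are assumptions rather than details of the actual dataset. It also assumes that the weekend falls on Saturday and Sunday.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample: one pick-up timestamp per trip (not real data).
SAMPLE_TIMESTAMPS = [
    "2015-03-02 08:15:00",  # Monday
    "2015-03-03 18:40:00",  # Tuesday
    "2015-03-07 23:05:00",  # Saturday
    "2015-04-05 10:30:00",  # Sunday
    "2015-04-06 07:55:00",  # Monday
]


def trip_volumes(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count trips by day type (weekday/weekend) and by calendar month."""
    by_day_type = Counter()
    by_month = Counter()
    for raw in timestamps:
        ts = datetime.strptime(raw, fmt)
        # datetime.weekday() returns 0 for Monday and 6 for Sunday.
        # Assumption: Saturday (5) and Sunday (6) form the weekend.
        day_type = "weekend" if ts.weekday() >= 5 else "weekday"
        by_day_type[day_type] += 1
        by_month[ts.strftime("%Y-%m")] += 1
    return by_day_type, by_month


day_type_counts, month_counts = trip_volumes(SAMPLE_TIMESTAMPS)
```

For a 1.5 GB file, the same function could be fed line by line from the file instead of from an in-memory list, so the whole dataset never has to be loaded at once.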
Each trip consists of a pick-up and a drop-off. Here, a pick-up is the location where a passenger boards the taxi, and a drop-off is the location where the passenger leaves it. The frequency of pick-ups and drop-offs over the week is shown below.
I also obtained the spatial distributions of pick-ups and drop-offs in the city. The green points represent drop-offs and the purple points represent pick-ups. Pick-ups are mostly concentrated in the city center, while drop-offs spread toward the outskirts of the city.
As shown in the next figure, I estimated a population density map and rendered it as a heat map to discover the hotspots in the city. Four crowded areas are identified as locations where taxis have many pick-ups. Locations 1 and 2 are the most crowded, as they have the highest color density. There are also six hotspot areas for passenger drop-offs, of which areas 1, 2, and 3 have a higher density than areas 4, 5, and 6. I then analyzed how the density of the taxi distribution in the city changes over different periods of time.
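One simple way to approximate such a density map is to bin the points into a regular grid and rank the cells by count. This is a sketch of that idea only, not necessarily the density estimate used for the figure (a kernel density estimate would give a smoother map). The coordinates, the cell size, and the names grid_density and top_hotspots are hypothetical.

```python
import math
from collections import Counter


def grid_density(points, cell_size):
    """Count (lat, lon) points per square grid cell of side cell_size degrees."""
    counts = Counter()
    for lat, lon in points:
        # Each cell is identified by the integer index of its lower-left corner.
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts


def top_hotspots(points, cell_size, k):
    """Return the k densest cells as (cell, count) pairs, densest first."""
    return grid_density(points, cell_size).most_common(k)


# Hypothetical pick-up coordinates (not taken from the dataset).
SAMPLE_PICKUPS = [
    (58.3785, 26.7285), (58.3786, 26.7284), (58.3787, 26.7283),
    (58.3655, 26.6905), (58.3656, 26.6904),
    (58.4005, 26.7505),
]

hotspots = top_hotspots(SAMPLE_PICKUPS, cell_size=0.01, k=2)
```

Running the same function on the points of each time window (for example, morning versus evening) gives a rough view of how the hotspots move during the day.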
Further analysis showed that the number of pick-up hotspots increases from 5:00 to 1:00 a.m. Between 5:00 and 11:00 a.m., the drop-off hotspots are located in the northern part of the city; at other times they are in the south. This implies that most of the population moves into the northern part in the morning. The number of hotspots also increases between 6:00 p.m. and 1:00 a.m.
After the spatial-temporal analysis, I performed two types of data clustering: a) clustering based on pick-up geographical coordinates, and b) clustering based on drop-off geographical coordinates. The maximum number of taxi pick-ups in a district is 120, and the maximum number of drop-offs is 60. Most taxi pick-ups and drop-offs are in the city center. I also plotted the centroid of each cluster on the map. The centroids of the pick-up clusters are located in districts {205, (204, 203), 110, 402, 301, 302, 303, 308, 315, 317}, and most pick-ups happen in districts {302, 301}.
The centroids of the drop-off clusters are in districts {111, 201, (203, 204), 204, 301, 302, 303, 308, (303, 308), 318}. The number of taxi drop-offs in districts {101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112} is higher than in the other districts around the city center.
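The article does not say which clustering algorithm was used. As an illustration of clustering on coordinates and reading off the centroids, here is a minimal k-means (Lloyd's algorithm) sketch in Python. The sample points, the choice of k, and the function names are assumptions, and plain Euclidean distance on raw coordinates is only a rough approximation for geographic data.

```python
import random
from collections import Counter


def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=50, seed=0):
    """Cluster 2-D points with Lloyd's algorithm; return (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assignments


# Hypothetical coordinates forming two well-separated groups.
SAMPLE_POINTS = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]

centroids, assignments = kmeans(SAMPLE_POINTS, k=2)
cluster_sizes = Counter(assignments)  # number of points per cluster
```

Running it once on pick-up coordinates and once on drop-off coordinates mirrors the two clustering passes described above. Mapping each centroid to a district would need the district boundaries, which are not part of this sketch.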
Finally, I identified trip communities, that is, groups of trips that share the same pick-up and drop-off longitude and latitude. I then explored how they are distributed across the city from one district to another. The dark blue points on the map represent pick-up locations, and the other colors represent drop-off locations.
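One possible reading of "the same pick-up and drop-off longitude and latitude" is that trips are grouped when both endpoints fall on the same rounded coordinates. The sketch below follows that interpretation, which is an assumption. The rounding precision, the trip records, and the function name trip_communities are all hypothetical.

```python
from collections import defaultdict


def trip_communities(trips, decimals=3):
    """Group trip ids whose pick-up and drop-off match after rounding.

    trips is an iterable of (trip_id, (pickup_lat, pickup_lon),
    (dropoff_lat, dropoff_lon)). Only groups with more than one trip
    are returned.
    """
    groups = defaultdict(list)
    for trip_id, pickup, dropoff in trips:
        key = (
            round(pickup[0], decimals), round(pickup[1], decimals),
            round(dropoff[0], decimals), round(dropoff[1], decimals),
        )
        groups[key].append(trip_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical trips: t1 and t2 share both endpoints, t3 only the pick-up.
SAMPLE_TRIPS = [
    ("t1", (58.3781, 26.7291), (58.3662, 26.6908)),
    ("t2", (58.3779, 26.7289), (58.3658, 26.6911)),
    ("t3", (58.3781, 26.7291), (58.4002, 26.7503)),
]

communities = trip_communities(SAMPLE_TRIPS)
```

Three decimal places of latitude correspond to roughly a hundred meters, so the precision parameter controls how strictly "the same location" is interpreted.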
As shown in panels [a, d, e] of the figure above, the distribution of taxis is less scattered than in [b, c, i]. Panels [a, e] show the taxis in clusters 1 and 5, where the pick-ups occurred at similar locations but the drop-off locations differ, as depicted by the colors of the clusters. Comparing the movement patterns of clusters 1 and 5 in panels [a, e], I noticed that the clusters shift from north to south and vice versa. Panels [g, h, j] show the distribution of trips out of the city, and some trip anomalies are distinguishable in [f]. In [b, i], the trips behave differently: taxi movements start outside the city and head toward the city center, and high congestion is visible near the city center.
In this article, I presented the information that can be extracted and visualized from a large dataset using data mining techniques, and showed how the movement behavior of vehicles can be understood by analyzing historical trajectories.
Finally, I would like to express my special thanks to Dr. Amnir Hadachi for guiding me in this study.