AI APPLICATIONS TO INCREASE NAVIGATION SAFETY
Today we will consider the basic problem of classification. Can the type and navigational purpose of a ship be estimated from known data alone, such as width, length, course, or speed? To answer this research question, we will look at the dynamic values and basic characteristics of ships in a particular region and estimate each ship's type and purpose with statistical models.
Machine learning (ML) is a branch of artificial intelligence and computer science. Its algorithms use data to imitate the way people learn and gradually improve their accuracy.
Machine learning algorithms
AIS (Automatic Identification System) is an effective system for tracking ships and regulating maritime traffic. It transmits and receives data over VHF radio frequencies. Ships and vessel traffic services (VTS) equipped with VHF transceivers can thus reliably display the overall traffic picture. However, VHF range is limited to a maximum of approximately 25 nautical miles.
There are two types of AIS devices, Class A and Class B. Class A is more powerful, so it can send data up to 25 nautical miles. A Class B AIS device is effective at a distance of 5-10 nautical miles. Class A AIS transmits at 12.5 W, while Class B transmits at 2 W. A ship under way sends an AIS signal every 2-10 seconds. A ship at anchor, on the other hand, updates its signal every 3-4 minutes.
According to the SOLAS rules set by the International Maritime Organization (IMO), AIS must be carried on ships of 300 GRT and larger and on all commercial ships carrying passengers. Although it is not compulsory for them, some fishing boats and other small craft use it voluntarily because of its benefits.
To answer the research question above, we will use AIS data published as open source by the Danish Maritime Authority; several countries publish their AIS data in this way. The data were compiled from ships transiting the Kattegat Strait between January 1st and March 10th, 2022.
Two types of data, static and dynamic, are kept in the AIS device:
Static Information:
1. The ship's IMO number
2. The ship's MMSI number
3. The ship's call sign
4. The ship's name
5. The ship's type
6. Type of transponder the message was received from (e.g. Class A / Class B)
7. Width of ship
8. Length of ship
9. Draft of ship
10. Type of GPS device
11. Length from GPS to bow (Dimension A)
12. Length from GPS to stern (Dimension B)
13. Length from GPS to starboard (Dimension C)
14. Length from GPS to port side (Dimension D)
Dynamic Data:
1. Timestamp (in the format 31/12/2015 23:59:59)
2. Latitude
3. Longitude
4. Navigational status (for example: 'Fishing', 'Anchored', etc.)
5. Rate of Turn (ROT)
6. Speed Over Ground (SOG)
7. Course Over Ground (COG)
8. Heading
9. Type of cargo
10. Port of destination
11. Estimated Time of Arrival (ETA)
12. Data source type, e.g. AIS
Our main goal is to build a machine learning model from AIS data for the Kattegat Strait and then to predict the type of any ship passing through this region from its width, length, draft, course, and speed. For this purpose, we used 'MMSI', 'Width', 'Length', 'Draft', 'Cruise', 'Road', 'Speed' and 'Heading' from the AIS data as independent variables. The ship type is the target, that is, the dependent variable.
As can be seen, AIS data contains a lot of information, but AIS data is not available for every ship. For example:
* Warships
* Ships smaller than 300 GRT
* Fishing boats
* Ships whose AIS transceivers are turned off, for example due to technical malfunctions
Our dataset consists of 358,351 AIS messages, which means we have 358,351 observations. However, since each ship sends more than one AIS message, the number of unique ships in our dataset is 3,894. The properties of our variables are given below.
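The gap between the number of messages and the number of ships comes from grouping messages by MMSI. A minimal sketch in Python, assuming each AIS message is a record with an `mmsi` field; the sample records and field names below are invented for illustration and are not taken from the Danish data set:

```python
from collections import Counter

# Hypothetical AIS messages; real data would hold one record per message.
messages = [
    {"mmsi": 219000001, "sog": 10.2},
    {"mmsi": 219000001, "sog": 10.4},
    {"mmsi": 219000002, "sog": 0.0},
    {"mmsi": 219000003, "sog": 14.1},
    {"mmsi": 219000002, "sog": 0.1},
]

# Count how many messages each ship (identified by MMSI) contributed.
messages_per_ship = Counter(m["mmsi"] for m in messages)

n_observations = len(messages)            # total AIS messages
n_unique_ships = len(messages_per_ship)   # distinct MMSI numbers
print(n_observations, n_unique_ships)
```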
The other table below shows the distribution, by type, of the 3,894 ships detected in the Kattegat Strait between January 1st and March 10th, 2022. Ship type is also the target variable that we will estimate.
In Table 2, we aggregated the ship classes that each account for less than 2% of the ships, because a machine learning algorithm cannot pick up the pattern of such low-frequency classes; in other words, there is not enough data to learn them. We therefore combined high-speed craft, government vessels, pilot boats, harbor boats, and towing vessels, each with a share below 2%, into a single class. These ship types enter the model as a rare class, so for any of them the model's prediction will be the rare class.
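The aggregation of low-frequency classes could be sketched as follows. This is only an illustration under assumptions: the function name `merge_rare_classes`, the label "rare", and the sample distribution are hypothetical, and the labels are taken to be one per unique ship.

```python
from collections import Counter

def merge_rare_classes(labels, min_share=0.02, rare_label="rare"):
    """Relabel every class whose share of the labels is below min_share."""
    counts = Counter(labels)
    total = len(labels)
    rare = {c for c, n in counts.items() if n / total < min_share}
    return [rare_label if c in rare else c for c in labels]

# Hypothetical ship-type labels, one per ship (not the real distribution).
ship_types = (
    ["Cargo"] * 60 + ["Tanker"] * 30 + ["Fishing"] * 8 + ["Pilot"] * 1 + ["Tug"] * 1
)

merged = merge_rare_classes(ship_types)
merged_counts = Counter(merged)
print(merged_counts)
```

Only the classes under the threshold are relabeled; the larger classes keep their original names and counts.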
The correlation graph between our variables is given below. A high correlation was observed between the 'width' and 'length' variables and between the 'heading' and 'cog' variables. To address this, the 'dimension' variable was created by multiplying the numeric 'width' and 'length' variables, which were then removed from the data set. Likewise, the 'heading' and 'cog' variables were combined into a 'route' variable and then removed from the data set. Since the 'mmsi' variable is unique to each ship, it was used only for exploratory data analysis.
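As a rough illustration of this step, the sketch below computes a Pearson correlation between width and length and then replaces the two columns with their product. The ship values are invented, and the helper `pearson` is written out by hand for the example. The 'route' variable is not reproduced here, because the text does not say how 'heading' and 'cog' were combined.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical ships (values invented for illustration), in metres.
ships = [
    {"width": 8.0, "length": 40.0},
    {"width": 16.0, "length": 110.0},
    {"width": 23.0, "length": 150.0},
    {"width": 32.0, "length": 230.0},
]

r = pearson([s["width"] for s in ships], [s["length"] for s in ships])
print(f"width/length correlation: {r:.2f}")

# Replace the two correlated columns with their product, 'dimension'.
for s in ships:
    s["dimension"] = s.pop("width") * s.pop("length")
print(ships[0])
```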
After preparing all the variables, we can build the machine learning model. For this purpose, we used the LightGBM algorithm, which is known to give fast and successful results.
LightGBM is a decision tree-based algorithm
Decision trees can be grown with one of two strategies, level-wise or leaf-wise. In the level-wise strategy, the tree is kept balanced as it grows. In the leaf-wise strategy, the leaf whose split reduces the loss the most is split next. This leaf-wise growth distinguishes LightGBM from many other boosting algorithms. With it, the model reaches a lower error rate and learns faster. However, leaf-wise growth makes the model prone to overfitting when little data is available, so the algorithm is better suited to large data sets.
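In LightGBM, leaf-wise growth is bounded mainly by the `num_leaves`, `max_depth`, and `min_child_samples` parameters. The sketch below shows one way such settings might be chosen to limit overfitting on a smaller data set. The values are illustrative assumptions, not the configuration used in this study, and the helper `cap_num_leaves` is hypothetical.

```python
def cap_num_leaves(num_leaves, max_depth):
    """Keep num_leaves within what a binary tree of max_depth can hold.

    A binary tree of depth d has at most 2**d leaves, so a larger
    num_leaves could never be reached. In LightGBM, max_depth <= 0
    means there is no depth limit, so num_leaves is left unchanged.
    """
    if max_depth <= 0:
        return num_leaves
    return min(num_leaves, 2 ** max_depth)

max_depth = 6
params = {
    "objective": "multiclass",   # ship type has more than two classes
    "num_leaves": cap_num_leaves(127, max_depth),
    "max_depth": max_depth,      # limits how deep leaf-wise growth can go
    "min_child_samples": 20,     # minimum number of samples per leaf
    "learning_rate": 0.05,
    "n_estimators": 300,
}
print(params["num_leaves"])
```

A dictionary like this could be passed to the scikit-learn style estimator as `lightgbm.LGBMClassifier(**params)`; lowering `num_leaves` or raising `min_child_samples` makes the trees more conservative.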
After the model was built, we tested it on 10 ships that navigated the Kattegat Strait in December 2021. As noted, the model was built on data from January 1 to March 10, 2022. In other words, the 10 samples chosen for prediction come from outside our data set.
Looking at the table above, 7 out of 10 predictions were correct. Given that our model has an overall accuracy score of 0.67, this is an expected result. Prediction scores by ship type (confusion matrix) are given in Table 5. Models with higher precision and sensitivity (recall) values give more accurate results. The F1 score is the harmonic mean of precision and sensitivity.
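To make these scores concrete, the sketch below computes accuracy and per-class precision, sensitivity (recall), and F1 from true and predicted labels. The labels are invented for illustration (10 ships, 7 predicted correctly) and are not the study's actual predictions; the function name `per_class_scores` is hypothetical.

```python
def per_class_scores(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for every class seen."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores

# Hypothetical labels for 10 ships.
y_true = ["Cargo", "Cargo", "Cargo", "Cargo", "Tanker",
          "Tanker", "Tanker", "Fishing", "Fishing", "rare"]
y_pred = ["Cargo", "Cargo", "Cargo", "Tanker", "Tanker",
          "Tanker", "Cargo", "Fishing", "Fishing", "Cargo"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
scores = per_class_scores(y_true, y_pred)
for ship_type, (precision, recall, f1) in scores.items():
    print(f"{ship_type}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A class that is never predicted correctly, such as the rare class in this toy example, gets zero for all three scores even when overall accuracy looks acceptable, which is why per-class scores are worth reading alongside accuracy.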
In this study, we tried to predict ship types, and ultimately ships' navigational intentions, using the dynamic and static data in their AIS messages. The study was restricted to one area because routes and speeds differ from region to region. For example, around shallows, islands, islets, rocks, and coasts, ships follow routes and speeds suited to their characteristics in order to pass clear. The Kattegat Strait, the area we chose, is a heavy-traffic area with narrow waterways of this kind.
Once we know the type of a ship, we can guess its purpose. This can be considered a step toward increasing situational awareness.
More data can be used in the future to increase the reliability and accuracy of the model. In addition, we used a single algorithm in our model; combining two or more machine learning algorithms could further improve its performance.
For Detailed Information: