K-Means Clustering Use-Case in the Security Domain

What is k-means?

Machine learning systems fall into four major categories that are defined by the amount of human supervision that they receive for training. These categories include unsupervised, semi-supervised, supervised, and reinforcement learning. K-Means clustering is an unsupervised learning technique. In other words, the system is not trained with human supervision.

Clustering is the task of dividing a population of data points into groups such that points in the same group are more similar to each other than to points in other groups. In simple words, the aim is to segregate groups with similar traits and assign them to clusters. The goal of the k-means algorithm is to find groups in the data, with the number of groups represented by the variable k. The algorithm works iteratively to assign each data point to one of k groups based on the features that are provided. For example, with k = 2, two clusters are identified from the source dataset.


How does the K-Means Algorithm Work?

To process the training data, the K-means algorithm starts with a first group of randomly selected centroids, which serve as the starting points for the clusters, and then performs iterative calculations to optimize the positions of the centroids. It stops creating and optimizing clusters when either:

  • The centroids have stabilized: their values no longer change because the clustering has converged.
  • The defined number of iterations has been achieved.

The K-Means algorithm works through the following steps:

Step-1: Choose the number K of clusters.

Step-2: Select K random points as the initial centroids. (They need not be points from the input dataset.)

Step-3: Assign each data point to its closest centroid, which forms the K clusters.

Step-4: Compute the mean of the points in each cluster and move that cluster's centroid there.

Step-5: Repeat step 3, that is, reassign each data point to the new closest centroid.

Step-6: If any reassignment occurred, go to step 4; otherwise, go to FINISH.

Step-7: The model is ready.
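
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the function names, the toy data, and the choice to draw the initial centroids from the input points are all assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iterations=100, seed=0):
    """Cluster `points` (a list of tuples) into k groups.

    Returns (centroids, assignments), where assignments[i] is the
    cluster index of points[i].
    """
    rng = random.Random(seed)
    # Step 2: pick K distinct input points as the initial centroids
    # (an assumption of this sketch; other initializations exist).
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Steps 3 and 5: assign every point to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 6: stop when no point changed cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 4: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return centroids, assignments

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

The `max_iterations` cap covers the second stopping condition, so the loop ends even if the assignments keep changing.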

K-means clusters the data into different groups and offers a convenient way to discover the categories in an unlabeled dataset on its own, without labeled training data. It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of squared distances between the data points and the centroids of their corresponding clusters.
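
The quantity that standard k-means minimizes is usually called the within-cluster sum of squares: the squared Euclidean distance from each point to its assigned centroid, summed over all points. The sketch below computes it for a hypothetical one-dimensional-style dataset; the function name and the data are assumptions for illustration only.

```python
def within_cluster_sum_of_squares(points, centroids, assignments):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[c]))
        for p, c in zip(points, assignments)
    )

# Hypothetical data: four points lying on a line.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]

# A natural grouping: neighbours share a cluster.
good = within_cluster_sum_of_squares(
    points, [(1.0, 0.0), (11.0, 0.0)], [0, 0, 1, 1])

# A poor grouping: distant points share a cluster.
poor = within_cluster_sum_of_squares(
    points, [(5.0, 0.0), (7.0, 0.0)], [0, 1, 0, 1])

print(good)  # 4.0
print(poor)  # 100.0
```

A lower value means tighter clusters, which is why the grouping of neighbouring points scores better here.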


Real use-case in the security domain

1. Document classification

Cluster documents into multiple categories based on tags, topics, and the content of the document. This is a very standard classification problem, and k-means is a highly suitable algorithm for this purpose. Initial processing is needed to represent each document as a vector, using term frequency to identify commonly used terms that help classify the document. The document vectors are then clustered to identify similarity among document groups.
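
As a rough illustration of the vectorization step, the sketch below turns each document into a term-frequency vector over a shared vocabulary. The whitespace tokenizer, the function name, and the sample documents are assumptions for the example; real pipelines typically add stop-word removal and weighting schemes such as TF-IDF.

```python
from collections import Counter

def term_frequency_vectors(documents):
    """Return (vocabulary, vectors) for a list of text documents.

    Each vector holds, for every vocabulary term, the share of the
    document's tokens that are that term.
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append([counts[term] / total for term in vocabulary])
    return vocabulary, vectors

# Hypothetical sample documents.
docs = ["phishing email phishing link", "firewall rule firewall log"]
vocabulary, vectors = term_frequency_vectors(docs)
print(vocabulary)
print(vectors)
```

The resulting vectors all have the same length, so they can be passed to any k-means implementation as ordinary data points.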

2. Delivery store optimization

Optimize the process of goods delivery using truck drones by combining k-means, to find the optimal number of launch locations, with a genetic algorithm, to solve the truck route as a traveling salesman problem.

3. Identifying crime localities

With data related to crimes available for specific localities in a city, the category of crime, the area of the crime, and the association between the two can give quality insight into crime-prone areas within a city or a locality.

4. Customer segmentation

Clustering helps marketers improve their customer base, work on target areas, and segment customers based on purchase history, interests, or activity monitoring.

5. Fantasy league stat analysis

Analyzing player stats has always been a critical element of the sporting world, and with increasing competition, machine learning has a critical role to play here. As an interesting exercise, if you would like to create a fantasy draft team and identify similar players based on player stats, k-means can be a useful option.

6. Insurance fraud detection

Machine learning has a critical role to play in fraud detection and has numerous applications in automobile, healthcare, and insurance fraud detection. Using historical data on fraudulent claims, it is possible to isolate new claims based on their proximity to clusters that indicate fraudulent patterns. Since insurance fraud can potentially have a multi-million dollar impact on a company, the ability to detect fraud is crucial.
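
One way to read "proximity to clusters that indicate fraudulent patterns" is sketched below: a new claim is flagged when its nearest centroid belongs to a cluster previously marked as fraudulent and the claim lies within a chosen distance of it. Everything here is hypothetical, including the two features (claim amount in thousands, number of prior claims), the centroid values, and the distance threshold. In practice the features would be scaled before distances are compared.

```python
import math

def nearest_cluster(point, centroids):
    """Return (index, distance) of the centroid closest to `point`."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]

def flag_claim(claim, centroids, fraud_clusters, max_distance):
    """True if the claim sits close to a cluster marked as fraudulent."""
    index, distance = nearest_cluster(claim, centroids)
    return index in fraud_clusters and distance <= max_distance

# Hypothetical centroids learned from historical claims:
# (claim amount in thousands, number of prior claims).
centroids = [(2.0, 1.0), (40.0, 6.0)]
fraud_clusters = {1}  # assumption: cluster 1 was dominated by known fraud

print(flag_claim((38.0, 5.0), centroids, fraud_clusters, max_distance=10.0))
print(flag_claim((3.0, 0.0), centroids, fraud_clusters, max_distance=10.0))
```

A flag of this kind is best treated as a prompt for manual review, since closeness to a cluster is evidence of similarity and not proof of fraud.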

7. Rideshare data analysis

The publicly available Uber ride information dataset provides a large amount of valuable data around traffic, transit time, peak pickup localities, and more. Analyzing this data is useful not just in the context of Uber but also in providing insight into urban traffic patterns and helping us plan for the cities of the future.

8. Cyber-profiling criminals

Cyber-profiling is the process of collecting data from individuals and groups to identify significant correlations. The idea of cyber-profiling is derived from criminal profiles, which provide information to the investigation division to classify the types of criminals who were at the crime scene.

9. Call record detail analysis

A call detail record (CDR) is the information captured by telecom companies during the call, SMS, and internet activity of a customer. This information provides greater insight into the customer's needs when used with customer demographics.

10. Automatic clustering of IT alerts

Large enterprise IT infrastructure components such as network, storage, or database systems generate large volumes of alert messages. Because alert messages potentially point to operational issues, they must be manually screened and prioritized for downstream processes.


THANK YOU!!!



More articles by Sneha Singh Pal
