K-means Clustering and its use-case in the Security Domain

What is K-means Clustering?

K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms.

Clustering is the task of dividing a population or set of data points into groups such that points in the same group are more similar to each other than to points in other groups. In simple words, the aim is to segregate data points with similar traits and assign them to clusters. The goal of the k-means algorithm is to find such groups in the data.

K-Means Algorithm

K-means clustering is an unsupervised learning algorithm that groups an unlabeled dataset into different clusters. Here K is the number of clusters to create, chosen in advance: if K=2 there will be two clusters, if K=3 there will be three clusters, and so on.


It is an iterative algorithm that divides the unlabeled dataset into K clusters in such a way that each data point belongs to only one group, whose members have similar properties.

It gives us a convenient way to discover groups in an unlabeled dataset without the need for any labeled training data.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of squared distances between the data points and the centroids of their corresponding clusters.
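K-means is commonly stated as minimizing the within-cluster sum of squared distances. The following is a minimal sketch of how that quantity could be computed for a given assignment. The function names and the toy points, centroids, and labels are assumptions made up for illustration, not part of the original article.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def within_cluster_sum_of_squares(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[k]) for p, k in zip(points, labels))


# Hypothetical toy data: two clusters of two points each.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (9.0, 11.0)]
centroids = [(1.0, 2.0), (9.0, 10.0)]
labels = [0, 0, 1, 1]  # index of the centroid each point is assigned to

wcss = within_cluster_sum_of_squares(points, centroids, labels)
print(wcss)
```

A lower value means the points sit closer to their centroids, which is what each iteration of the algorithm tries to achieve.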

The k-means clustering algorithm mainly performs two tasks:

  • Determines the best positions for the K center points, or centroids, by an iterative process.
  • Assigns each data point to its closest centroid. The data points nearest to a particular centroid form a cluster.

Hence each cluster contains data points with some commonalities and is separated from the other clusters.

The diagram below illustrates how the K-means clustering algorithm works:

[Figure: working of the K-means clustering algorithm; image not available]

Working of the K-Means Algorithm

The working of the K-means algorithm is explained in the steps below:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as the initial centroids. (They need not be points from the input dataset.)

Step-3: Assign each data point to its closest centroid, which forms the K clusters.

Step-4: Compute the mean of the points in each cluster and place the new centroid of that cluster there.

Step-5: Repeat the third step, which means reassigning each data point to the new closest centroid.

Step-6: If any reassignment occurred, go to Step-4; otherwise go to FINISH.

Step-7: The model is ready.
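The steps above can be sketched in plain Python as follows. This is an illustrative sketch and not a production implementation. It assumes Euclidean distance, starts from centroids sampled from the data points themselves, and keeps a centroid unchanged if its cluster becomes empty. The function names, the `max_iterations` and `seed` parameters, and the sample points are all hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest_centroid(point, centroids):
    """Index of the centroid nearest to the given point."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    # Step 2: pick K distinct data points as starting centroids (an assumption;
    # the starting points do not have to come from the dataset).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        # Steps 3 and 5: assign every point to its closest centroid.
        new_labels = [closest_centroid(p, centroids) for p in points]
        # Step 6: stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its cluster.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return centroids, labels


# Hypothetical sample: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

The result depends on the random starting centroids, so in practice the algorithm is often run several times with different seeds and the run with the lowest sum of squared distances is kept.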


Use-Cases in the Security Domain

Malware Detection

Malware detection refers to the process of detecting the presence of malware on a host system, or of deciding whether a specific program is malicious or benign. Malware detection techniques play a vital role in catching attacks that can have a high impact on the cyber world. With clustering, unsupervised machine learning can detect a malware attack by identifying the behavior of the malware.

A clustering detection model can use the K-means approach to detect malware behavior in data based on the features of the malware. Unsupervised clustering techniques play an important role in grouping similar malware characteristics by studying the behavior of the malware. As a result, the model is able to cluster normal and suspicious data into two separate groups, with a reported detection rate of more than 90 percent.

Anomaly Detection

Anomaly detection refers to methods that give early warning of unusual behavior that may compromise the security and performance of a communication network. Identifying network anomalies is essential for the networks of enterprises and institutions. With K-means, anomalous behavior can be identified by measuring the distance between an observed data point and the cluster centroids: a point that lies far from every centroid does not resemble any known group of normal behavior.
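As a rough sketch of the distance-to-centroid idea, the snippet below flags any observation whose distance to its nearest centroid exceeds a threshold. The centroids, the two features (packets per second and mean packet size), the observations, and the threshold value are all invented for illustration. In practice the centroids would come from a K-means run on normal traffic and the threshold would need tuning.

```python
import math


def distance_to_nearest_centroid(point, centroids):
    """Euclidean distance from a point to the closest centroid."""
    return min(math.dist(point, c) for c in centroids)


def flag_anomalies(points, centroids, threshold):
    """Return the points that lie farther than `threshold` from every centroid."""
    return [p for p in points
            if distance_to_nearest_centroid(p, centroids) > threshold]


# Hypothetical centroids learned from normal traffic:
# (packets per second, mean packet size).
centroids = [(10.0, 10.0), (50.0, 50.0)]
observations = [(11.0, 9.0), (49.0, 52.0), (90.0, 5.0)]

anomalies = flag_anomalies(observations, centroids, threshold=5.0)
print(anomalies)
```

Here only the last observation is far from both centroids, so it is the only one flagged.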

Crime Analysis

Crime analysis is a law enforcement function that involves the systematic analysis of patterns and trends in crime and disorder. It also plays a role in devising solutions to crime problems and in formulating crime prevention strategies. Analysis of crime is essential for providing safety and security to the civilian population. The K-means clustering technique is used to extract useful information from high-volume crime datasets and to interpret the data, which helps police identify and analyze crime patterns and reduce further occurrences of similar incidents.

Crime document classification

Documents can be clustered into multiple categories based on their tags, topics, and content. This is a very common document-grouping problem, and k-means is well suited to it. The documents first need to be processed so that each one is represented as a vector, using term frequency to identify the commonly used terms that characterize the document. The document vectors are then clustered to identify groups of similar documents.
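The vectorization step might look like the sketch below, which turns each document into a vector of relative term frequencies over a shared vocabulary. The function name and the three short example documents are hypothetical, tokenization is a plain whitespace split, and every document is assumed to be non-empty. A real pipeline would usually also remove stop words and weight the terms.

```python
from collections import Counter


def term_frequency_vectors(documents):
    """Represent each document as relative term frequencies over one vocabulary."""
    tokenized = [doc.lower().split() for doc in documents]
    # Shared, sorted vocabulary so every vector has the same dimensions.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for terms that do not occur in this document.
        vectors.append([counts[term] / len(tokens) for term in vocabulary])
    return vocabulary, vectors


# Hypothetical report snippets.
documents = [
    "burglary reported downtown",
    "downtown burglary suspect arrested",
    "phishing email reported",
]
vocabulary, vectors = term_frequency_vectors(documents)
print(vocabulary)
print(vectors[0])
```

The resulting vectors all have the same length, so they can be passed to any k-means routine that works on numeric points.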

Identifying crime localities

When crime data is available for specific localities in a city, clustering on the category of crime, the area of the crime, and the association between the two can give useful insight into the crime-prone areas within a city or locality.


Hope you like it. Thank you!


