k-means clustering and its real use cases in the security domain

Neha Arya

Technical Support Associate at SOTI Inc.

Published: August 12, 2021

k-means clustering is an unsupervised machine learning algorithm. Clustering means dividing data into groups whose members share similar properties; these groups are called clusters. The number of clusters is set by the parameter k.

k is also the number of centroids the algorithm places in the dataset. A centroid is the point at the center of a cluster, computed as the mean of the data points assigned to it.

How k-means Clustering Works

Step-01 : Choose the number k of clusters.

Step-02 : Select k random points from the dataset as the initial centroids.

Step-03 : Calculate the Euclidean distance from each data point to every centroid and assign the point to the nearest centroid, creating k clusters.

Step-04 : Recompute the centroid of each cluster as the mean of the points assigned to it.

Step-05 : Repeat Step-03 and Step-04 until the cluster assignments no longer change.
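
The steps above can be sketched in plain NumPy. This is only a minimal illustration, not how scikit-learn implements KMeans; the function name kmeans and its parameters n_iters and seed are hypothetical names chosen for this sketch, and the empty-cluster handling is an assumption of the sketch.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Basic k-means iteration; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step-02: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iters):
        # Step-03: Euclidean distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step-05: stop once no point changes its cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step-04: move each centroid to the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # assumption: keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels
```

Calling kmeans(X, 4) on a float array X of shape (n_samples, n_features) returns the final centroids and one cluster label per point. Because the starting centroids are random, different seeds can give different results, which is one reason library implementations usually run several initializations.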

The following simple example shows how k-means works. We first generate a 2D dataset containing four blobs and then apply the k-means algorithm to see the result.

First, we import the necessary packages:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np
from sklearn.cluster import KMeans

The following code generates the 2D dataset containing four blobs:

from sklearn.datasets import make_blobs
X, y_true = make_blobs(n_samples = 400, centers = 4, cluster_std = 0.60, random_state = 0)

Next, the following code visualizes the dataset:

plt.scatter(X[:, 0], X[:, 1], s = 20);
plt.show()

Next, create a KMeans object with the number of clusters, train the model, and predict the cluster of each point as follows:

kmeans = KMeans(n_clusters = 4)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)

Now, the following code plots the data points colored by cluster, together with the cluster centers found by the k-means estimator:

plt.scatter(X[:, 0], X[:, 1], c = y_kmeans, s = 20, cmap = 'summer')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c = 'blue', s = 100, alpha = 0.9);
plt.show()

Elbow Method : In the Elbow method, we vary the number of clusters and, for each value of k, calculate the WCSS (Within-Cluster Sum of Squares). WCSS is the sum of squared distances between each point and the centroid of its cluster. When we plot WCSS against k, the graph looks like an elbow.

As the number of clusters increases, the WCSS value decreases; it is largest at k=1. At some point the rate of decrease changes sharply, creating the elbow shape, and beyond this point the graph runs almost parallel to the x-axis. The value of k at the elbow is usually taken as the optimal number of clusters.
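
As a rough sketch of the Elbow method on the same kind of blob data used earlier, the loop below fits KMeans for k = 1 to 10 and records the WCSS, which scikit-learn exposes as the inertia_ attribute. The range of k values and the n_init and random_state settings are choices made for this sketch.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data as in the example above
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.60, random_state=0)

k_values = range(1, 11)
wcss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(model.inertia_)  # inertia_ is scikit-learn's name for WCSS

# Print WCSS for each k; the drop should flatten after the elbow
for k, value in zip(k_values, wcss):
    print(k, round(value, 1))
```

Plotting wcss against k_values with plt.plot gives the elbow curve. For this dataset the bend is expected around k = 4, since the data was generated with four blobs.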

How k-means Clustering Helps in the Security Domain

1. Insurance fraud detection

Machine learning has a critical role to play in fraud detection, with numerous applications in automobile, healthcare, and insurance fraud. Using historical data on fraudulent claims, it is possible to flag new claims based on their proximity to clusters that indicate fraudulent patterns. Since insurance fraud can have a multi-million dollar impact on a company, the ability to detect fraud is crucial.
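
One way the proximity idea could look in code is sketched below. Everything in it is hypothetical: the claims are synthetic, the two features (claim amount and policy age) are invented for illustration, and score_claim is a helper name made up for this sketch. A real system would need real claim features and careful validation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic historical claims: [claim amount in $1000s, policy age in days]
legit = rng.normal([5.0, 400.0], [1.0, 30.0], size=(200, 2))
fraud = rng.normal([40.0, 20.0], [3.0, 5.0], size=(20, 2))
claims = np.vstack([legit, fraud])
is_fraud = np.array([0] * 200 + [1] * 20)  # known outcome of each past claim

# Scale the features so both contribute equally to the Euclidean distance
scaler = StandardScaler().fit(claims)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaler.transform(claims))

# Share of known fraudulent claims inside each cluster
fraud_rate = np.array([is_fraud[model.labels_ == c].mean() for c in range(2)])

def score_claim(claim):
    """Fraud rate of the cluster whose centroid is nearest to the new claim."""
    cluster = model.predict(scaler.transform([claim]))[0]
    return fraud_rate[cluster]

print(score_claim([38.0, 25.0]))   # large claim on a very new policy
print(score_claim([5.0, 410.0]))   # small claim on an older policy
```

A claim that lands in a cluster with a high historical fraud rate can then be routed for manual review rather than rejected outright.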

2. Cyber-profiling criminals

Cyber-profiling is the process of collecting data from individuals and groups to identify significant correlations. The idea of cyber-profiling is derived from criminal profiling, which gives the investigation division information to classify the types of criminals who were at the crime scene.

3. Automatic clustering of IT alerts

Large enterprise IT infrastructure components such as network, storage, or database systems generate large volumes of alert messages. Because alert messages potentially point to operational issues, they must be screened and prioritized for downstream processes, which is slow when done manually. Clustering the alerts can provide insight into categories of alerts and mean time to repair, and help in failure prediction.
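
A small sketch of alert clustering follows, assuming the alerts are available as plain text. The six alert messages are made up for illustration, and turning text into TF-IDF vectors before running KMeans is one possible approach chosen for this sketch, not the only one.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical alert messages, invented for this example
alerts = [
    "disk usage above 90 percent on db01",
    "disk usage above 95 percent on db02",
    "disk usage critical on storage01",
    "network link down on switch03",
    "network link down on switch07",
    "network packet loss on router01",
]

# Turn each message into a TF-IDF vector, dropping common English words
vectors = TfidfVectorizer(stop_words="english").fit_transform(alerts)

# Group the alerts into two clusters
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
labels = model.labels_

for label, message in zip(labels, alerts):
    print(label, message)
```

With messages this distinct, the disk alerts and the network alerts should end up in separate clusters. Real alert streams are noisier, so the number of clusters would normally be chosen with a method such as the Elbow method.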

"Keep sharing, keep learning"
