K-means clustering and its Real World use cases in the Security Domain

Devendra Kanade

Immediate Joiner | Data Engineer | AWS Certified | Microsoft Azure Certified | Oracle Certified

Published: September 27, 2021

Clustering

Clustering is used to get an intuition about the structure of the data. It is defined as the task of identifying subgroups in the data such that data points in the same cluster are very similar while data points in different clusters are very different.

Unlike supervised learning, clustering is considered an unsupervised learning method because there are no ground-truth labels against which to compare the output of the clustering algorithm and evaluate its performance. Instead, we investigate the structure of the data by grouping the data points into distinct subgroups.

K-means clustering

K-means is a centroid-based algorithm, or a distance-based algorithm, where we calculate the distances to assign a point to a cluster. In K-Means, each cluster is associated with a centroid.

The main objective of the K-Means algorithm is to minimize the sum of squared distances between the points and their respective cluster centroids.

The K-means algorithm is an iterative algorithm that tries to partition the dataset into K pre-defined, distinct, non-overlapping clusters, where each data point belongs to only one group. It tries to make the intra-cluster data points as similar as possible while also keeping the clusters as different as possible.

It assigns data points to a cluster such that the sum of the squared distance between the data points and the cluster centroid is at the minimum. The less variation we have within clusters, the more homogeneous the data points are within the same cluster.
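The quantity being minimized can be written out explicitly. The notation below is introduced here for illustration and is not from the original article: C_k is the set of points assigned to cluster k, mu_k is that cluster's centroid, and x_i is a data point.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

A smaller J means the points sit closer to their own centroids, which is what "less variation within clusters" refers to.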

K-means algorithm works as follows:

Specify the number of clusters K.
Initialize the centroids by first shuffling the dataset and then randomly selecting K data points for the centroids without replacement.
Keep iterating until there is no change to the centroids, i.e. the assignments of data points to clusters aren't changing:
Compute the squared distance between each data point and every centroid.
Assign each data point to the closest cluster (centroid).
Recompute each centroid as the mean of the data points assigned to its cluster.
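The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the function names, the `max_iter` and `seed` parameters, and the toy data are assumptions made for the example. It includes the centroid update (the mean of the points assigned to each cluster) that drives the iteration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, assignments) for a list of tuples."""
    rng = random.Random(seed)
    # Initialization: pick K distinct data points, without replacement.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Assign each point to the closest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop when the assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    return centroids, assignments


# Toy data (made up): two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster gets label 0 depends on the random initialization.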

Use Cases in the Security Domain:

Here is a list of some of the interesting use cases of K-means in the Security Domain:

Customer segmentation

Clustering helps marketers improve their customer base, work on target areas, and segment customers based on purchase history, interests, or activity monitoring. For example, telecom providers can cluster pre-paid customers to identify patterns in the money they spend on recharging, sending SMS, and browsing the internet. This segmentation helps the company target specific clusters of customers with specific campaigns.

Identifying crime localities

When crime data is available for specific localities in a city, clustering on the category of crime, the area of the crime, and the association between the two can give quality insight into crime-prone areas within a city or a locality.

Insurance fraud detection

Machine learning has a critical role to play in fraud detection and has numerous applications in automobile, healthcare, and insurance fraud detection. Using historical data on fraudulent claims, it is possible to isolate new claims based on their proximity to clusters that indicate fraudulent patterns. Since insurance fraud can potentially have a multi-million dollar impact on a company, the ability to detect fraud is crucial.
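The idea of flagging new claims by their proximity to fraud-like clusters can be sketched as follows. Everything concrete here is hypothetical: the centroid values, the cluster names, the two scaled features, and the claim IDs are made up for illustration, and in practice the centroids would come from a k-means run on historical claims.

```python
import math


def nearest_cluster(point, centroids):
    """Return (label, distance) of the centroid closest to the given point."""
    label = min(centroids, key=lambda name: math.dist(point, centroids[name]))
    return label, math.dist(point, centroids[label])


# Hypothetical centroids on two features scaled to [0, 1]:
# (claim amount, time since policy start).
centroids = {
    "typical": (0.2, 0.8),
    "fraud_like": (0.9, 0.1),
}

# Hypothetical new claims to screen.
new_claims = {
    "claim_a": (0.25, 0.75),
    "claim_b": (0.85, 0.2),
}

# Flag every claim whose nearest centroid is the fraud-like one.
flagged = [
    claim_id
    for claim_id, features in new_claims.items()
    if nearest_cluster(features, centroids)[0] == "fraud_like"
]
print(flagged)
```

Features should be on comparable scales before distances are computed, otherwise the feature with the largest numeric range dominates the result.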

Cyber-profiling criminals

Cyber profiling is the process of collecting data from individuals and groups to identify significant correlations. The idea of cyber profiling is derived from criminal profiling, which provides information to the investigation division to classify the types of criminals who were at the crime scene.

Call detail record analysis

A call detail record (CDR) is the information captured by telecom companies during the call, SMS, and internet activity of a customer. This information provides greater insight into the customer's needs when used with customer demographics. We can cluster customer activity over 24 hours using the unsupervised k-means clustering algorithm to understand segments of customers with respect to their usage by hour.

Automatic clustering of IT alerts
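Before k-means can group customers by hourly usage, each customer has to be represented as a vector. One plausible way to do this, sketched below, is a 24-element profile holding the share of that customer's activity in each hour. The record layout `(customer_id, hour)` and the sample data are assumptions for illustration; real CDRs carry many more fields.

```python
from collections import defaultdict


def hourly_profiles(records):
    """Turn (customer_id, hour) events into normalized 24-hour usage vectors."""
    counts = defaultdict(lambda: [0] * 24)
    for customer_id, hour in records:
        counts[customer_id][hour] += 1
    profiles = {}
    for customer_id, hours in counts.items():
        total = sum(hours)
        # Normalize so heavy and light users are compared by usage pattern.
        profiles[customer_id] = [h / total for h in hours]
    return profiles


# Hypothetical events: a daytime user (c1) and a late-night user (c2).
records = [
    ("c1", 9), ("c1", 9), ("c1", 14),
    ("c2", 23), ("c2", 23), ("c2", 22), ("c2", 1),
]
profiles = hourly_profiles(records)
print(profiles["c1"][9], profiles["c2"][23])
```

The resulting vectors can be passed to any k-means implementation, so customers with similar daily rhythms fall into the same cluster.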

Large enterprise infrastructure technology components such as network, storage, or database generate large volumes of alert messages. Because alert messages potentially point to operational issues, they must be manually screened for prioritization for downstream processes. Clustering of data can provide insight into categories of alerts and mean time to repair, and help in failure predictions.

Rideshare data analysis

The publicly available Uber ride information dataset provides a large amount of valuable data around traffic, transit time, peak pickup localities, and more. Analyzing this data is useful not just in the context of Uber but also in providing insight into urban traffic patterns and helping us plan for the cities of the future.

Crime document classification

Documents can be clustered into multiple categories based on tags, topics, and the content of the document. Although this is often framed as a classification problem, no labels are needed, and k-means is well suited to it. Each document first has to be processed into a vector, using term frequency to identify commonly used terms that help characterize the document. The document vectors are then clustered to identify groups of similar documents.

These were a few use cases, but the list goes on. Whether in the security domain or any other, K-means is an effective and easy way to do clustering in machine learning.
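The vectorization step can be sketched as follows, assuming whitespace tokenization and plain term frequency (count divided by document length). The function name and the sample documents are made up for illustration. A real pipeline would typically also remove stop words and often use TF-IDF weighting instead.

```python
from collections import Counter


def term_frequency_vectors(documents):
    """Represent each non-empty document as a term-frequency vector."""
    tokenized = [doc.lower().split() for doc in documents]
    # Shared vocabulary, sorted so vector positions are stable.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] / len(tokens) for term in vocabulary])
    return vocabulary, vectors


# Hypothetical report snippets.
documents = [
    "burglary reported downtown",
    "phishing email reported",
    "burglary burglary suspect",
]
vocabulary, vectors = term_frequency_vectors(documents)
print(vocabulary)
```

Each vector has one entry per vocabulary term, so all documents live in the same space and can be clustered with k-means.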

Thanks for reading till the end.

→ Follow me on LinkedIn for more articles on research and the integration of new tools and technologies.

In the upcoming days, I will be sharing many articles on integrating multiple tools and technologies such as Cloud Computing, DevOps, Big Data Hadoop, Machine Learning, etc.

For further queries or suggestions, feel free to connect with me on LinkedIn.

I hope you have learned something new from this article.

If you like it, then Like & Share.

Thank you, everyone, for reading!
