K-Means Clustering and Use Cases in the Security Domain

Hemendra Chaudhary

DevOps Engineer @ MyWays | AWS, DevOps, OpenSearch

Published: August 12, 2021

K-means is one of the most popular unsupervised machine learning algorithms for solving clustering problems. It segregates unlabeled data into groups, called clusters, whose members share similar features and common patterns.

What is clustering?

Clustering is one of the most common exploratory data analysis techniques used to get an intuition about the structure of the data. It can be defined as the task of identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar while data points in different clusters are very different.

Unlike supervised learning, clustering is an unsupervised method: there are no ground-truth labels against which to compare the algorithm’s output and evaluate its performance. The goal is only to investigate the structure of the data by grouping the data points into distinct subgroups.

What Is the K-Means Algorithm?

K-means is an iterative algorithm that divides a set of n data points into k subgroups (clusters) based on their similarity, measured as the distance from each point to the centroid (mean) of its cluster. It is an unsupervised learning algorithm: unlike supervised learning, it uses no labeled data. K-means partitions objects into clusters so that objects within a cluster are similar to each other and dissimilar to objects in other clusters.
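
The iteration described above can be written compactly. The notation here is an assumption introduced for illustration and does not come from the article: x_i are the data points, C_j is the set of points in cluster j, and \mu_j is that cluster's centroid. Under the usual squared Euclidean distance, k-means tries to reduce the objective J by alternating an assignment step and an update step until the centroids stop moving.

```latex
% Objective: total squared distance of every point to its cluster centroid
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2

% Assignment step: each point joins the cluster with the nearest centroid
C_j = \{\, x_i : \lVert x_i - \mu_j \rVert^2 \le \lVert x_i - \mu_l \rVert^2 \ \text{for all } l = 1, \dots, k \,\}

% Update step: each centroid moves to the mean of its assigned points
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

Neither step can increase J, which is why the procedure settles, though it may settle on a local rather than a global minimum.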

The ‘K’ in K-means is a number: you need to tell the algorithm how many clusters to create. For example, K = 2 means two clusters, K = 3 means three clusters, and so on. There are ways to estimate a good value of K for a given dataset; the elbow method, which plots the within-cluster sum of squares against K, is a common heuristic.

For a better understanding of k-means, let’s take an example from cricket. Imagine you received data on a lot of cricket players from all over the world, which gives information on the runs scored by the player and the wickets taken by them in the last ten matches. Based on this information, we need to group the data into two clusters, namely batsmen and bowlers.
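
The cricket example can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the player figures are invented, the function name kmeans is hypothetical, and the first k points are used as starting centroids only to keep the run deterministic (real libraries normally use random or k-means++ starts).

```python
import math


def kmeans(points, k, max_iter=100):
    """Basic k-means (Lloyd's algorithm). Returns (centroids, labels)."""
    # Assumption: seed the centroids with the first k points for determinism.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # centroids stopped moving, so the clustering has converged
        centroids = new_centroids
    return centroids, labels


# Invented data: (runs scored, wickets taken) over the last ten matches.
players = [(420, 1), (25, 18), (380, 0), (40, 22), (510, 2), (15, 15)]
centroids, labels = kmeans(players, k=2)
print(labels)
print(centroids)
```

With this data the high-run players fall into one cluster and the high-wicket players into the other. Because runs are numerically much larger than wickets, the distance is dominated by runs; in practice the features would usually be standardized before clustering.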

Types of Clustering:

The various types of clustering are:

Hierarchical clustering
Partitioning clustering

Hierarchical clustering is further subdivided into:

Agglomerative clustering
Divisive clustering

Partitioning clustering is further subdivided into:

K-Means clustering
Fuzzy C-Means clustering

Use-Cases in the Security Domain

K-means clustering is used in many areas, and it has numerous use cases in the security sector. Some of the most significant ones are described below.

Using K-Means Clustering Algorithm to Analyze Logs from Proxy Server and Captive Portal:

The amount of data created by users’ different interactions with websites is continuously rising, as is the amount of traffic on the World Wide Web. As a result, online data becomes one of the most important tools for retrieving information and discovering new knowledge.

Web Usage Mining was used to find valuable and fascinating patterns from the web data using logs from the Proxy Server and Captive Portal databases. In addition, the k-means clustering method was utilized to create particular groupings of user access patterns based on the number of user sessions and websites visited by network users. It was discovered as a result of the findings

Cyber-Profiling Criminals:

Cyber-profiling is the process of collecting data from individuals and groups to identify significant correlations. The idea of cyber-profiling is derived from criminal profiling, which gives the investigating division information to classify the types of criminals who were at a crime scene.

Call detail record analysis:

A call detail record (CDR) is the information captured by telecom companies during a customer’s calls, SMS, and internet activity. Combined with customer demographics, this information provides greater insight into the customer’s needs. The unsupervised k-means clustering algorithm can be used to cluster customer activity over 24 hours, which helps to understand customer segments by their usage per hour.
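
Before any clustering, the raw records have to be turned into one feature vector per customer. The sketch below shows one possible way to do that, assuming a simplified record of the form (customer_id, hour); real CDRs carry many more fields, and both the record layout and the function name hourly_profiles are hypothetical.

```python
from collections import defaultdict


def hourly_profiles(records):
    """Build one 24-slot activity count vector per customer."""
    profiles = defaultdict(lambda: [0] * 24)
    for customer_id, hour in records:
        profiles[customer_id][hour] += 1  # count one event in that hour slot
    return dict(profiles)


# Invented records: (customer_id, hour of day from 0 to 23).
records = [("A", 9), ("A", 9), ("A", 10), ("B", 22), ("B", 23), ("B", 22)]
profiles = hourly_profiles(records)
print(profiles["A"])
print(profiles["B"])
```

Each 24-element vector can then be passed to k-means as one data point, so that customers with similar daily usage patterns end up in the same cluster.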

Anomaly detection:

Anomaly detection refers to methods that provide warnings of unusual behaviors which may compromise the security and performance of communication networks. Anomalous behaviors can be identified by comparing the distance between real data and cluster centroids. Identifying network anomalies is essential for communication networks of enterprises or institutions. The goal is to provide an early warning about an unusual behavior that can affect the security and the performance of a network.
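
The distance comparison described above can be sketched as follows. The centroids, the feature meanings, and the threshold are all made-up values standing in for the result of an earlier k-means fit on normal traffic; choosing a sensible threshold is a separate problem that this sketch does not address.

```python
import math


def anomaly_flags(points, centroids, threshold):
    """Flag each point whose distance to its nearest centroid exceeds threshold."""
    flags = []
    for p in points:
        nearest = min(math.dist(p, c) for c in centroids)
        flags.append(nearest > threshold)
    return flags


# Hypothetical centroids, e.g. (requests per minute, KB per request).
centroids = [(10.0, 2.0), (50.0, 8.0)]
observations = [(11.0, 2.5), (48.0, 7.0), (200.0, 1.0)]
flags = anomaly_flags(observations, centroids, threshold=15.0)
print(flags)
```

The first two observations sit close to a centroid and are treated as normal, while the third is far from both and is flagged.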

Malware Detection:

Malware detection is the process of identifying the existence of malware on a host system or determining whether a certain application is dangerous or benign. Malware detection techniques are essential for identifying malware attacks, which have a significant impact on the cyber world. Unsupervised machine learning can help detect malware attacks by clustering observed behavior.

Insurance fraud detection:

Machine learning plays an important role in fraud detection and has a wide range of applications in the automotive, healthcare, and insurance industries. Using previous data on fraudulent claims, it is feasible to flag new claims based on their closeness to clusters that suggest fraudulent tendencies. Because insurance fraud has the potential to cost a firm millions of dollars, the ability to identify fraud is critical.
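
One way to read "closeness to clusters" is nearest-centroid assignment: a new claim is routed to the closest existing cluster and sent for review if that cluster was historically associated with fraud. The sketch below assumes the centroids and the set of suspicious clusters already exist; the features, values, and names are invented for illustration.

```python
import math


def nearest_cluster(point, centroids):
    """Return the index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


# Hypothetical centroids from past claims:
# (claim amount in thousands, days since the policy started).
centroids = [(2.0, 400.0), (45.0, 12.0)]
# Assumption: cluster 1 held many confirmed fraud cases in the past.
suspicious_clusters = {1}

new_claim = (40.0, 20.0)
cluster = nearest_cluster(new_claim, centroids)
needs_review = cluster in suspicious_clusters
print(cluster, needs_review)
```

A flag raised this way is a prompt for human review, not proof of fraud.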

K-Means Clustering is one of the most common algorithms used for clustering.

Hope you found it informative.

Thank You.