K-Means Clustering Use Cases in the Security Domain

Charan V.

Software Engineer || 2x ICPC Regionalist 2021 & 22 || RHCE (EX294) || Technologies -> Java | AWS | Linux | DevOps | Jenkins | Python | Docker | Terraform | $PHP Hacker

Published: August 11, 2021

Introduction

K-means clustering is an unsupervised machine learning algorithm, and one of the simplest and most popular ones.

A cluster is a collection of data points grouped together because of certain similarities. You define a target number k, which is the number of centroids you need in the dataset. A centroid is the imaginary or real location representing the centre of a cluster. Every data point is allocated to one of the clusters so as to reduce the within-cluster sum of squares. In other words, the K-means algorithm identifies k centroids and then allocates every data point to the nearest one, while keeping the within-cluster sum of squares as small as possible. The 'means' in K-means refers to averaging the data, that is, finding the centroid.
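As a minimal sketch of the quantity K-means tries to keep small, the snippet below computes a centroid as the mean of its points and then the within-cluster sum of squares for two ways of splitting the same four points. The function names and the sample points are illustrative assumptions, not part of any library.

```python
def centroid(points):
    """Return the mean of a list of equal-length numeric tuples."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def within_cluster_sum_of_squares(clusters):
    """Sum the squared distance from every point to its own cluster's centroid."""
    total = 0.0
    for members in clusters:
        centre = centroid(members)
        for point in members:
            total += sum((x - c) ** 2 for x, c in zip(point, centre))
    return total


# Synthetic points, for illustration only: two on the left, two on the right.
good_split = [[(1.0, 1.0), (1.0, 3.0)], [(9.0, 1.0), (9.0, 3.0)]]
poor_split = [[(1.0, 1.0), (9.0, 1.0)], [(1.0, 3.0), (9.0, 3.0)]]

print(within_cluster_sum_of_squares(good_split))  # 4.0
print(within_cluster_sum_of_squares(poor_split))  # 64.0
```

Grouping nearby points together gives the smaller value, which is the grouping K-means is looking for.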

How the K-means algorithm works

To process the training data, the K-means algorithm starts with an initial set of randomly selected centroids, which serve as the starting points for the clusters, and then performs iterative (repetitive) calculations to optimise the positions of the centroids.

It halts creating and optimising clusters when either:

The centroids have stabilised: their values no longer change from one iteration to the next.
The defined number of iterations has been achieved.
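The loop described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a production implementation: the function name kmeans, its parameters, the choice to seed the centroids by sampling k of the input points, and the handling of empty clusters are all choices made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100, seed=0):
    """Cluster points (tuples of numbers) into k groups.

    Returns (centroids, labels, iterations_run).
    """
    rng = random.Random(seed)
    # Assumption: start from k randomly selected data points as centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []
    iterations_run = 0
    for _ in range(max_iterations):  # stop condition 2: iteration limit
        iterations_run += 1
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # stop condition 1: centroids stabilised
            break
        centroids = new_centroids
    return centroids, labels, iterations_run


centroids, labels, iterations_run = kmeans(
    [(0.0,), (1.0,), (10.0,), (11.0,)], k=2
)
print(sorted(centroids), labels, iterations_run)
```

Because the starting centroids are random, a different seed can lead to a different final clustering on less clearly separated data, so it is common to run the algorithm several times and keep the result with the lowest within-cluster sum of squares.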

Use Cases of K-means in the Security Domain

1: Document Analysis

There are many different reasons why you would want to run an analysis on a document. In this scenario, you want to be able to organise the documents quickly and efficiently.

Problem: Imagine you are limited in time and need to organise information held in documents quickly. To complete this task you need to understand the theme of each text, compare it with other documents, and classify it.

Working: Hierarchical clustering has been used to solve this problem. The algorithm is able to look at the text and group it into different themes. Using this technique, you can cluster and organise similar documents quickly using the characteristics identified in the paragraph.
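As a rough sketch of grouping documents by theme, the example below turns each document into a word-count vector and clusters the vectors. Note that it uses K-means, the subject of this article, rather than the hierarchical clustering mentioned above. The function names, the whitespace tokenisation, the deterministic farthest-first seeding, and the four sample sentences are all assumptions made for illustration.

```python
from collections import Counter


def vectorise(documents):
    """Turn each document into a word-count vector over a shared vocabulary."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    return [[c[word] for word in vocabulary] for c in counts]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cluster_documents(documents, k, max_iterations=50):
    """Return one cluster label per document."""
    vectors = vectorise(documents)
    # Assumption: deterministic farthest-first seeding instead of random
    # seeding, so the example gives the same labels on every run.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        )
    labels = []
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda i: squared_distance(v, centroids[i]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable, so centroids are too
            break
        labels = new_labels
        for i in range(k):
            members = [v for v, label in zip(vectors, labels) if label == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Synthetic documents, for illustration only.
documents = [
    "firewall blocked suspicious network traffic",
    "network firewall rules updated",
    "invoice payment overdue reminder",
    "payment received for invoice",
]
print(cluster_documents(documents, k=2))  # [0, 0, 1, 1]
```

Raw word counts are the simplest possible representation. On real documents, a weighting scheme such as TF-IDF and the removal of very common words would usually be needed before the clusters correspond to meaningful themes.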

2: Criminal or Fraudulent Activities

In this scenario, we are going to focus on fraudulent taxi driver behaviour. However, the technique has been used in multiple scenarios.

Problem: You need to look into fraudulent driving activity. The challenge is how to tell which activity is genuine and which is fraudulent.

Working: By analysing the GPS logs, the algorithm is able to group similar behaviours. Based on the characteristics of the groups, you are then able to classify them into those that are genuine and those that are fraudulent.
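One way to picture this is sketched below, under several assumptions. Each trip is a list of (x, y) positions in kilometres on a flat plane, the only feature used is a hypothetical "detour ratio" (distance driven divided by straight-line distance), and a one-dimensional K-means with k = 2, seeded at the smallest and largest value, splits the trips into two groups. The tracks are synthetic, and the feature and function names are invented for this example.

```python
import math


def path_length(track):
    """Total distance driven along a track of (x, y) points."""
    return sum(
        math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(track, track[1:])
    )


def detour_ratio(track):
    """Distance driven divided by the straight-line start-to-end distance."""
    start, end = track[0], track[-1]
    return path_length(track) / math.hypot(end[0] - start[0], end[1] - start[1])


def two_means(values, max_iterations=50):
    """One-dimensional K-means with k = 2; returns a 0/1 label per value."""
    # Assumption: seed the two centroids at the smallest and largest value.
    low, high = min(values), max(values)
    labels = []
    for _ in range(max_iterations):
        new_labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        if new_labels == labels:
            break
        labels = new_labels
        group0 = [v for v, label in zip(values, labels) if label == 0]
        group1 = [v for v, label in zip(values, labels) if label == 1]
        if group0:
            low = sum(group0) / len(group0)
        if group1:
            high = sum(group1) / len(group1)
    return labels


# Synthetic tracks, for illustration only. All go from (0, 0) to (3, 4).
tracks = [
    [(0, 0), (3, 4)],
    [(0, 0), (3, 0), (3, 4)],
    [(0, 0), (0, 4), (3, 4)],
    [(0, 0), (0, 12), (3, 12), (3, 4)],
    [(0, 0), (6, 8), (3, 4)],
]
ratios = [detour_ratio(t) for t in tracks]
print(two_means(ratios))  # [0, 0, 0, 1, 1]
```

The clustering only separates trips with similar behaviour. Deciding that the high-detour group is fraudulent is a judgement made afterwards by inspecting the characteristics of each group, as described above.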
