K-Means Clustering Use Cases in the Security Domain

Introduction

K-means clustering is an unsupervised machine learning algorithm, and one of the simplest and most popular ones.

A cluster is a collection of data points grouped together because of certain similarities. You define a target number k, which is the number of centroids you need in the dataset. A centroid is the imaginary or real location representing the centre of a cluster. Every data point is allocated to exactly one cluster in a way that reduces the within-cluster sum of squares. In other words, the K-means algorithm identifies k centroids and then allocates every data point to the nearest one, while keeping the clusters as compact as possible. The 'means' in K-means refers to averaging the data, that is, finding the centroid.
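Written out, the quantity being reduced is the within-cluster sum of squares. The symbols below (x_i for a data point, C_j for the j-th cluster, mu_j for its centroid) are notation introduced here for illustration and do not come from the text above:

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

Each centroid mu_j is simply the mean of the points currently assigned to cluster C_j, which is where the 'means' in the name comes from.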

How the K-means algorithm works

To process the training data, the K-means algorithm starts with a first group of randomly selected centroids, which are used as the starting points for the clusters, and then performs iterative (repetitive) calculations to optimise the positions of the centroids.

It halts creating and optimising clusters when either:

  • The centroids have stabilised: their values no longer change, so the algorithm has converged.
  • The defined number of iterations has been achieved.
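The loop and its two stopping conditions can be sketched in plain Python as below. This is a minimal illustration rather than a production implementation: the function name `kmeans`, its parameters `max_iters` and `seed`, and the sample points are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Plain K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k randomly chosen data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iters):  # stop condition: iteration limit reached
        # Assign every point to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Stop condition: the centroids have stabilised.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Made-up 2-D points forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Because the starting centroids are random, which group ends up as cluster 0 or cluster 1 depends on the seed; only the grouping itself is meaningful.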

Use Cases of K-means in the Security Domain

1: Document Analysis

There are many different reasons why you would want to run an analysis on a document. In this scenario, you want to be able to organise the documents quickly and efficiently.

Problem: Imagine you are short on time and need to organise the information held in documents quickly. To complete this task you need to understand the theme of each text, compare it with other documents, and classify it.

Working: Hierarchical clustering has been used to solve this problem, and K-means can be applied in the same way once each document is represented as a numeric feature vector (for example, word counts). The algorithm looks at the text and groups it into different themes. Using this technique, you can cluster and organise similar documents quickly using the characteristics identified in the text.
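A K-means version of this idea can be sketched as follows, assuming each document is turned into a simple word-count vector. The helper names `vectorise` and `kmeans` and the four sample documents are made up for illustration; real text would usually need stop-word removal and a weighting scheme such as TF-IDF.

```python
import math
import random


def vectorise(docs):
    """Turn each document into a word-count vector over a shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    return [tuple(doc.lower().split().count(w) for w in vocab) for doc in docs]


def kmeans(vectors, k, iters=50, seed=0):
    """Minimal K-means; returns (centroids, labels)."""
    centroids = random.Random(seed).sample(vectors, k)
    labels = []
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(v, centroids[j]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Made-up example documents (not from any real dataset).
docs = [
    "phishing email link password",
    "phishing email attachment password",
    "firewall port scan traffic",
    "firewall port traffic blocked",
]
_, labels = kmeans(vectorise(docs), k=2)
for label, doc in zip(labels, docs):
    print(label, doc)
```

Documents that share most of their vocabulary end up with the same label, which gives a rough grouping by theme that a person can then name and review.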

2: Criminal or Fraudulent Activities

In this scenario, we are going to focus on fraudulent taxi driver behaviour. However, the technique has been used in multiple scenarios.

Problem: You need to look into fraudulent driving activity. The challenge is telling which activity is genuine and which is fraudulent.

Working: By analysing the GPS logs, the algorithm is able to group similar behaviours. Based on the characteristics of the groups, you can then classify them into those that are genuine and those that are fraudulent.
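As a rough sketch of that workflow, the snippet below clusters per-trip features and then flags one cluster for review. Everything specific here is an assumption made for illustration: the two features (a detour ratio and an idle fraction), their values, and the rule of flagging the cluster with the larger detour ratio. K-means only groups the trips; deciding that a group is fraudulent still takes human judgement.

```python
import math
import random


def kmeans(vectors, k, iters=50, seed=0):
    """Minimal K-means; returns (centroids, labels)."""
    centroids = random.Random(seed).sample(vectors, k)
    labels = []
    for _ in range(iters):
        # Assign each trip to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(v, centroids[j]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Hypothetical features derived from GPS logs (made-up values), one tuple per trip:
# (detour_ratio = driven distance / straight-line distance, idle_fraction)
trips = [
    (1.1, 0.10), (1.2, 0.15), (1.3, 0.12),
    (2.8, 0.40), (3.0, 0.45), (2.9, 0.50),
]
centroids, labels = kmeans(trips, k=2)

# Illustrative rule: send the cluster with the larger mean detour ratio to review.
suspect = max(range(2), key=lambda j: centroids[j][0])
flagged = [trip for trip, lab in zip(trips, labels) if lab == suspect]
print(flagged)
```

In practice the features would need to be scaled to comparable ranges before clustering, since K-means relies on Euclidean distance.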

