K-means Clustering & Its Real Use Case in the Security Domain


What is clustering?

Clustering is one of the most common exploratory data analysis techniques, used to get an intuition about the structure of the data. It can be defined as the task of identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar, while data points in different clusters are very different.


Clustering is considered an unsupervised learning method because, unlike in supervised learning, there is no ground truth: we have no true labels against which to compare the output of the clustering algorithm and evaluate its performance. Instead, we want to investigate the structure of the data by grouping the data points into distinct subgroups.

Types of Clustering

Clustering is a type of unsupervised learning wherein data points are grouped into different sets based on their degree of similarity.


The main types of clustering include:

  • Hierarchical clustering
  • Partitioning clustering

Hierarchical clustering is further subdivided into:

  • Agglomerative clustering
  • Divisive clustering

Partitioning clustering is further subdivided into:

  • K-Means clustering
  • Fuzzy C-Means clustering

What is the K-means algorithm?

K-Means clustering is an unsupervised learning algorithm: unlike in supervised learning, the data it works on has no labels. K-Means divides objects into clusters so that objects in the same cluster are similar to one another and dissimilar to objects in other clusters.
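K-Means is usually computed with Lloyd's algorithm, which alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it, repeating until the centroids stop moving. The sketch below is a minimal pure-Python version, assuming points stored as numeric tuples and Euclidean distance; the function name `kmeans`, its parameters and the toy data are illustrative assumptions, not taken from any library. Starting centroids are picked at random from the data.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Minimal K-Means (Lloyd's algorithm) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Initialisation: pick k data points at random as the starting centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two obvious groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

With this data the first three points end up in one cluster and the last three in the other; which group receives label 0 depends on the random starting centroids.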


The ‘K’ in K-Means is a number: you have to tell the algorithm how many clusters to create. For example, K = 2 means two clusters. The best value of K for a given dataset is usually not known in advance, but it can be estimated, for example with the elbow method.
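One common way to estimate K is the elbow method: run K-Means for several values of K, record the inertia (the within-cluster sum of squared distances to each centroid), and look for the K after which adding more clusters stops reducing it much. This is a sketch on hypothetical data built with three groups. The deterministic farthest-point seeding in `farthest_first` is an assumption made here so the output is reproducible; library implementations such as scikit-learn's `KMeans` default to k-means++ seeding instead.

```python
import math

def farthest_first(points, k):
    # Deterministic seeding (assumption, for reproducibility): start at the first
    # point, then repeatedly add the point farthest from its nearest chosen centroid.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iters=100):
    # Lloyd's algorithm: alternate nearest-centroid assignment and mean update.
    centroids = farthest_first(points, k)
    for _ in range(max_iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            new.append(tuple(sum(d) / len(members) for d in zip(*members)) if members else centroids[c])
        if new == centroids:  # centroids stopped moving
            break
        centroids = new
    return labels, centroids

def inertia(points, labels, centroids):
    # Within-cluster sum of squared distances: the quantity K-Means tries to minimise.
    return sum(math.dist(p, centroids[lbl]) ** 2 for p, lbl in zip(points, labels))

# Hypothetical 2-D data with three well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.5),
          (1.0, 9.0), (1.5, 8.5), (2.0, 9.5)]

inertias = []
for k in range(1, 6):
    labels, centroids = kmeans(points, k)
    inertias.append(inertia(points, labels, centroids))
    print(f"K={k}: inertia={inertias[-1]:.2f}")
```

Here inertia falls steeply from K = 1 to K = 3 and only slightly after that, so the "elbow" points to K = 3, matching the three groups the data was built with.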

For a better understanding of K-Means, let’s take an example from cricket. Imagine you receive data on a large number of cricket players from all over the world, giving the runs scored and the wickets taken by each player in their last ten matches. Based on this information, we want to group the players into two clusters: batsmen and bowlers.
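A minimal sketch of the cricket example, using made-up players and figures (not real data). Because runs are in the hundreds while wickets are in the tens, each feature is first min-max scaled to [0, 1]; otherwise distances would be dominated by runs. K-Means only returns anonymous cluster ids, so afterwards the cluster whose centroid has more wickets is labeled "bowler". The helper names and the deterministic farthest-point seeding are assumptions made for this illustration.

```python
import math

# Hypothetical data: (runs, wickets) over the last ten matches. Not real players.
players = {
    "Player A": (450, 1),
    "Player B": (520, 0),
    "Player C": (390, 2),
    "Player D": (80, 15),
    "Player E": (60, 18),
    "Player F": (120, 12),
}

def min_max_scale(rows):
    # Rescale each feature to [0, 1] so runs (hundreds) do not swamp wickets (tens).
    cols = list(zip(*rows))
    lows, highs = [min(c) for c in cols], [max(c) for c in cols]
    return [tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                  for v, lo, hi in zip(row, lows, highs)) for row in rows]

def kmeans(points, k, max_iters=100):
    # Farthest-point seeding (deterministic), then Lloyd's assign/update loop.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            new.append(tuple(sum(d) / len(members) for d in zip(*members)) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids

names = list(players)
scaled = min_max_scale([players[n] for n in names])
labels, centroids = kmeans(scaled, k=2)

# Name the cluster whose centroid has the most (scaled) wickets "bowler".
bowler_cluster = max(range(2), key=lambda c: centroids[c][1])
roles = {n: ("bowler" if lbl == bowler_cluster else "batsman") for n, lbl in zip(names, labels)}
print(roles)
```

With these figures, Players A to C come out as batsmen and Players D to F as bowlers.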

Use-Cases in the Security Domain

Crime analysis using K-Means clustering



Criminal activities are a major cause for concern for law enforcement officials. Existing strategies to control crime are usually reactive, responding to the crime scene after the crimes have occurred. However, with the advent of technology and data analytics, it is now possible to recognize patterns in criminal activities using historical data and help law enforcement officers do a better job in crime prevention and control.


There are certain questions that law enforcement officers often ask: Is there any correlation between crime type, the weapon used and the location? What are the demographics of the people committing a certain crime? What weapons are most typically possessed by criminals? Can the reports help us predict future criminal activities?


To answer these types of questions, we can use historical data about past criminal activities and mine it for specific patterns. Historical data such as the date, time and location of a crime, the type of crime committed, the gender of those involved and the weapons used is now easily available. This prior crime information can be framed as a data mining problem, and any information gathered from the analysis can help law enforcement officials do a better job.
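Before K-Means can mine such records, each one has to become a numeric vector. This sketch assumes hypothetical fields `hour`, `crime_type` and `weapon` with made-up values: categorical fields are one-hot encoded (one 0/1 column per value), and the hour is mapped onto a circle with sine and cosine so that late night and early morning end up close together. This is one common encoding choice, not the method of any particular study; for mostly categorical data, variants such as k-modes are often a better fit than plain K-Means.

```python
import math

# Hypothetical incident records: field names and values are made up for illustration.
records = [
    {"hour": 23, "crime_type": "burglary", "weapon": "none"},
    {"hour": 2, "crime_type": "assault", "weapon": "knife"},
    {"hour": 14, "crime_type": "theft", "weapon": "none"},
]

def encode(records, categorical=("crime_type", "weapon")):
    """Turn each record into a numeric feature vector suitable for K-Means."""
    # Fixed, sorted vocabulary per categorical field so every vector has the same layout.
    vocab = {f: sorted({r[f] for r in records}) for f in categorical}
    vectors = []
    for r in records:
        # Hour of day on a circle: 23:00 and 02:00 end up close, unlike raw 23 vs 2.
        angle = 2 * math.pi * r["hour"] / 24
        vec = [math.sin(angle), math.cos(angle)]
        # One-hot encoding: one 0/1 column per possible category value.
        for f in categorical:
            vec.extend(1.0 if r[f] == v else 0.0 for v in vocab[f])
        vectors.append(vec)
    return vectors, vocab

vectors, vocab = encode(records)
print(vocab)
for vec in vectors:
    print([round(x, 2) for x in vec])
```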


Data analysts help speed up the process of solving crimes and support law enforcement. For example, criminal data analytics can create a geospatial plot of criminal activities, and these plots can be analysed to predict likely future instances of crime.
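To make the geospatial idea concrete, K-Means can be run on incident coordinates so that each cluster centre marks a candidate crime hotspot. This sketch assumes hypothetical incident locations already expressed as planar kilometres east/north of a city reference point (raw latitude/longitude would need projecting first, since a degree of longitude covers less ground away from the equator) and a chosen K of 2. The deterministic farthest-point seeding is an assumption for reproducibility.

```python
import math

# Hypothetical incident locations in km east/north of a city reference point.
incidents = [(0.2, 0.1), (0.4, 0.3), (0.1, 0.5),
             (5.1, 4.8), (4.9, 5.2), (5.3, 5.0), (5.0, 4.9)]

def kmeans(points, k, max_iters=100):
    # Farthest-point seeding for reproducibility, then Lloyd's assign/update loop.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            new.append(tuple(sum(d) / len(members) for d in zip(*members)) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids

labels, centroids = kmeans(incidents, k=2)
# Rank hotspots by how many incidents fall into each cluster.
hotspots = sorted(((labels.count(c), centroids[c]) for c in range(2)), reverse=True)
for count, (x, y) in hotspots:
    print(f"hotspot centred at ({x:.2f} km, {y:.2f} km): {count} incidents")
```

In practice K would be chosen with a method such as the elbow method, and the hotspot centres overlaid on a map for analysts to review.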


Thanks for reading!


More articles by Absar Qureshi

  • Zenity: Red Hat Enterprise Linux 8.4
  • How Google Bot Uses Javascript?
  • CYBER CRIME AND CONFUSION MATRIX.
  • RUNNING GUI APP INSIDE DOCKER CONTAINER.
  • Deploying Machine Learning Model inside a Docker container.
