K-means Clustering & Its Real Use-Case in the Security Domain
Absar Qureshi
Bachelor of Engineering(Computer Science) || Master of Business Administration(Financial Management)
What is clustering
Clustering is one of the most common exploratory data analysis techniques, used to get an intuition about the structure of the data. It can be defined as the task of identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar while data points in different clusters are very different.
Unlike supervised learning, clustering is considered an unsupervised learning method because there are no ground-truth labels against which to compare the output of the clustering algorithm and evaluate its performance. The goal is only to investigate the structure of the data by grouping the data points into distinct subgroups.
Types of Clustering
Clustering is a type of unsupervised learning wherein data points are grouped into different sets based on their degree of similarity.
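The two main types of clustering are hierarchical clustering and partitioning clustering.
Hierarchical clustering is further subdivided into agglomerative (bottom-up) and divisive (top-down) clustering.
Partitioning clustering includes K-means, the algorithm covered in the rest of this article.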
What is the K-means algorithm?
K-Means clustering is an unsupervised learning algorithm: unlike in supervised learning, there is no labeled data. K-Means divides objects into clusters such that objects in the same cluster are similar to each other and dissimilar to the objects in other clusters.
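The article does not spell out what "similar" means in practice, so here is a sketch under the standard formulation (often called Lloyd's algorithm). The notation is assumed here rather than taken from the text above: x_i are the data points, C_k is the set of points in cluster k, and mu_k is that cluster's centroid. K-means tries to make the within-cluster sum of squared distances J small by alternating an assignment step and an update step until the assignments stop changing.

```latex
% Objective: within-cluster sum of squared distances
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2

% Assignment step: each point joins the cluster of its nearest centroid
C_k = \{\, x_i : \lVert x_i - \mu_k \rVert^2 \le \lVert x_i - \mu_j \rVert^2 \ \text{for all } j \,\}

% Update step: each centroid moves to the mean of its assigned points
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Neither step can increase J, which is why the procedure settles down, although the result it settles on depends on the starting centroids and is not guaranteed to be the best possible grouping.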
The ‘K’ in the name is a number: you need to tell the algorithm how many clusters to create. For example, K = 2 means two clusters. There are ways to estimate the best, or optimum, value of K for a given dataset.
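One common heuristic for choosing K, not named in the article, is the elbow method: run K-means for several values of K, record the within-cluster sum of squares (WCSS) each time, and look for the K after which the improvement flattens out. The sketch below is a minimal illustration under stated assumptions: the nine 2-D points are made-up toy data, and the `farthest_first_init` seeding is a choice made here so the run is reproducible, not part of any standard.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all chosen seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iters=100):
    """Lloyd-style K-means. Returns (centroids, labels, wcss)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss


# Toy data (hypothetical): three visibly separate groups of three points.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
]

wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, wcss in wcss_by_k.items():
    print(f"K={k}  WCSS={wcss:.2f}")
```

On this toy data the WCSS falls sharply up to K = 3 and has little room to improve afterwards, which is the "elbow" the method looks for. On real data the bend is often less clear, so the result is a guide rather than a definitive answer.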
For a better understanding of k-means, let’s take an example from cricket. Imagine you received data on a lot of cricket players from all over the world, giving the runs scored and the wickets taken by each player in their last ten matches. Based on this information, we need to group the players into two clusters: batsmen and bowlers.
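The cricket example can be sketched in a few lines. Everything concrete below is an assumption made for illustration: the eight players and their figures are invented, the features are min-max scaled so that runs (hundreds) do not drown out wickets (tens), and the two starting centroids are taken from the top run scorer and the top wicket taker.

```python
# Hypothetical data: (name, runs, wickets) over the last ten matches.
players = [
    ("Player A", 420, 1), ("Player B", 380, 0),
    ("Player C", 510, 2), ("Player D", 450, 1),
    ("Player E", 60, 18), ("Player F", 45, 22),
    ("Player G", 80, 15), ("Player H", 30, 20),
]


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iters=50):
    """Lloyd-style K-means from given starting centroids. Returns labels."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid wins.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


features = min_max_scale([(runs, wickets) for _, runs, wickets in players])

# Seed cluster 0 with the top run scorer and cluster 1 with the top wicket taker.
top_scorer = max(range(len(players)), key=lambda i: players[i][1])
top_wicket_taker = max(range(len(players)), key=lambda i: players[i][2])
labels = kmeans(features, [features[top_scorer], features[top_wicket_taker]])

batsmen = [name for (name, _, _), lab in zip(players, labels) if lab == 0]
bowlers = [name for (name, _, _), lab in zip(players, labels) if lab == 1]
print("Batsmen:", batsmen)
print("Bowlers:", bowlers)
```

Note that K-means itself only returns "cluster 0" and "cluster 1". Calling them batsmen and bowlers is an interpretation added afterwards by looking at which cluster has the high run totals.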
Use-Cases in the Security Domain
Crime analysis using K-Means clustering
Criminal activities are a major cause for concern for law enforcement officials. Existing strategies to control crime are usually reactive, responding to the crime scene after the crimes have occurred. However, with the advent of technology and data analytics, it is now possible to recognize patterns in criminal activities using historical data and help law enforcement officers do a better job in crime prevention and control.
There are certain questions that law enforcement officers often ask: is there any correlation between crime type, the weapon used, and location? What are the demographics of the people committing a certain crime? What are the most typical weapons possessed by criminals? Can the reports help us predict future criminal activities?
To answer these types of questions, we can use historical data about past criminal activities and mine this data for specific patterns. Historical data such as the date, time and location of the crime, the type of crime committed, gender, weapons used etc. is now easily available. This prior crime information can be converted into a data mining problem, and any information gathered from this analysis can help law enforcement officials do a better job.
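Before K-means can be applied, each crime record has to become a vector of numbers, because the algorithm works on distances. The sketch below shows one simple way to do that, one-hot encoding the categorical fields. The field names and the three records are hypothetical, and the encoding choices (hour divided by 23, one column per category value) are assumptions made for illustration, not the method of any particular police dataset.

```python
# Hypothetical historical records; field names are illustrative only.
records = [
    {"hour": 23, "crime_type": "burglary", "weapon": "none"},
    {"hour": 2, "crime_type": "robbery", "weapon": "knife"},
    {"hour": 14, "crime_type": "theft", "weapon": "none"},
]

CATEGORICAL_FIELDS = ["crime_type", "weapon"]


def build_vocab(records, fields):
    """Collect the sorted set of values seen for each categorical field."""
    return {f: sorted({r[f] for r in records}) for f in fields}


def encode(record, vocab):
    """Turn one record into a numeric vector:
    scaled hour, then one 0/1 column per known category value."""
    vector = [record["hour"] / 23.0]
    for field, values in vocab.items():
        vector.extend(1.0 if record[field] == v else 0.0 for v in values)
    return vector


vocab = build_vocab(records, CATEGORICAL_FIELDS)
vectors = [encode(r, vocab) for r in records]

for r, v in zip(records, vectors):
    print(r["crime_type"], [round(x, 2) for x in v])
```

Euclidean distance over one-hot columns is only a rough measure of how alike two crimes are, so the resulting clusters should be read as a starting point for an analyst rather than a finding.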
Data analysts help speed up the process of solving crimes and support law enforcement. Criminal data analytics creates geospatial plots of criminal activity, which can be analysed to predict where crime is likely to occur.
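One way the geospatial side could work in code: once K-means has produced cluster centroids from historical incident coordinates, each centroid can be treated as a hotspot, and new reports can be matched to the nearest one. The sketch below assumes the centroids already exist. The hotspot and incident coordinates are invented, and using the haversine great-circle distance for the matching is a choice made here, not something the article specifies.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_hotspot(incident, hotspots):
    """Index of the hotspot centroid closest to the incident."""
    return min(range(len(hotspots)),
               key=lambda i: haversine_km(incident, hotspots[i]))


# Hypothetical centroids, assumed to come from an earlier K-means run.
hotspots = [(10.00, 20.00), (10.20, 20.30), (9.80, 20.10)]

# Hypothetical new incident reports as (lat, lon).
new_incidents = [(10.01, 20.02), (10.19, 20.28), (9.82, 20.09), (10.02, 19.99)]

assignments = [nearest_hotspot(p, hotspots) for p in new_incidents]
counts = Counter(assignments)

for i, centre in enumerate(hotspots):
    print(f"hotspot {i} at {centre}: {counts[i]} new incident(s)")
```

Counting recent incidents per hotspot gives a simple signal of where activity is concentrating. Turning that into an actual prediction would need time-based modelling that goes beyond this sketch.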
Thanks for reading!