K MEANS CLUSTERING

K-Means Clustering is one of the most common learning algorithms you might have heard of.

Unsupervised learning is a type of machine learning that discovers previously undetected patterns in a data set with no pre-existing labels. Clustering is one such case of unsupervised learning.

One classic use case for clustering is the retail market. Let's assume Mr. X runs a clothing store and has data on only the age of each customer and the amount they spent at the store. Now he wants to segment the customers. This scenario is well suited to a clustering algorithm. Clustering will enable Mr. X to analyze the groups of consumers that purchase from his store and then market more appropriately to each customer group.

As long as the data is small, Mr. X himself can come up with specific customer segments such as the following just by manual observation:

1. Old customers who spend little

2. Young customers who spend a lot

3. Middle-aged customers who spend a medium amount

Mr. X can devise very different marketing strategies for these three groups.

The mathematical approach to the above scenario is called clustering, and K-Means is one such clustering method.
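
As a rough illustration, here is how such a segmentation might look in code. This is a minimal sketch: the customer values, the choice of three clusters, and the use of scikit-learn's KMeans are assumptions for the example, not data from the scenario above.

```python
# Minimal sketch: segmenting customers by age and spend with scikit-learn's KMeans.
# The numbers below are made up for illustration; they are not real store data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer records: [age, amount spent]
customers = np.array([
    [62, 120], [65, 90], [70, 150],     # older, low spend
    [23, 900], [27, 1100], [21, 950],   # younger, high spend
    [41, 480], [45, 520], [38, 430],    # middle-aged, medium spend
])

# Scale the features so age and spend contribute comparably to the distance.
scaled = StandardScaler().fit_transform(customers)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(scaled)

for (age, spend), label in zip(customers, labels):
    print(f"age={age}, spend={spend} -> cluster {label}")
```

Each cluster label then corresponds to one of the segments above, and a marketing strategy can be attached to each label.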

Now let us dive deep into the working principle of K-Means clustering. The K-Means algorithm tries to determine K points called centroids, chosen so that the points of a cluster sit close to their own centroid (a small intra-cluster distance) but far from the points of other clusters (a large inter-cluster distance). The algorithm's goal is to minimize the intra-cluster distance and maximize the inter-cluster distance.
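
To make these two quantities concrete, here is a tiny sketch that computes them for made-up points, labels, and centroids (not the output of a real run):

```python
# Illustrative only: two hand-picked clusters with their centroids.
import numpy as np

points = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [8.2, 7.9]])
labels = np.array([0, 0, 1, 1])                  # cluster assignment of each point
centroids = np.array([[1.25, 1.9], [8.1, 7.95]])

# Intra-cluster distance: how far points sit from their own centroid (smaller is better).
intra = np.sum((points - centroids[labels]) ** 2)

# Inter-cluster distance: how far the centroids sit from each other (larger is better).
inter = np.linalg.norm(centroids[0] - centroids[1])

print(f"intra-cluster (sum of squared distances): {intra:.3f}")
print(f"inter-cluster (distance between centroids): {inter:.3f}")
```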

There are multiple ways to find the distance between two points in an n-dimensional space, such as:

  1. Euclidean distance
  2. Manhattan distance
  3. Minkowski distance and so on

It is up to the data scientist to pick the proper method for calculating the distance, but the most common, go-to method is the Euclidean distance.
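
All three measures combine coordinate-wise differences, just in different ways; Minkowski distance with p = 2 reduces to Euclidean and with p = 1 to Manhattan. A short sketch with two made-up points:

```python
# Comparing the three distance measures on two example points.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.5])

euclidean = np.sqrt(np.sum((a - b) ** 2))           # straight-line distance
manhattan = np.sum(np.abs(a - b))                   # sum of absolute differences
minkowski = np.sum(np.abs(a - b) ** 3) ** (1 / 3)   # general form, shown here with p = 3

print(euclidean, manhattan, minkowski)
```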

Here’s how the algorithm works in general.

Step 1: Select K, the number of clusters you want to identify.

Step 2: Initialise K random points in the data space to be the centroids (the central points of each cluster).

Step 3: Measure the distance between each data point and each centroid, and assign each data point to its closest centroid and the corresponding cluster.

Step 4: Recalculate the midpoint (centroid) of each cluster.

Step 5: Repeat steps three and four to reassign data points to clusters based on the new centroid locations. Stop when either:

   a. The centroids have stabilized: after recomputing the centroid of each cluster, no data points are reassigned.

   b. The predefined maximum number of iterations has been reached.
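
Putting the steps together, here is a minimal from-scratch sketch of the loop described above. It assumes Euclidean distance, and the function name and sample data are illustrative choices rather than part of the original description.

```python
import numpy as np

def k_means(points, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)

    # Steps 1 & 2: choose K and initialise the centroids as K random data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    for _ in range(max_iters):                      # Step 5b: cap the iterations.
        # Step 3: assign each point to its closest centroid (Euclidean distance).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)

        # Step 4: recalculate each centroid as the mean of its assigned points.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])

        # Step 5a: stop once the centroids have stabilised.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return labels, centroids

# Example run on a small made-up data set.
points = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [8.2, 7.9], [0.5, 0.6]])
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```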

Check out our website:

Check out our features on:

https://brave.onenine.cloud/
