Elbow Plot For k-means Algorithm

Clustering is an unsupervised learning technique in which data samples are grouped by similarity. Two samples can be considered similar if the distance between them is below a pre-defined threshold, and this distance can be computed with any distance metric. Two popular choices are the Manhattan distance and the Euclidean distance.
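To make the two distance measures concrete, here is a minimal sketch in plain Python. The function names and the sample points are our own illustrative choices, not part of any library.

```python
import math

def manhattan_distance(a, b):
    """Sum of the absolute differences along each coordinate."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two illustrative 2-D samples (assumed values)
p, q = (1.0, 2.0), (4.0, 6.0)

print(manhattan_distance(p, q))  # 7.0  (|1-4| + |2-6|)
print(euclidean_distance(p, q))  # 5.0  (sqrt(3^2 + 4^2))
```

The same pair of samples can be "close" under one metric and "far" under another, so the threshold should be chosen together with the metric.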

K-means is among the best-known and easiest to understand of all clustering algorithms. In our blog, we have discussed the detailed working of the k-means algorithm.

Although k-means is unsupervised, it requires one additional piece of information: the number of clusters (groups) it must form from the unlabelled data. Because the dataset is not labelled, we often lack this information, which raises the question: "How can we pre-define the number of clusters?"

The elbow method is one way to decide this. In this method, we plot the Sum of Squared Errors (SSE) against the number of clusters.

SSE is the sum of the squared distances of all samples from the centroids of their respective clusters. For any dataset, SSE is lowest when the number of clusters equals the number of data samples (Think!). But clustering aims to bring similar samples together rather than to make a group for every element.
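The SSE definition can be sketched in a few lines of plain Python. The helper names (centroid, sse) and the four toy samples are assumptions made for illustration; the clusters are assigned by hand here rather than by running k-means.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def sse(clusters):
    """Sum of squared distances of every sample from its cluster's centroid.

    `clusters` is a list of clusters, each a non-empty list of points.
    """
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)
    return total

# Toy dataset (assumed values): two tight pairs far apart
samples = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (9.0, 11.0)]

one_cluster = [samples]                     # k = 1
two_clusters = [samples[:2], samples[2:]]   # k = 2
n_clusters = [[p] for p in samples]         # k = number of samples

print(sse(one_cluster))   # 132.0
print(sse(two_clusters))  # 4.0
print(sse(n_clusters))    # 0.0
```

With one cluster per sample, every sample is its own centroid, so every distance is zero. That is the lowest SSE possible, yet it tells us nothing about the structure of the data.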

The elbow plot gets its name from its elbow-like shape, and the candidate numbers of clusters are the points where the plot bends to form the elbow. Beyond the bend, adding more clusters lowers SSE only marginally. In the diagram above, the plot bends at two positions: when the number of clusters is 2 and when it is 3. We can infer that the possible number of clusters is 2 or 3.
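The values behind an elbow plot can be produced with a short sketch like the one below. It is a simplified k-means (Lloyd's algorithm with random restarts) written in plain Python, not a reference implementation. The toy dataset, the function names, and the parameters n_init, max_iter, and seed are all assumptions made for illustration.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def kmeans_sse(samples, k, n_init=25, max_iter=100, seed=0):
    """Run a basic k-means n_init times and return the lowest SSE found."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_init):
        # Initialise centroids with k distinct samples chosen at random
        centroids = rng.sample(samples, k)
        for _ in range(max_iter):
            # Assignment step: each sample joins its nearest centroid
            clusters = [[] for _ in range(k)]
            for p in samples:
                nearest = min(range(k),
                              key=lambda i: squared_distance(p, centroids[i]))
                clusters[nearest].append(p)
            # Update step: move each centroid to the mean of its cluster
            # (an empty cluster keeps its previous centroid)
            updated = [mean(c) if c else centroids[i]
                       for i, c in enumerate(clusters)]
            if updated == centroids:
                break
            centroids = updated
        sse = sum(min(squared_distance(p, c) for c in centroids)
                  for p in samples)
        best = min(best, sse)
    return best

def make_toy_samples(seed=42):
    """Three noisy groups of 30 points each (assumed toy data)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    return [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
            for cx, cy in centers for _ in range(30)]

samples = make_toy_samples()
sse_by_k = {k: kmeans_sse(samples, k) for k in range(1, 7)}

for k, value in sse_by_k.items():
    print(f"k={k}  SSE={value:.1f}")
```

Plotting sse_by_k with any plotting library, with k on the x-axis and SSE on the y-axis, gives the elbow plot. On data like this, we would expect SSE to fall steeply up to the true number of groups and flatten afterwards.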

Later, based on domain knowledge of the data, we decide which of the candidate cluster counts suggested by the elbow plot best suits our dataset. Enjoy Learning!

#machinelearning #datascience #clustering #algorithms
