Elbow Plot For k-means Algorithm
EnjoyAlgorithms
Clustering is an unsupervised learning technique in which data samples are grouped based on their similarity. Two samples can be considered similar if the geometric distance between them is less than a predefined threshold, and this distance can be calculated with any distance metric. Popular choices include the Manhattan distance and the Euclidean distance.
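As a small illustration of the two distance measures named above, here is a plain-Python sketch. The `are_similar` helper and its `threshold` parameter are hypothetical names for the "distance below a predefined threshold" idea; the threshold value itself is an assumption you would choose for your own data.

```python
import math

def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def are_similar(a, b, threshold, distance=euclidean):
    """Hypothetical helper: treat two samples as similar if their distance is below threshold."""
    return distance(a, b) < threshold

p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))                                       # 5.0
print(manhattan(p, q))                                       # 7.0
print(are_similar(p, q, threshold=6.0))                      # True
print(are_similar(p, q, threshold=6.0, distance=manhattan))  # False
```

The same pair of points can be "similar" under one metric and not under another, which is why the choice of distance and threshold matters.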
K-means is one of the most popular and easiest-to-understand clustering algorithms. We have discussed the detailed working of the k-means algorithm in our blog.
Although k-means is unsupervised, it needs one additional piece of information: the number of clusters (groups) it must form from the unlabelled data. Since the dataset is not labelled, we may not have this information either, which raises the question: "How can we decide the number of clusters in advance?"
The elbow method is one way to decide this. In this method, we plot the Sum of Squared Errors (SSE) against the number of clusters.
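A minimal sketch of the elbow computation in plain Python follows, assuming a basic Lloyd-style k-means with a single random initialisation. The helper names (`kmeans`, `sse`, `elbow_curve`) and the three-blob toy dataset are illustrative assumptions, not code or data from the article.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd's k-means (illustrative). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise with k distinct samples picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so centroids are already the cluster means
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels

def sse(points, centroids, labels):
    """Sum of squared distances of every point to the centroid of its cluster."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))

def elbow_curve(points, max_k, seed=0):
    """SSE for k = 1, 2, ..., max_k; plot these values against k."""
    return [sse(points, *kmeans(points, k, seed=seed)) for k in range(1, max_k + 1)]

# Hypothetical toy data: three Gaussian blobs (not the article's dataset).
rng = random.Random(42)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 8.0)]
data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)) for cx, cy in centers for _ in range(30)]

curve = elbow_curve(data, max_k=6)
for k, error in enumerate(curve, start=1):
    print(k, round(error, 2))
```

Plotting `curve` against `k` (for example with matplotlib's `plt.plot(range(1, 7), curve, marker="o")`) gives the elbow plot. Because a single random start can settle in a local optimum, the curve may occasionally be bumpy; in practice, scikit-learn's `KMeans` runs several initialisations (`n_init`) and exposes the resulting SSE as its `inertia_` attribute.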
SSE is the sum of the squared distances of all samples from the centroids of their clusters. In any dataset, SSE is lowest, in fact zero, when the number of clusters equals the number of data samples, because each sample then sits exactly on its own centroid. But the aim of clustering is to bring similar samples together, not to make a separate group for every element.
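Written as a formula, SSE = Σ_{j=1}^{k} Σ_{x ∈ C_j} ‖x − μ_j‖², where C_j is the j-th cluster and μ_j is its centroid. The sketch below computes it for a hypothetical four-point dataset under three different groupings, and shows it dropping to zero when every sample forms its own cluster. The function names are illustrative.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(members) for c in zip(*members))

def sse(clusters):
    """Sum of squared distances of every sample to the centroid of its own cluster."""
    total = 0.0
    for members in clusters:
        mu = centroid(members)
        total += sum(squared_distance(p, mu) for p in members)
    return total

# Hypothetical toy data, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 7.0), (6.0, 8.0)]

print(sse([points]))                  # one cluster: 55.6875
print(sse([points[:2], points[2:]]))  # two natural groups: 1.625
print(sse([[p] for p in points]))     # one cluster per sample: 0.0
```

Going from one cluster to the two natural groups cuts SSE sharply, while splitting further into one cluster per sample only squeezes out the remaining small error.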
The elbow plot shows SSE decreasing as the number of clusters grows, and the curve forms an elbow-like shape. The candidate numbers of clusters are the points where the plot bends to form the elbow, that is, where adding another cluster stops reducing SSE substantially. In the diagram above, the plot bends at two positions: when the number of clusters is 2 and when it is 3. We can infer that the possible number of clusters is 2 or 3.
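The elbow is read off the plot by eye here. If you want a rough automatic check, one simple heuristic (our assumption, not part of the method as described above) is the discrete second difference: for each k, compare the SSE drop from k−1 to k with the drop from k to k+1, and flag the k values where that difference is largest. The function name and the SSE values below are made up for illustration and are not taken from the article's diagram.

```python
def elbow_candidates(sse_values, top=2):
    """Rank k values by the discrete second difference of the SSE curve.

    sse_values[i] is assumed to be the SSE for k = i + 1. A large score
    means the SSE drop into k is much bigger than the drop out of k,
    i.e. the curve bends at k.
    """
    scores = {}
    for i in range(1, len(sse_values) - 1):
        drop_before = sse_values[i - 1] - sse_values[i]
        drop_after = sse_values[i] - sse_values[i + 1]
        scores[i + 1] = drop_before - drop_after  # key is k = i + 1
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Hypothetical SSE values for k = 1..7, invented for illustration only.
sse_values = [1000.0, 400.0, 150.0, 120.0, 100.0, 90.0, 85.0]
print(elbow_candidates(sse_values))  # [2, 3]
```

The heuristic scores only k = 2 up to the second-to-last k, since the two endpoints lack a neighbour on one side.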
Finally, based on domain knowledge of the data, we decide which of the candidate numbers of clusters suggested by the elbow plot is most suitable for our dataset. Enjoy Learning!