Hierarchical Clustering
Kiruthika Subramani
Innovating AI for a Better Tomorrow | AI Engineer | Google Developer Expert | Author | IBM Dual Champion | 200+ Global AI Talks | Master's Student at MILA
It's time for another "Cup of coffee with an Algorithm in ML"! This week, we're diving into the intriguing world of Hierarchical Clustering! Grab your favorite cup of coffee and join us as we unravel the mysteries of clustering, discover how to group similar data points together, and create a hierarchical structure of clusters. Get ready for an exciting journey into the depths of Hierarchical Clustering! Let's begin!
Hierarchical clustering is a method that helps us group similar things together by creating a tree-like structure.
At the beginning, each item is considered its own cluster. Then the algorithm looks at how similar the items are to one another and merges the two most similar ones into a cluster. It keeps doing this, merging the closest clusters step by step, until all the items end up in one big cluster.
However, the interesting part is that hierarchical clustering doesn't leave you with just that one big cluster. The sequence of merges is recorded as a hierarchy, so you can read the tree back down: cutting it at one level splits the big cluster into smaller clusters, and cutting it lower splits those into even smaller groups, if desired.
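To make the idea of reading one tree at several levels concrete, here is a small sketch. It assumes SciPy and scikit-learn are installed and uses the Iris data with complete linkage purely as an example; the variable names are illustrative.

```python
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import load_iris

X = load_iris().data

# Build the hierarchy once; no number of clusters is needed at this point
Z = linkage(X, method="complete")

# Cut the same tree at two different levels
labels_2 = fcluster(Z, 2, criterion="maxclust")
labels_3 = fcluster(Z, 3, criterion="maxclust")

# Each cluster of the finer cut should sit inside exactly one cluster of the coarser cut
nested = all(len(set(labels_2[labels_3 == c])) == 1 for c in set(labels_3))
print("3-cluster cut is nested inside the 2-cluster cut:", nested)
```

The point of the check is that the finer grouping is obtained by splitting clusters of the coarser one, not by regrouping from scratch.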
Hierarchical clustering comes in two types: agglomerative clustering and divisive clustering.
Agglomerative and divisive clustering represent two different strategies for building a hierarchical structure of clusters. Agglomerative clustering starts from the bottom and merges clusters, while divisive clustering starts from the top and splits.
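The bottom-up strategy can be sketched in a few lines of plain NumPy. This is a deliberately naive illustration, not how SciPy or scikit-learn implement it: the function name `naive_agglomerative` is hypothetical, and single linkage (distance between the closest pair of members) is assumed as the similarity rule.

```python
import numpy as np

def naive_agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain (single linkage)."""
    points = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters

# Toy 1-D data: two points on the left, three on the right
toy_points = [[0.0], [0.2], [5.0], [5.1], [9.0]]
result = naive_agglomerative(toy_points, 2)
print(result)
```

On this toy data the two left-hand points end up in one cluster and the three right-hand points in the other. A divisive method would go the opposite way, starting from one cluster and splitting it.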
Terminologies of Hierarchical Clustering
Wait, you all might have this question: how does Hierarchical Clustering differ from K-means clustering?
Hierarchical clustering builds a hierarchy of clusters by merging or splitting based on similarities, while K-means clustering aims to partition data points into a pre-defined number of clusters based on their proximity to cluster centroids.
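A quick side-by-side may help here. The sketch below assumes scikit-learn and SciPy, uses Iris only as sample data, and picks `n_init=10`, `random_state=0` and average linkage as arbitrary illustrative settings.

```python
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# K-means: the number of clusters must be fixed before fitting,
# and the result is described by one centroid per cluster
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("K-means centroids shape:", kmeans.cluster_centers_.shape)

# Hierarchical: the tree is built without choosing k;
# k is only needed afterwards, when the tree is cut
Z = linkage(X, method="average")
hier_labels = fcluster(Z, 3, criterion="maxclust")
print("Linkage matrix shape:", Z.shape)
```

The linkage matrix has one row per merge, so for n samples it has n - 1 rows, while K-means only keeps the final centroids.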
Come on, let's implement Agglomerative Clustering!
import pandas as pd
from sklearn.datasets import load_iris
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt
# Load the Iris dataset
iris = load_iris()
data = iris.data
columns = iris.feature_names
# Convert the data array into a DataFrame
df = pd.DataFrame(data, columns=columns)
# Calculate the linkage matrix using the complete linkage method
linkage_matrix = linkage(df.values, method='complete')
# Plot the dendrogram
plt.figure(figsize=(10, 6))
dendrogram(linkage_matrix)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Sample Index')
plt.ylabel('Distance')
plt.show()
# Cut the dendrogram to obtain clusters
k = 3  # Desired number of clusters
clusters = fcluster(linkage_matrix, k, criterion='maxclust')
# Add the cluster labels to the original DataFrame
df['Cluster'] = clusters
print(df)
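One way to see what the three clusters mean is to compare them with the known Iris species. This is an optional check that is not part of the original walkthrough; it repeats the complete-linkage clustering so that it runs on its own, and the names `species` and `table` are illustrative.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import load_iris

iris = load_iris()

# Same clustering as above: complete linkage, cut into at most 3 clusters
Z = linkage(iris.data, method="complete")
clusters = fcluster(Z, 3, criterion="maxclust")

# Map the numeric targets to species names and count the overlaps
species = pd.Series(iris.target_names[iris.target], name="Species")
table = pd.crosstab(pd.Series(clusters, name="Cluster"), species)
print(table)
```

Each row is a cluster and each column a species, so a cluster that matches a species well shows most of its count in a single column.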
Divisive Clustering
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering
# Load the Iris dataset
iris = load_iris()
data = iris.data
# Convert the data into a DataFrame
df = pd.DataFrame(data, columns=iris.feature_names)
# Fit scikit-learn's AgglomerativeClustering (a bottom-up method) with average linkage
n_clusters = 3  # Number of clusters
model = AgglomerativeClustering(n_clusters=n_clusters, linkage='average')
clusters = model.fit_predict(df)
# Add the cluster labels to the DataFrame
df['Cluster'] = clusters
# Print the cluster assignments
print(df['Cluster'].value_counts())
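What's the difference in the code? Both seem to be the same!
Good catch: both snippets are in fact agglomerative. scikit-learn's AgglomerativeClustering works bottom-up, just like SciPy's linkage, so the second snippet does not really perform divisive clustering. The actual differences are the library and the linkage method (complete in the first snippet, average in the second), which determines how the distance between two clusters is measured and therefore which clusters get merged. A truly divisive method would instead start with all points in one cluster and split it step by step.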
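For contrast, top-down splitting can be sketched as follows. This is only one possible way to do it and rests on assumptions that are not in the original: the function name `divisive_sketch` is hypothetical, the largest cluster is the one chosen for splitting, and each split is done with a 2-cluster K-means from scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

def divisive_sketch(X, n_clusters, random_state=0):
    """Start with one cluster and repeatedly split the largest one in two."""
    labels = np.zeros(len(X), dtype=int)  # everything starts in cluster 0
    for new_label in range(1, n_clusters):
        # Assumption: always split the cluster with the most points
        target = np.bincount(labels).argmax()
        mask = labels == target
        # Split the chosen cluster into two halves with K-means
        halves = KMeans(n_clusters=2, n_init=10,
                        random_state=random_state).fit_predict(X[mask])
        idx = np.flatnonzero(mask)
        labels[idx[halves == 1]] = new_label  # one half gets a new label
    return labels

X = load_iris().data
labels = divisive_sketch(X, 3)
print(np.bincount(labels))  # cluster sizes
```

Here the hierarchy grows from the top: the first split produces two clusters, and the second split divides one of them again.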
Hope you got it!!
We've successfully explored and implemented the Hierarchical Clustering algorithm to group the Iris samples based on their measurements. Now, let's gather over a cup of coffee next week to delve into another fascinating algorithm in Machine Learning.
Cheers,
Kiruthika.