Hierarchical Clustering

It's time for another "Cup of Coffee with an Algorithm in ML"! This week, we're diving into the intriguing world of Hierarchical Clustering. Grab your favorite cup of coffee and join us as we unravel the mysteries of clustering, discover how to group similar data points together, and create a hierarchical structure of clusters. Get ready for an exciting journey into the depths of Hierarchical Clustering! Let's begin!

Hierarchical clustering is a method that helps us group similar things together by creating a tree-like structure.



At the beginning, each item is considered its own cluster. Then the algorithm looks at how similar each item is to the others and starts merging the most similar items into clusters. It keeps doing this, merging more and more items based on their similarities, until all the items end up in one big cluster.
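To make this merging process concrete, here is a minimal sketch (using SciPy on a made-up set of four 2-D points) that prints each merge step recorded in the linkage matrix:

import numpy as np
from scipy.cluster.hierarchy import linkage

# Four toy 2-D points -- two tight pairs, far apart from each other
points = np.array([[0.0, 0.0],
                   [0.1, 0.0],
                   [5.0, 5.0],
                   [5.1, 5.0]])

# Each row of the linkage matrix records one merge:
# [cluster_i, cluster_j, distance between them, size of the new cluster]
Z = linkage(points, method='single')
for step, (i, j, dist, size) in enumerate(Z, start=1):
    print(f"Step {step}: merge clusters {int(i)} and {int(j)} "
          f"at distance {dist:.2f} (new cluster size: {int(size)})")

The two close pairs get merged first, and the final step joins everything into one big cluster, exactly as described above.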


However, the interesting part is that hierarchical clustering doesn't stop there. The hierarchy can also be built in the opposite direction: starting from one big cluster, splitting it into smaller clusters, and then splitting those into even smaller groups, if desired.

Hierarchical clustering comes in two types: Agglomerative Clustering and Divisive Clustering.



Agglomerative and divisive clustering represent two different strategies for building a hierarchical structure of clusters. Agglomerative clustering starts from the bottom and merges clusters, while divisive clustering starts from the top and splits them.

Terminologies of Hierarchical Clustering

  1. Dendrogram: A tree-like diagram that shows how clusters are merged or split in hierarchical clustering.
  2. Linkage: Method used to calculate the distance or similarity between clusters.
  3. Cluster Distance: Measure of dissimilarity or similarity between clusters.
  4. Cluster Fusion: Merging two or more clusters into a single cluster.
  5. Cluster Splitting: Dividing a cluster into smaller clusters.
  6. Cut-off Threshold: Value used to determine the number of clusters by cutting the dendrogram at a certain height or distance.
  7. Number of Clusters: Desired or determined number of distinct groups formed from the data.
  8. Silhouette Coefficient: Measure of how well data points fit into their assigned clusters, taking into account cohesion and separation (see the sketch right after this list).
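Since the silhouette coefficient is the usual way to judge cluster quality, here is a minimal sketch (assuming scikit-learn's silhouette_score, on the same Iris data and settings used in the implementation further down) that scores a 3-cluster hierarchical clustering:

from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from scipy.cluster.hierarchy import linkage, fcluster

# Cluster the Iris data into 3 groups with complete linkage
data = load_iris().data
labels = fcluster(linkage(data, method='complete'), 3, criterion='maxclust')

# Silhouette ranges from -1 (poorly assigned) to +1 (dense, well-separated)
print(f"Silhouette coefficient: {silhouette_score(data, labels):.3f}")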


Wait, you all might have this question: how does Hierarchical Clustering differ from K-Means clustering?



Hierarchical clustering builds a hierarchy of clusters by merging or splitting based on similarities, while K-means clustering aims to partition data points into a pre-defined number of clusters based on their proximity to cluster centroids.
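To see that difference in practice, here is a small sketch that runs both algorithms on the Iris data with 3 clusters each and compares the resulting group sizes (the exact counts depend on the random seed and on scikit-learn's default ward linkage, so treat this as illustrative):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans, AgglomerativeClustering

data = load_iris().data

# K-means: needs the cluster count up front, assigns points to nearest centroid
km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(data)

# Hierarchical: builds the full merge tree, then cuts it into 3 clusters
hc_labels = AgglomerativeClustering(n_clusters=3).fit_predict(data)

print("K-means cluster sizes:     ", np.bincount(km_labels))
print("Hierarchical cluster sizes:", np.bincount(hc_labels))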

Come on, let's implement Agglomerative Clustering!

import pandas as pd
from sklearn.datasets import load_iris
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt


# Load the Iris dataset
iris = load_iris()
data = iris.data
columns = iris.feature_names


# Convert the data array into a DataFrame
df = pd.DataFrame(data, columns=columns)


# Calculate the linkage matrix using the complete linkage method
linkage_matrix = linkage(df.values, method='complete')


# Plot the dendrogram
plt.figure(figsize=(10, 6))
dendrogram(linkage_matrix)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Sample Index')
plt.ylabel('Distance')
plt.show()


# Cut the dendrogram to obtain clusters
k = 3  # Desired number of clusters
clusters = fcluster(linkage_matrix, k, criterion='maxclust')


# Add the cluster labels to the original DataFrame
df['Cluster'] = clusters


print(df)
        

Divisive Clustering

One honest caveat before the code: scikit-learn does not provide a true divisive (top-down) implementation, so the example below approximates the idea with AgglomerativeClustering and a different linkage method.

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering


# Load the Iris dataset
iris = load_iris()
data = iris.data


# Convert the data into a DataFrame
df = pd.DataFrame(data, columns=iris.feature_names)


# Perform hierarchical clustering with average linkage
n_clusters = 3  # Number of clusters
model = AgglomerativeClustering(n_clusters=n_clusters, linkage='average')
clusters = model.fit_predict(df)


# Add the cluster labels to the DataFrame
df['Cluster'] = clusters


# Print the cluster assignments
print(df['Cluster'].value_counts())
        

What's the difference in the code? Both seem to be the same!



The key distinction is the choice of linkage method (complete linkage in the SciPy example, average linkage in the scikit-learn one), which determines how the distance between clusters is measured and therefore the clustering behavior. Both examples are agglomerative under the hood, as noted above.
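If you want to see that effect directly, here is a small sketch that fits the same Iris data with each of the linkage options scikit-learn's AgglomerativeClustering supports and compares the resulting cluster sizes:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering

data = load_iris().data

# Same data, same number of clusters -- only the linkage method changes
for method in ['ward', 'complete', 'average', 'single']:
    labels = AgglomerativeClustering(n_clusters=3, linkage=method).fit_predict(data)
    print(f"{method:>8} linkage -> cluster sizes: {np.bincount(labels)}")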

Hope you got it!!

We've successfully explored and implemented the Hierarchical Clustering algorithm to group similar data points into a tree of clusters. Now, let's gather over a cup of coffee next week to delve into another fascinating algorithm in Machine Learning.

Cheers,

Kiruthika.





