Hierarchical Clustering

It's time for another "Cup of Coffee with an Algorithm in ML"! This week, we're diving into the intriguing world of Hierarchical Clustering. Grab your favorite cup of coffee and join us as we unravel the mysteries of clustering, discover how to group similar data points together, and create a hierarchical structure of clusters. Get ready for an exciting journey into the depths of Hierarchical Clustering! Let's begin!

Hierarchical clustering is a method that helps us group similar things together by creating a tree-like structure.



At the beginning, each item is considered its own cluster. Then the algorithm looks at how similar each item is to the others and starts merging the most similar items into clusters. It keeps doing this, merging more and more items based on their similarities, until all the items end up in one big cluster.
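To make this merging process concrete, here is a minimal sketch (using SciPy on a made-up set of four 2-D points) that prints each merge step recorded in the linkage matrix:

import numpy as np
from scipy.cluster.hierarchy import linkage

# Four toy 2-D points -- two tight pairs, far apart from each other
points = np.array([[0.0, 0.0],
                   [0.1, 0.0],
                   [5.0, 5.0],
                   [5.1, 5.0]])

# Each row of the linkage matrix records one merge:
# [cluster_i, cluster_j, distance between them, size of the new cluster]
Z = linkage(points, method='single')
for step, (i, j, dist, size) in enumerate(Z, start=1):
    print(f"Step {step}: merge clusters {int(i)} and {int(j)} "
          f"at distance {dist:.2f} (new cluster size: {int(size)})")

The two close pairs get merged first, and the final step joins everything into one big cluster, exactly as described above.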


However, the interesting part is that hierarchical clustering doesn't stop there. The hierarchy can also be built in the opposite direction: starting from one big cluster, splitting it into smaller clusters, and then splitting those into even smaller groups, if desired.

Hierarchical clustering comes in two types: Agglomerative Clustering and Divisive Clustering.



Agglomerative and divisive clustering represent two different strategies for building a hierarchical structure of clusters. Agglomerative clustering starts from the bottom and merges clusters, while divisive clustering starts from the top and splits them.

Terminologies of Hierarchical Clustering

  1. Dendrogram: A tree-like diagram that shows how clusters are merged or split in hierarchical clustering.
  2. Linkage: Method used to calculate the distance or similarity between clusters.
  3. Cluster Distance: Measure of dissimilarity or similarity between clusters.
  4. Cluster Fusion: Merging two or more clusters into a single cluster.
  5. Cluster Splitting: Dividing a cluster into smaller clusters.
  6. Cut-off Threshold: Value used to determine the number of clusters by cutting the dendrogram at a certain height or distance.
  7. Number of Clusters: Desired or determined number of distinct groups formed from the data.
  8. Silhouette Coefficient: Measure of how well data points fit into their assigned clusters, taking into account cohesion and separation (see the sketch right after this list).
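Since the silhouette coefficient is the usual way to judge cluster quality, here is a minimal sketch (assuming scikit-learn's silhouette_score, on the same Iris data and settings used in the implementation further down) that scores a 3-cluster hierarchical clustering:

from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from scipy.cluster.hierarchy import linkage, fcluster

# Cluster the Iris data into 3 groups with complete linkage
data = load_iris().data
labels = fcluster(linkage(data, method='complete'), 3, criterion='maxclust')

# Silhouette ranges from -1 (poorly assigned) to +1 (dense, well-separated)
print(f"Silhouette coefficient: {silhouette_score(data, labels):.3f}")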


Wait, you all might have this question: how does Hierarchical Clustering differ from K-Means clustering?



Hierarchical clustering builds a hierarchy of clusters by merging or splitting based on similarities, while K-means clustering aims to partition data points into a pre-defined number of clusters based on their proximity to cluster centroids.
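To see that difference in practice, here is a small sketch that runs both algorithms on the Iris data with 3 clusters each and compares the resulting group sizes (the exact counts depend on the random seed and on scikit-learn's default ward linkage, so treat this as illustrative):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans, AgglomerativeClustering

data = load_iris().data

# K-means: needs the cluster count up front, assigns points to nearest centroid
km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(data)

# Hierarchical: builds the full merge tree, then cuts it into 3 clusters
hc_labels = AgglomerativeClustering(n_clusters=3).fit_predict(data)

print("K-means cluster sizes:     ", np.bincount(km_labels))
print("Hierarchical cluster sizes:", np.bincount(hc_labels))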

Come on, let's implement Agglomerative Clustering!

import pandas as pd
from sklearn.datasets import load_iris
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt


# Load the Iris dataset
iris = load_iris()
data = iris.data
columns = iris.feature_names


# Convert the data array into a DataFrame
df = pd.DataFrame(data, columns=columns)


# Calculate the linkage matrix using the complete linkage method
linkage_matrix = linkage(df.values, method='complete')


# Plot the dendrogram
plt.figure(figsize=(10, 6))
dendrogram(linkage_matrix)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Sample Index')
plt.ylabel('Distance')
plt.show()


# Cut the dendrogram to obtain clusters
k = 3  # Desired number of clusters
clusters = fcluster(linkage_matrix, k, criterion='maxclust')


# Add the cluster labels to the original DataFrame
df['Cluster'] = clusters


print(df)
        

Divisive Clustering

One honest caveat before the code: scikit-learn does not provide a true divisive (top-down) implementation, so the example below approximates the idea with AgglomerativeClustering and a different linkage method.

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering


# Load the Iris dataset
iris = load_iris()
data = iris.data


# Convert the data into a DataFrame
df = pd.DataFrame(data, columns=iris.feature_names)


# Perform hierarchical clustering with average linkage
n_clusters = 3  # Number of clusters
model = AgglomerativeClustering(n_clusters=n_clusters, linkage='average')
clusters = model.fit_predict(df)


# Add the cluster labels to the DataFrame
df['Cluster'] = clusters


# Print the cluster assignments
print(df['Cluster'].value_counts())
        

What's the difference in the code? Both seem to be the same!



The key distinction is the choice of linkage method (complete linkage in the SciPy example, average linkage in the scikit-learn one), which determines how the distance between clusters is measured and therefore the clustering behavior. Both examples are agglomerative under the hood, as noted above.
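If you want to see that effect directly, here is a small sketch that fits the same Iris data with each of the linkage options scikit-learn's AgglomerativeClustering supports and compares the resulting cluster sizes:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering

data = load_iris().data

# Same data, same number of clusters -- only the linkage method changes
for method in ['ward', 'complete', 'average', 'single']:
    labels = AgglomerativeClustering(n_clusters=3, linkage=method).fit_predict(data)
    print(f"{method:>8} linkage -> cluster sizes: {np.bincount(labels)}")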

Hope you got it!!

We've successfully explored and implemented the Hierarchical Clustering algorithm to group similar data points into a tree of clusters. Now, let's gather over a cup of coffee next week to delve into another fascinating algorithm in Machine Learning.

Cheers,

Kiruthika.





