BxD Primer Series: Affinity Propagation Clustering Models


Hey there 👋

Welcome to the BxD Primer Series, where we cover topics such as Machine Learning models, Neural Nets, GPT, Ensemble models, and Hyper-automation in a 'one-post-one-topic' format. Today's post is on Affinity Propagation Clustering Models. Let's get started:

The What:

The main idea behind Affinity Propagation Clustering is that each data point sends messages to all other data points to indicate how well-suited it is to represent a cluster center. The messages sent between data points are based on two factors: the similarity between the data points, and the availability of other data points to serve as cluster centers.

Rather than starting from initial cluster centers, Affinity Propagation Clustering first builds a similarity matrix that measures the similarity between each pair of data points. Once the similarity matrix is established, the algorithm starts iterating between two steps: "responsibility" and "availability".

In the responsibility step, each data point sends a responsibility value to every candidate exemplar, measuring how well-suited that candidate is to serve as the cluster center for it, relative to all other candidates.

In the availability step, each candidate exemplar sends an availability value back to every data point, measuring how available it is to serve as that point's cluster center, based on the responsibilities it has received from other data points.

The iterations continue until the algorithm converges and identifies the cluster centers. Data points are then assigned to these centers based on their similarity to center points.
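To make this concrete, here is a minimal sketch using scikit-learn's AffinityPropagation implementation (assuming scikit-learn is installed; the toy blob data and parameter values are illustrative, not from this post):

```python
# Minimal sketch: Affinity Propagation via scikit-learn on toy 2-D data.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(42)
# Two well-separated blobs of 20 points each
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
               rng.normal(5.0, 0.5, (20, 2))])

# damping slows the message updates; the number of clusters is NOT specified
model = AffinityPropagation(damping=0.9, random_state=0).fit(X)

print("Exemplar indices:", model.cluster_centers_indices_)
print("Number of clusters found:", len(model.cluster_centers_indices_))
```

Notice that no cluster count is passed in; the algorithm infers it from the data and the preference value (by default, the median of the pairwise similarities).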

The Why:

Here are a few advantages that Affinity Propagation Clustering provides over other clustering models:

  1. Automatically determines the number of clusters, with no need for a predefined value
  2. Can handle non-linear and complex data patterns
  3. Suitable for high-dimensional datasets with a large number of features
  4. Can identify small clusters and outliers that may be missed by other clustering algorithms
  5. Can handle missing data by ignoring those points in the clustering process

The Why Not:

Reasons to not use Affinity Propagation:

  1. Requires multiple iterations over n × n message matrices, so it scales poorly to large datasets and can take a long time to converge
  2. Requires tuning of hyper-parameters (chiefly the preference and the damping factor), which can be difficult and time-consuming
  3. May result in unevenly sized clusters
  4. Produces clusters based on similarity metrics and message passing, which can be difficult to interpret

The How:

There are three matrices involved in the Affinity Propagation algorithm: Similarity (S), Responsibility (R), and Availability (A). If there are n data points in the dataset, each of these matrices has dimension n × n.

Similarity Matrix stores pairwise similarities between all data points using a chosen distance or similarity metric. There are multiple ways to measure the distance between points, and the right choice depends on the types of feature values. We have explained their formulas and implications in another edition (check here). Brief below:

  1. Euclidean distance is a popular choice for datasets with continuous numerical features.
  2. Manhattan distance is useful for datasets with discrete or categorical features.
  3. Minkowski distance is a generalized distance metric that includes Euclidean and Manhattan as special cases.
  4. Cosine similarity is commonly used for text data, where the similarity between two documents is based on the angle between their feature vectors.
  5. Jaccard similarity is useful for datasets with binary features, such as presence/absence of certain attributes.
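As an illustration, S is often built from the negative squared Euclidean distance, so that higher values mean "more similar". The sketch below uses made-up toy points and fills the diagonal with the median preference, a common default choice:

```python
# Sketch: building the similarity matrix S for Affinity Propagation
# using negative squared Euclidean distance.
import numpy as np

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])

# S[i, k] = -||x_i - x_k||^2  (higher = more similar)
diff = X[:, None, :] - X[None, :, :]
S = -np.sum(diff ** 2, axis=2)

# The diagonal ("preference") controls how many exemplars emerge;
# the median of the off-diagonal similarities is a common default.
np.fill_diagonal(S, np.median(S[~np.eye(len(X), dtype=bool)]))
print(S.shape)  # (4, 4)
```

A larger (less negative) preference on the diagonal makes each point argue more strongly for itself, producing more clusters.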

Responsibility Matrix element R(i,k) represents the extent to which data point x_i considers x_k to be a good exemplar for it. These messages are sent from data points to candidate exemplars.

Availability Matrix element A(i,k) represents the extent to which candidate exemplar x_k signals to data point x_i that it is available to serve as x_i's cluster center. These messages are sent from candidate exemplars back to data points.


The algorithm works as follows:

Step 1: Initialize the responsibility and availability matrices to zero, and set each self-similarity S(k,k) (the "preference") to a common value, often the median of the pairwise similarities:

R(i,k) = 0 and A(i,k) = 0, for all i, k

In every subsequent iteration, each matrix update is damped to avoid numerical oscillations:

M = λ · M_old + (1 − λ) · M_new

Where, λ is the damping factor that controls the rate at which the responsibility and availability matrices are updated during each iteration. A higher value leads to slower and more stable convergence, while a lower value leads to faster but more volatile convergence.

Step 2: Compute the responsibility R(i,k) for each data point x_i and candidate exemplar x_k as:

R(i,k) = S(i,k) − max_{k' ≠ k} [ A(i,k') + S(i,k') ]

Where,

  • R(i,k) represents the responsibility of data point i towards exemplar k
  • S(i,k) is the similarity between data point i and exemplar k
  • A(i,k') is the availability of a competing exemplar k' to data point i
  • The max term captures the competition from all other candidate exemplars k' ≠ k

Step 3: Compute the availability of all data points as:

A(i,k) = min( 0, R(k,k) + Σ_{i' ∉ {i,k}} max(0, R(i',k)) ), for i ≠ k

A(k,k) = Σ_{i' ≠ k} max(0, R(i',k))

Where,

  • A(k,k) represents the availability of exemplar k to itself, accumulated from the positive responsibilities it receives from other points
  • A(i,k) represents the availability of exemplar k to data point i, capped at zero so that a few strong supporters cannot make k look unconditionally available

Step 4: Identify the current exemplars as the data points for which the sum of self-responsibility and self-availability is positive, i.e. R(k,k) + A(k,k) > 0.

Step 5: Repeat steps 2-4 until convergence, which is defined as the point when responsibility and availability matrices do not change significantly.

Step 6: Assign each data point to its closest exemplar based on the similarity matrix.

Note: The number of exemplars is not fixed in advance; it can change from iteration to iteration, ranging anywhere from 0 to n, where n is the number of data points.


Convergence Criterion?is typically based on the change in exemplar set and cluster assignments between iterations. The algorithm terminates when the change in exemplar set and cluster assignments falls below a certain threshold, indicating that the algorithm has converged to a stable solution.

Change in exemplar set: Measures the percentage of exemplars that are added or removed between two consecutive iterations, and compares it to the threshold value. If the percentage is less than the threshold value, the algorithm is considered to have converged.

Change in cluster assignment: Measures the percentage of data points whose cluster assignment changes between two consecutive iterations, and compares it to the threshold value. If the percentage is less than the threshold value, the algorithm is considered to have converged.


Evaluating Clusters: Eventually, you will need to accept or reject the clustering results. Certain methods can be used to judge the quality of clusters. We have covered their formulas and implications in another edition (check here). Brief below:

  1. Silhouette Coefficient: This metric measures the compactness and separation of clusters. It is useful when the data is well-separated and the clusters have similar sizes.
  2. Davies-Bouldin Index: This metric measures the similarity between clusters and the distance between their centroids. It is useful when the clusters have different sizes and densities.
  3. Calinski-Harabasz Index: This metric measures the ratio of the between-cluster variance to the within-cluster variance. It is useful when the clusters are well-separated and have similar sizes.
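For example, all three metrics are available in scikit-learn (assuming it is installed); the toy data and labels below are illustrative:

```python
import numpy as np
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

rng = np.random.default_rng(0)
# Two compact, well-separated blobs with known labels
X = np.vstack([rng.normal(0.0, 0.3, (25, 2)),
               rng.normal(4.0, 0.3, (25, 2))])
labels = np.array([0] * 25 + [1] * 25)

print("Silhouette:        ", silhouette_score(X, labels))        # in [-1, 1], higher is better
print("Davies-Bouldin:    ", davies_bouldin_score(X, labels))    # lower is better
print("Calinski-Harabasz: ", calinski_harabasz_score(X, labels)) # higher is better
```

In practice, compute these on the labels returned by Affinity Propagation and compare against alternative clusterings (or alternative preference values) of the same data.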

Time for you to support:

  1. Reply to this email with your question
  2. Forward/Share to a friend who can benefit from this
  3. Chat on Substack with BxD (here)
  • Engage with BxD on LinkedIn (here)

In the coming posts, we will cover two more types of clustering models: Mean-Shift and Fuzzy C-Means.

Let us know your feedback!

Until then,

Have a great time!

#businessxdata #bxd #affinity #propagation #clustering #primer
