Welcome to BxD Primer Series, where we cover topics such as Machine learning models, Neural Nets, GPT, Ensemble models, and Hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on Spectral Clustering Models. Let’s get started:
The What:
Spectral clustering is a powerful technique for clustering data that is not easily separated by linear boundaries and that other clustering algorithms struggle with. Instead of clustering the original dataset directly, it looks at the underlying structure of the data in terms of similarity between data points, and then uses that structure to arrive at clusters.
It's called spectral clustering because it relies on the spectral properties (eigenvectors and eigenvalues) of a similarity graph that is constructed from the data. In summary:
Spectral Clustering = Reduced Underlying Structure + Any other Clustering Algorithm
The Accuracy:
You can observe from the image below that spectral clustering is able to separate almost all data distribution types into appropriate clusters, whereas some other algorithms fail to do so on particular data distributions.
The How:
There are eight general steps to perform spectral clustering, and multiple choices can be made at each step. The eight steps are listed below; the choices at each step are then explained separately (a minimal end-to-end sketch in Python follows the list):
- Choose a pairwise similarity measure or distance metric to quantify the similarity or dissimilarity between each pair of data points.
- Construct an affinity matrix that captures the pairwise relationships between data points based on the similarity measure or distance metric chosen.
- Normalize the affinity matrix to obtain a matrix that satisfies certain properties, such as being symmetric, positive semi-definite, and having rows that sum to one.
- Compute the eigenvectors and eigenvalues of the normalized affinity matrix.
- Choose a small subset of the top eigenvectors based on the desired number of clusters or other criteria.
- Use the selected eigenvectors to form a lower-dimensional representation of the data points.
- Apply a clustering algorithm, such as k-means or hierarchical clustering, to the lower-dimensional representation to group the data points into clusters.
- Map the cluster assignments back to the original data space to obtain the final clustering result.
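Before diving into the choices at each step, here is a minimal end-to-end sketch using scikit-learn's SpectralClustering class. The toy dataset (two concentric circles) and every parameter value are illustrative assumptions rather than recommendations:

```python
# Minimal end-to-end spectral clustering sketch with scikit-learn.
# Dataset and parameter choices are illustrative only.
from sklearn.datasets import make_circles
from sklearn.cluster import SpectralClustering

# Two concentric circles: a classic case where k-means on raw coordinates fails.
X, _ = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)

# affinity="rbf" builds a Gaussian-kernel affinity matrix (gamma controls its width);
# assign_labels="kmeans" runs k-means on the spectral embedding, as in step 7 above.
model = SpectralClustering(n_clusters=2, affinity="rbf", gamma=10.0,
                           assign_labels="kmeans", random_state=0)
labels = model.fit_predict(X)
print(labels[:10])
```

The sections below unpack what such a library call is doing internally at each of the eight steps.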
Choice of pairwise similarity measure or distance metric depends on the nature of the data (a short sketch of a few of these measures follows the list):
- Euclidean distance: It measures the distance between two points in a straight line.
- Cosine similarity: This measures the angle between two vectors, which is useful for text or image data.
- Correlation coefficient: This measures the linear correlation between two vectors.
- Gaussian kernel similarity: This is a similarity measure that assigns a weight to each pair of points based on their distance.
- Pearson correlation: This measures the linear correlation between two vectors after they have been centered.
- k-Nearest Neighbors graph: Connects each point to its k nearest neighbors based on the chosen distance metric.
- Fully connected graph: Connects each point to all other points based on the chosen distance metric.
- Laplacian kernel: This is a kernel-based approach that uses the Laplacian matrix to measure similarity between points.
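As a rough illustration, a few of these measures can be computed directly with NumPy. The array shape, the variable names, and the sigma value are assumptions made only for this example:

```python
import numpy as np

# Toy data: 5 points with 3 features each (purely illustrative).
X = np.random.rand(5, 3)

# Euclidean distance matrix: d_ij = ||x_i - x_j||
diff = X[:, None, :] - X[None, :, :]
euclidean = np.sqrt((diff ** 2).sum(axis=-1))

# Cosine similarity: s_ij = <x_i, x_j> / (||x_i|| * ||x_j||)
norms = np.linalg.norm(X, axis=1, keepdims=True)
cosine = (X @ X.T) / (norms @ norms.T)

# Gaussian-kernel similarity: s_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2))
sigma = 1.0
gaussian = np.exp(-euclidean ** 2 / (2 * sigma ** 2))
```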
The affinity matrix depends on the choice of similarity or distance metric:
Each row and column of the affinity matrix corresponds to a single data point, and the value of each entry in the matrix represents the degree of similarity or dissimilarity between the corresponding pair of data points.
Depending on choice of similarity or distance measure, there are three types of affinity matrix:
- Binary adjacency matrix, where entries are 1 if two points are connected, and 0 otherwise
- Weighted adjacency matrix, where entries are weighted by the strength of the similarity or distance measure
- Gaussian similarity matrix, where entries are computed using a Gaussian kernel function
The affinity matrix is a key input to spectral clustering algorithms, as it is used to construct the lower-dimensional representation of the data points.
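A minimal sketch of the three flavours, assuming scikit-learn's kneighbors_graph and euclidean_distances helpers; the values of k and sigma are illustrative:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.metrics.pairwise import euclidean_distances

X = np.random.rand(100, 2)  # toy data

# 1. Binary adjacency matrix from a k-nearest-neighbours graph (symmetrised).
knn = kneighbors_graph(X, n_neighbors=10, mode="connectivity").toarray()
binary_A = np.maximum(knn, knn.T)            # 1 if two points are connected, else 0

# 2. Weighted adjacency matrix: edges of the kNN graph weighted by distance.
dist = kneighbors_graph(X, n_neighbors=10, mode="distance").toarray()
weighted_A = np.maximum(dist, dist.T)

# 3. Gaussian similarity matrix on the fully connected graph.
sigma = 0.5
D = euclidean_distances(X)
gaussian_A = np.exp(-D ** 2 / (2 * sigma ** 2))
```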
Normalizing the affinity matrix: An unnormalized affinity matrix is rarely used to compute eigenvectors, as it can produce unstable and inconsistent clustering results.
It is normalized (for example, so that rows sum to 1), and the resulting matrix is then used for eigenvector computation.
Row normalization: Divide each row of the affinity matrix by the sum of the elements in that row.
Symmetric normalization: Symmetry ensures that eigenvectors of the matrix are orthogonal.
- Compute a diagonal matrix (D) whose diagonal entries are sum of the elements in that row of affinity matrix (A).
- Compute the unnormalized Laplacian matrix L: L = D - A.
- Compute the normalized Laplacian matrix L_sym = D^(-1/2) L D^(-1/2) = I - D^(-1/2) A D^(-1/2).
- The resulting matrix L_sym is symmetric and has non-negative eigenvalues.
Random walk normalization is closely related: the resulting matrix L_rw is not symmetric in general, but it is similar to L_sym and therefore shares its eigenvalues. Symmetric normalization takes a global view of the graph, while random walk normalization takes a more local, diffusion-based view, which makes it better suited to datasets with disconnected or well-separated cluster regions.
- Compute a diagonal matrix (D) whose diagonal entries are sum of the elements in that row of affinity matrix (A).
- Compute the random walk transition matrix P = D^(-1) A and the random walk normalized Laplacian L_rw = I - P = D^(-1) L.
- Each row of P represents a probability distribution over the neighboring nodes, which is where the ‘random walk’ interpretation comes from.
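A small NumPy sketch of the three normalizations discussed above, assuming A is a symmetric, non-negative affinity matrix with no isolated points (so every row sum is positive):

```python
import numpy as np

def normalize_affinity(A):
    d = A.sum(axis=1)                       # node degrees = row sums of A
    D_inv = np.diag(1.0 / d)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.diag(d) - A                      # unnormalized Laplacian: L = D - A

    row_norm = D_inv @ A                    # row normalization: each row sums to 1
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt     # symmetric: I - D^(-1/2) A D^(-1/2)
    L_rw = D_inv @ L                        # random walk: I - D^(-1) A
    return row_norm, L_sym, L_rw
```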
Computing Eigenvectors and Eigenvalues: Eigenvectors are vectors that, when multiplied by a matrix, produce a scalar multiple of themselves; eigenvalues are the corresponding scalar multiples. Details will be covered in the PCA primer.
The whole idea of spectral clustering is to use eigenvectors and eigenvalues instead of the original features; they are the ‘spectral’ properties of the data. They compress most of the information in the data into a smaller number of features, effectively performing dimensionality reduction.
The eigenvectors represent the directions in which the data is most separable, while the eigenvalues correspond to the degree of influence or importance of each direction.
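In practice, the eigen-decomposition of the symmetric normalized Laplacian can be computed with np.linalg.eigh, which returns eigenvalues in ascending order; for the Laplacian, it is the eigenvectors attached to the smallest eigenvalues that carry the cluster structure. The toy affinity construction below exists only to make the snippet self-contained:

```python
import numpy as np

# Toy Gaussian affinity matrix and its symmetric normalized Laplacian
# (illustrative stand-in for the matrices built in the earlier steps).
rng = np.random.default_rng(0)
X = rng.random((20, 2))
A = np.exp(-np.square(X[:, None, :] - X[None, :, :]).sum(-1))
d = A.sum(axis=1)
L_sym = np.eye(len(A)) - (A / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]

# eigh handles symmetric matrices and sorts eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(L_sym)
```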
Selecting a subset of (m) eigenvectors: Common options when selecting a subset of top eigenvectors:
- Fixed number of eigenvectors: Choose a fixed number of top eigenvectors based on the desired number of clusters or other criteria. This is the simplest approach, but it may not always lead to the best clustering result.
- Ratio of eigenvalues: Choose the top eigenvectors based on a ratio of their corresponding eigenvalues. For example, you can choose the eigenvectors that correspond to the top 10% of eigenvalues. This approach aims to retain the eigenvectors that capture the most variance in the data.
- Gap statistic:?Compare the eigenvalues of the normalized affinity matrix to the eigenvalues of random matrices generated with the same dimensions, and identify a "gap" in the eigenvalue spectrum that indicates the optimal number of eigenvectors to pick.
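A common relative of this idea is the eigengap heuristic: keep the eigenvectors that come before the largest jump between consecutive sorted eigenvalues. A minimal sketch, assuming the Laplacian eigenvalues are sorted in ascending order (as np.linalg.eigh returns them):

```python
import numpy as np

def choose_m_by_eigengap(eigenvalues, max_m=10):
    # Gaps between consecutive eigenvalues among the first max_m + 1 of them.
    gaps = np.diff(eigenvalues[:max_m + 1])
    # Keep everything before the largest gap; with 3 well-separated clusters
    # the big jump typically appears after the third eigenvalue.
    return int(np.argmax(gaps)) + 1
```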
Forming a lower-dimensional representation of the data:
- Form a matrix V by stacking the m selected eigenvectors as columns. Dimension of V: p*m, where p is the number of data points (each eigenvector of the p*p normalized Laplacian has length p).
- Optionally, as in the Ng-Jordan-Weiss variant, normalize each row of V to unit length.
- The rows of V represent the lower-dimensional embedding of the original data points: row i is the m-dimensional representation of data point i.
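A minimal sketch of this step in the Ng-Jordan-Weiss style (eigenvectors with the smallest L_sym eigenvalues, optional row normalization); the function and variable names are assumptions made for the example:

```python
import numpy as np

def spectral_embedding(L_sym, m):
    # Eigenvectors of the symmetric normalized Laplacian, eigenvalues ascending.
    eigenvalues, eigenvectors = np.linalg.eigh(L_sym)
    V = eigenvectors[:, :m]                              # p x m: one row per data point
    V = V / np.linalg.norm(V, axis=1, keepdims=True)     # optional row normalization
    return V
```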
Apply a clustering algorithm on the lower-dimensional data:
By this step, the ‘spectral’ part of the algorithm is done. Now a suitable clustering algorithm can be applied to group the lower-dimensional data points into clusters. Some common choices:
- K-means clustering: This is a popular choice for spectral clustering. We have already covered this algorithm in our previous edition (check here).
- Hierarchical clustering: This is another popular choice for spectral clustering. We have already covered the agglomerative algorithm in our previous edition (check here).
- DBSCAN: We have already covered this algorithm in our previous edition (check here).
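For example, running k-means on the rows of the spectral embedding with scikit-learn could look like the sketch below; the random matrix V is just a stand-in for the embedding built in the previous step:

```python
import numpy as np
from sklearn.cluster import KMeans

V = np.random.rand(100, 3)       # stand-in for the p x m spectral embedding
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(V)   # labels[i] is the cluster of original data point i
```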
Mapping lower-dimensional clusters to original data points:
Options for mapping the cluster assignments from the lower-dimensional space back to the original data space:
- One-to-one map: Each row of the embedding corresponds to exactly one original data point, so each point is simply assigned the cluster of its embedding.
- Nearest neighbor: For each data point in the original space, find its nearest neighbor in the lower-dimensional space. Assign the data point to the same cluster as its nearest neighbor in the lower-dimensional space.
Evaluating Clusters:
The spectral clustering algorithm has multiple parameters that can affect the resulting cluster formation, and there is no direct way to visualize the clustering results.
Eventually, you will need to accept or reject the clustering results. Certain statistical methods can be used to judge cluster quality; we have covered their formulas and implications in a previous edition (check here).
Another approach is to use a measure of clustering stability. Some common techniques (a small bootstrap-stability sketch follows this list):
- Bootstrap resampling: The original dataset is sampled with replacement to create multiple bootstrap datasets. Each bootstrap dataset is then clustered using spectral clustering, and the resulting clustering solutions are compared using a stability metric such as Normalized Mutual Information (NMI). NMI ranges from 0 to 1, with 1 indicating perfect agreement between two solutions and 0 indicating no agreement. The stability metric is then averaged across all bootstrap samples to produce an overall measure of stability.
- Subsampling: A random subset of the original dataset is selected, and spectral clustering is performed on the subset. This process is repeated multiple times with different subsets of the data, and the resulting clustering solutions are compared using a stability metric.
- Parameter perturbation: The clustering algorithm is run multiple times with small perturbations to the parameters, and the resulting clustering solutions are compared using a stability metric to identify the most stable solution.
- Consensus clustering: Multiple clustering algorithms are run with different parameter settings or initializations, and the resulting solutions are combined to produce a consensus clustering that represents the most stable clusters across all the individual solutions.
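A rough sketch of the bootstrap-with-NMI idea, assuming scikit-learn; the number of rounds, the RBF affinity, and the way reference labels are restricted to the resampled points are all illustrative choices:

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import normalized_mutual_info_score

def bootstrap_stability(X, n_clusters=3, n_rounds=10, seed=0):
    rng = np.random.default_rng(seed)
    model = SpectralClustering(n_clusters=n_clusters, affinity="rbf", random_state=0)
    reference = model.fit_predict(X)          # clustering of the full dataset
    scores = []
    for _ in range(n_rounds):
        idx = rng.choice(len(X), size=len(X), replace=True)   # bootstrap sample
        labels = model.fit_predict(X[idx])                    # re-cluster the sample
        # Compare against the reference labels of the same (resampled) points.
        scores.append(normalized_mutual_info_score(reference[idx], labels))
    return float(np.mean(scores))
```

A score close to 1 suggests the clustering is stable under resampling; a score near 0 suggests the clusters are largely an artifact of the particular sample or parameter choice.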
The Why:
There are several reasons why one might choose spectral clustering over other clustering models:
- Can handle datasets with complex or nonlinear structures, where other clustering algorithms struggle
- Can identify clusters of arbitrary shapes and provide clear separation
- Generally produces high-quality clusters, especially when the underlying data has clear geometric structure
- Can be used with a variety of similarity measures, including both graph-based and distance-based measures
- Can be used with both normalized and unnormalized Laplacians, allowing for more flexibility in the algorithm's behavior
- Embeddings of data points in the low-dimensional space can be used as features for downstream tasks
The Why Not:
Some reasons why you might not want to use spectral clustering:
- Can be computationally expensive, especially for large datasets or high-dimensional feature spaces
- Requires selection and tuning of several parameters, such as the similarity function, affinity matrix type and normalization techniques
- May not perform well on datasets with low-dimensional structure or when the clusters are highly overlapping or dispersed
Time for you to help:
- Reply to this email with your question
- Forward/Share to a friend who can benefit from this
- Chat on Substack with BxD (here)
- Engage with BxD on LinkedIN (here)
In the next few posts, we will cover three more types of clustering models: Affinity Propagation, Mean-Shift, and Fuzzy C-Means.
Let us know your feedback!