UMAP: If Geometry Could Talk

In a recent thought experiment, I was thinking about various frameworks of symmetry and asymmetry and how abstract states within a system give rise to symmetry. In this context, I aim to map primarily UMAP, and to some extent t-SNE, onto these frameworks, examining them through the lenses of symmetry, geometry, analogy, and mathematics to better connect their underlying principles.

UMAP (paper reference: arXiv:1802.03426) can be viewed as using non-abstract, structured symmetry: it maintains matching graph structures between the high- and low-dimensional spaces. This approach preserves both local and global data relationships, making UMAP more effective for practical applications that require a balanced representation of the data.

In contrast, t-SNE utilizes an abstract symmetry approach by aligning probability distributions that emphasize local similarities, often at the expense of global structure preservation. While both techniques adjust point positions in space to reflect data relationships, their underlying methodologies differ significantly in how they handle the balance between local and global structure.

While t-SNE focuses on probabilistic similarities to project higher-dimensional symmetry into lower dimensions, UMAP employs a more direct approach by constructing and preserving graph-like structures in both higher and lower dimensions.

You can imagine the data as a ball that can be squeezed or expanded in both the high and low dimensions. The squeezing here is an optimization process on the graph structure, rather than the literal squeezing of a ball.

Why are graphs important, and how do they make this process better than t-SNE?

Graphs have nodes, which represent data points, and edges that represent connections or relationships between these points. These nodes are interconnected in higher dimensions. When projecting the data into lower dimensions, the relationships shift to adjust for additional context. For practical purposes, you can imagine these points as being randomly placed initially. Since the connections between points in the higher-dimensional space are known, UMAP retains these pairwise relationships and their strengths in the lower-dimensional space.

During the optimization process, points in the lower dimensions are iteratively moved to better reflect their original structure in the higher dimensions.

You can think of it another way: in high-dimensional space, connections between points can exist whether they are close or far apart, depending on the strength of their association. In t-SNE, these connections exist in the higher dimension, but in the lower dimension some points may drift apart. While some connections are maintained through the heavy-tailed distribution, others may be lost. As a result, local structures are preserved, but global structures become distorted.

In contrast, UMAP’s graph-based approach ensures that both global and local structures are preserved in the lower dimension.

Enough of the description, what about the math?

This time, I will index each step the old-fashioned way, to see if that makes it more readable.

1. Construct the graph in the high dimension

The first step is to calculate the Euclidean distance between all pairs of points in the space, using the formula below. You can reference this in most of the articles I have written.
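For two points i and j, with k indexing the features, that distance is:

$$ \mathrm{dist}(i,j) = \sqrt{\sum_{k} \big(x_{ik} - x_{jk}\big)^2} $$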

Once you have calculated the distances, you want to find the nearest neighbors of every point in the space, as governed by the Euclidean distances between points (k-nearest neighbors).
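As a minimal sketch of this step (using scikit-learn's exact brute-force search; the real UMAP implementation uses an approximate nearest-neighbor search for speed, so this is just an illustration on a random stand-in dataset):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.rand(100, 10)  # stand-in dataset: 100 points, 10 features
k = 15                       # neighborhood size, UMAP's key hyperparameter

# Build a kNN index on the Euclidean metric and query every point.
nn = NearestNeighbors(n_neighbors=k, metric="euclidean").fit(X)
knn_dists, knn_indices = nn.kneighbors(X)  # both of shape (100, k)

# Note: each point comes back as its own first neighbor at distance 0,
# so the smallest *nonzero* distance per row plays the role of rho_i below.
```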

It’s fine to know the Euclidean distance between points, but that doesn’t fully explain the strength of the connection between them. A simple assumption might be that closer points have stronger connections, but this isn’t always true because strength is a relative phenomenon.

To understand the formula below, imagine all the points in a 2-dimensional space. Instead of interpreting absolute distances, you treat distances as relative by scaling them based on the smallest distance. This gives a sense of the relative connection strength. Additionally, normalizing these strengths to a range (e.g., 0 to 1) ensures the results are easier to interpret and consistent.

And the final factor is the ability to adjust the embedding to grow or shrink dense and sparse regions. Dense regions in the high-dimensional space are expanded in the lower-dimensional embedding to prevent overcrowding and preserve local relationships, while sparse regions may be contracted to maintain a balanced representation of the data’s structure.
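Putting this together, the directed connection strength from a point i to a neighbor j, as defined in the UMAP paper (arXiv:1802.03426), is:

$$ v_{j|i} = \exp\!\left( -\,\frac{\max\!\big(0,\ \mathrm{dist}(i,j) - \rho_i\big)}{\sigma_i} \right) $$

The max(0, ·) simply clamps the relative distance at zero, so each point’s nearest neighbor always receives a strength of 1.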

Now that we understand the intuition behind it, let’s examine each term in the math. dist(i,j) is the Euclidean distance between two points, and ρi, the smallest Euclidean distance from i to any of its neighbors, is subtracted from it to obtain the relative distance, which sets the strength between points.

However, certain regions might be too sparse or too dense, leading to situations where connections are disproportionately weak or strong. This could distort the true contextual relationships, since the connection strength may reflect the local density of points rather than their actual associations.

If we could dynamically determine a value to normalize the connections, making dense regions a bit sparser and sparse regions a bit denser, it would improve the balance. This is achieved through σ, a dynamic controlling factor that adapts based on the density of the region.
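In the paper, σi is found by a binary search so that each point’s total outgoing strength matches a fixed target tied to the neighborhood size k:

$$ \sum_{j=1}^{k} \exp\!\left( -\,\frac{\max\!\big(0,\ \mathrm{dist}(i,j) - \rho_i\big)}{\sigma_i} \right) = \log_2 k $$

Dense neighborhoods (many small distances) end up with a small σi that weakens their far connections, while sparse neighborhoods get a large σi that boosts theirs.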

The last aspect of the function is the exponential of the negated term. Since the relative distance is never negative, the exponential of its negation is bounded between 0 and 1, which makes the strengths easy to interpret: the nearest neighbor gets exp(0) = 1, and the value decays toward 0 as points move relatively farther apart.

As the points get relatively closer, the strength increases, and it decreases as they become sparser. Finally, we calculate the strengths between all the points in a region, effectively constructing a fuzzy graph.

The next step involves combining the two directed strengths into a single weighted connection between each pair of points, using the formula:
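$$ v_{ij} = v_{j|i} + v_{i|j} - v_{j|i}\, v_{i|j} $$

(using the directed strengths defined above; in fuzzy-set terms, this is the probabilistic union of the two one-way connections)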

As we saw in t-SNE, in UMAP the canvas is different for each point, so the relative distances from i to j and from j to i will differ, based on the reference distance ρ and the dynamic controlling factor σ (which adjusts the canvas size). These terms are summed, but summing them directly could make the output larger than 1, and we still want to keep it within the range of 0 to 1.

To achieve this, we subtract the product of both terms from their sum. This is interesting because, in the general case, we might simply have taken the mean (dividing by 2) to keep the value within 1. However, doing so would not preserve the relative strength of each term. The subtraction of the product instead introduces a non-linear adjustment, dynamically balancing the weight based on the relative strengths of the terms. This is more adaptive than simply taking the mean.
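As a quick worked example, take directed strengths of 0.9 and 0.2: the mean would give 0.55, while 0.9 + 0.2 − (0.9)(0.2) = 0.92, so a single strong directed connection keeps the pair strongly connected. The sum-minus-product form also stays within 0 to 1 whenever both inputs do.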

Now imagine the canvases shrinking and expanding dynamically at a specific rate of change. Could we adjust for that rate of change as well? This is an experimental idea, a figment of my imagination: if both sides of the canvas change at a specific rate, the expression above could be extended with a rate-of-change term for each canvas.

I don’t want to complicate any further, so let’s move to the lower dimension math.

2. Initialize the points in the low dimension

Initially, the points are placed in the low-dimensional space (random placement is the simplest way to start), and the weights of the connections between them are defined using the following equation.
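$$ w_{ij} = \frac{1}{1 + a\, d_{ij}^{\,2b}} $$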

Here, dij is the Euclidean distance between two points in the low-dimensional space. There are two scaling factors, a and b. Think of a as a starting factor that diminishes or enhances the weight, and b as controlling the exponential scaling. Since the expression is an inverse, the larger the distance between points, the faster the connection strength drops. Both a and b are derived from hyperparameters (in the reference implementation, they are fitted from the min_dist and spread settings). This is a generic setup to begin with, but we will gradually bring a lot more clarity to it.
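A minimal sketch of this weight, with a = b = 1 as illustrative placeholders (the library fits the real values, so the numbers here are not UMAP’s defaults):

```python
import numpy as np

def low_dim_weight(yi, yj, a=1.0, b=1.0):
    """UMAP-style connection strength between two embedded points.

    With a = b = 1 this reduces to 1 / (1 + d^2), a Student-t-like curve.
    """
    d = np.linalg.norm(yi - yj)            # Euclidean distance in the embedding
    return 1.0 / (1.0 + a * d ** (2 * b))  # inverse form: decays as d grows

print(low_dim_weight(np.array([0.0, 0.0]), np.array([1.0, 1.0])))  # ~0.333
```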

3. Evolve the graph in the low dimension

We know the real weights between points in the higher dimension, and now we have the weights for similar connections in the lower dimension. If we can iteratively update the weights in the lower dimension to bring them closer to the high-dimensional weights, we can preserve both the local cluster structure and the global structure.

For this, we rely on the cross-entropy loss function and gradient descent, concepts borrowed from neural networks. Since we aim to adjust weights, the cross-entropy function here should depend on the weights.

I will not go deeper into this, as you can reference the cross-entropy function in another article I have written. It measures the loss of information in the weights of the lower dimension compared to the reference weights in the higher dimension for the entire set of points.
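For reference, the fuzzy cross-entropy from the UMAP paper, summed over all pairs of points, looks like this:

$$ C = \sum_{i \ne j} \left[ v_{ij} \log\frac{v_{ij}}{w_{ij}} + \big(1 - v_{ij}\big) \log\frac{1 - v_{ij}}{1 - w_{ij}} \right] $$

The first term acts as an attractive force between strongly connected points, and the second as a repulsive force between weakly connected ones.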

Similar to what we do in neural networks, once we have the loss, we calculate how it changes with respect to the point positions that determine the weights. Using the chain rule, we compute the gradient of the loss for a specific point in space. These gradients form the basis of gradient descent: the first step is calculating the gradient, and the second is applying the descent update.

Points are updated, the weights of the entire system in the low dimension are recalculated, and this back-and-forth continues until most of the loss is minimized. The resulting low-dimensional structure preserves both local and global relationships.
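To make the loop concrete, here is a toy, full-gradient sketch of one descent step, assuming a = b = 1 so that w = 1 / (1 + d²). The real umap-learn optimizer instead uses stochastic updates with negative sampling, so treat this only as an illustration of the math above:

```python
import numpy as np

def umap_toy_step(Y, V, lr=0.1, eps=1e-4):
    """One full-gradient descent step on the fuzzy cross-entropy.

    Y: (n, 2) low-dimensional positions; V: (n, n) symmetric high-dim strengths.
    Assumes a = b = 1, so w_ij = 1 / (1 + ||y_i - y_j||^2).
    """
    n = Y.shape[0]
    grad = np.zeros_like(Y)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            diff = Y[i] - Y[j]
            w = 1.0 / (1.0 + diff @ diff)
            # dC/dy_i = [2*v*w - 2*w^2*(1 - v)/(1 - w)] * (y_i - y_j):
            # the first piece attracts connected points, the second repels.
            coeff = 2.0 * V[i, j] * w - 2.0 * (1.0 - V[i, j]) * w**2 / (1.0 - w + eps)
            grad[i] += coeff * diff
    return Y - lr * grad

# One step on a tiny example: three points, the first two strongly connected.
Y = np.random.randn(3, 2)
V = np.array([[0.0, 0.9, 0.1], [0.9, 0.0, 0.1], [0.1, 0.1, 0.0]])
Y = umap_toy_step(Y, V)
```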

The verdict is that UMAP is a much better tool than t-SNE for visualization. One caveat: the higher the number of features, the more computationally expensive it becomes to construct the graphs. So for datasets with very high dimensionality, it makes sense to first reduce the feature count with PCA, retaining only the leading principal components, before applying UMAP.
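A minimal sketch of that pipeline with scikit-learn and the umap-learn package (the dataset and the component counts here are stand-ins):

```python
import numpy as np
from sklearn.decomposition import PCA
import umap  # pip install umap-learn

X = np.random.rand(1000, 500)  # stand-in for a very wide dataset

# Cheap linear reduction first, then the graph-based embedding.
X_reduced = PCA(n_components=50).fit_transform(X)
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(X_reduced)
print(embedding.shape)  # (1000, 2)
```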

As I end this article, keeping Richard Feynman’s thinking alive from the last article, the name of the bird is “Uniform Manifold Approximation and Projection”.
