Why Multidimensional Scaling Fails?

Why MDS fails to give us meaningful embeddings:

MDS arranges points in a low-dimensional space, usually 2D, so that the distances between them match the pairwise distances in the original high-dimensional space as closely as possible.

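The objective (loss) it minimizes, in its usual least-squares form, is:

\[
L = \sum_{i<j} \left( d_{ij} - \lVert y_i - y_j \rVert \right)^2
\]

where d_{ij} is the distance between points i and j in the original high-dimensional space and \lVert y_i - y_j \rVert is the distance between the same pair in the low-dimensional embedding, preferably 2D.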
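As a rough illustration of such a distance-preserving loss, the sketch below sums the squared differences between each original distance and the corresponding embedded distance. The function name `stress` and the toy points are made up for this example, and it uses only the Python standard library.

```python
import math
from itertools import combinations

def stress(high_dim_points, low_dim_points):
    """Sum of squared differences between original and embedded pairwise distances."""
    total = 0.0
    for i, j in combinations(range(len(high_dim_points)), 2):
        d_ij = math.dist(high_dim_points[i], high_dim_points[j])  # distance in the original space
        e_ij = math.dist(low_dim_points[i], low_dim_points[j])    # distance in the embedding
        total += (d_ij - e_ij) ** 2
    return total

# Three points in 3D that happen to lie in a plane (toy data).
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# A 2D layout that keeps every distance, and one that collapses two points.
Y_good = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
Y_bad = [(0.0, 0.0), (0.0, 0.0), (0.0, 1.0)]

print(stress(X, Y_good))  # zero: all distances are preserved
print(stress(X, Y_bad))   # positive: two points were pushed together
```

MDS searches for the low-dimensional layout that makes this number as small as possible.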

Preserving high-dimensional distances is usually a bad idea because it is generally not possible to reproduce them in a low-dimensional space (the curse of dimensionality).

What is the curse of dimensionality?

The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. In order to obtain a reliable result, the amount of data needed often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high-dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient.
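To put rough numbers on this, the sketch below computes two standard quantities: how many samples a grid with 10 cells per axis needs if every cell should contain one sample, and what fraction of the unit cube is covered by the largest ball that fits inside it. Both helper names are hypothetical and chosen only for this illustration.

```python
import math

def samples_for_grid(dim, cells_per_axis=10):
    """Samples needed to place one sample in every cell of a regular grid."""
    return cells_per_axis ** dim

def ball_fraction_of_cube(dim):
    """Volume of the ball of radius 0.5 divided by the volume of the unit cube (which is 1)."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * 0.5 ** dim

for dim in (2, 3, 10, 50):
    print(dim, samples_for_grid(dim), ball_fraction_of_cube(dim))
```

The sample count grows exponentially with the dimension, while the ball's share of the cube shrinks towards zero, so almost all of the volume ends up in the corners, far from the centre.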

Now, coming back to our topic.

Let us look at an example: randomly generated data points drawn from a Gaussian with unit variance, and the distribution of their pairwise distances.

[Figure: distributions of pairwise distances between the Gaussian points for increasing numbers of dimensions]

As we increase the number of dimensions, the distribution of pairwise distances shifts further and further away from zero, meaning the distances end up at the higher end.
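This effect is easy to reproduce. The sketch below is a minimal simulation, not the code behind the original plot: the sample size of 200 points, the chosen dimensions, and the seed are all assumptions. It draws standard Gaussian points and records the smallest, mean, and largest pairwise distance for each dimension.

```python
import math
import random
import statistics
from itertools import combinations

def pairwise_distances(n_points, dim, rng):
    """Draw n_points from a standard Gaussian in `dim` dimensions and return all pairwise distances."""
    points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    return [math.dist(p, q) for p, q in combinations(points, 2)]

rng = random.Random(0)  # fixed seed so the run is repeatable
summary = {}
for dim in (2, 10, 100):
    distances = pairwise_distances(200, dim, rng)
    summary[dim] = (min(distances), statistics.mean(distances), max(distances))
    print(dim, summary[dim])
```

For unit-variance Gaussian data the mean distance grows roughly like sqrt(2 * dim). In 2D some pairs are almost on top of each other, while in 100 dimensions even the closest pair is far from zero.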

Now suppose you try to reproduce those large distances in 2D. There is not enough room to keep every pair far apart, so some points necessarily end up close together, with pairwise distances near zero, which never happens in the original space. The point is that you are trying to fit the green distribution with the blue one, and that is why MDS fails to produce a meaningful embedding.

The MDS embedding of the MNIST data shows this: the classes are hardly separable.

[Figure: MDS embedding of the MNIST data]


Another issue with MDS is its quadratic memory and time complexity.
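A back-of-the-envelope calculation shows how quickly this becomes a problem. The sketch below counts the distinct pairs among n points and the memory needed to store one distance per pair, assuming 8 bytes per value (a 64-bit float); the helper name is made up for this example.

```python
def pairwise_storage(n_points, bytes_per_value=8):
    """Number of distinct pairs and the bytes needed to store one distance per pair."""
    n_pairs = n_points * (n_points - 1) // 2
    return n_pairs, n_pairs * bytes_per_value

for n in (1_000, 10_000, 70_000):
    n_pairs, n_bytes = pairwise_storage(n)
    print(f"{n:>6} points: {n_pairs:>13,} pairs, {n_bytes / 1e9:.3f} GB")
```

At 70,000 points, the size of the full MNIST set, the distances alone take close to 20 GB, before any optimization has started.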

The way this has been dealt with is to preserve nearest neighbours instead of preserving distances.

For more on this idea, please take a look at the paper below:

https://www.cs.toronto.edu/~hinton/absps/sne.pdf
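To give a flavour of what preserving neighbours can look like, the sketch below computes the kind of conditional neighbour probability that stochastic neighbor embedding is built on: each point spreads a probability over the other points with a Gaussian kernel, so near points get almost all of the mass and far points almost none. Using one fixed `sigma` for every point is a simplifying assumption made here (the paper picks a separate value per point), and the function name and toy data are hypothetical.

```python
import math

def neighbor_probabilities(points, i, sigma=1.0):
    """Probability that point i picks each other point j as its neighbour (Gaussian kernel)."""
    weights = [
        0.0 if j == i else math.exp(-math.dist(points[i], p) ** 2 / (2 * sigma ** 2))
        for j, p in enumerate(points)
    ]
    total = sum(weights)
    return [w / total for w in weights]

# Point 0 has one close neighbour (point 1) and one distant point (point 2).
points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
p = neighbor_probabilities(points, 0)
print(p)
```

Because far-away points receive almost no probability, an embedding that matches these probabilities only has to get the local neighbourhoods right and is free to place distant points wherever there is room.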

Hope you like it!

Thanks
