"RMT"?, or how high dimensionality is not so weird and not so cursed

"RMT", or how high dimensionality is not so weird and not so cursed

One of the first things you learn when you start digging a bit deeper into machine learning is that "high dimensionality is weird". And indeed it seems so: things behave in a very different way in low versus high dimensional spaces. Unfortunately, our intuitions (about vector spaces, about algorithms, about numerical errors, etc.) come from low dimensional settings, whereas our data live, most often, in high dimensional spaces; hence the "weirdness".

A common phrase to express this phenomenon is "the curse of dimensionality". That is a very vague concept indeed: I've interviewed many data scientists over the years, and "what is the curse of dimensionality?" is a question that gets very different answers almost every time (I've also been interviewed as a data scientist several times, and it is a question I'm never too sure how to answer, since I have the feeling my interviewer might have a different "view" of the topic than the one I have... but that's a whole different story!). There are indeed many different phenomena that fall under that phrase, and one of the few things they have in common is that they are somehow related to the fact that all distances between data points tend to become equal as the dimensionality of the space grows (which is a rather counter-intuitive fact in itself, but not so hard to rigorously prove given some fairly lax conditions on the nature of the data).
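
To make that last claim a bit less hand-wavy, here is a minimal sketch in Python (my own toy illustration, using only numpy; the sample size and the uniform distribution in a unit cube are arbitrary choices, not a proof): as the dimension grows, the relative spread between the largest and smallest distances from one point to all the others shrinks towards zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 1000 points uniformly in the unit cube and look at how spread out
# the distances from one reference point to all the others are.
for dim in (2, 10, 100, 1000, 10000):
    points = rng.random((1000, dim))
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    relative_spread = (dists.max() - dists.min()) / dists.min()
    print(f"dim={dim:>6}  relative spread of distances: {relative_spread:.3f}")
```

In two dimensions the farthest point is many times farther away than the nearest one; in ten thousand dimensions, "nearest" and "farthest" are almost the same number, which is exactly why distance-based reasoning starts to misbehave.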

Another thing that you learn about machine learning, probably much sooner than high dimensionality concerns, is that "everything is a matrix" and linear algebra is at the heart of everything that you do (as usual, there is an xkcd comic that explains it better). So... high dimensionality, matrices everywhere... that means large matrices... if only there was some way to understand the algebra of large matrices better...

Here comes "Random Matrix Theory", which is a highly undervalued and awfully named branch of mathematics. I've been studying it lately and trying to grasp the key ideas, after I saw it showing up in several interesting papers I read this summer, and the more I learn, the more I feel it should receive much more attention from our community (just like "Stochastic Processes", which is actually closely related, or "Algebraic Geometry", which is also related, or... the list goes on an on. If you thought the list of math topics you need to master to be proficient in machine learning is rather long, think again: it's actually much longer!)

First of all, let me explain why I feel that Random Matrix Theory is awfully named. After all, it studies the properties of... random matrices, doesn't it? Well, yes and no. A more accurate description would be that it studies the "macroscopic" behavior (that is, the statistics) of very large matrices and of the linear algebra done with them. For those matrices, things like singular values or eigenvectors or limits of infinite series or many others (the kind of things that drive the behavior of algorithms) are themselves large collections of values, so you might want to "summarize" them somehow, studying things like their average metrics, possible bounds, their distribution, etc. And to do so, you might want to relate those statistics to the statistics of the coefficients and the type of the matrix (a triangular matrix with Bernoulli-distributed coefficients, a Hermitian matrix with normally distributed coefficients, etc.). That is exactly what RMT does, but those matrices are not "random" in any meaningful way; rather, they are "characterized by a parsimonious statistical description". They are just "typical" of whatever type of matrix we are interested in. So RMT is not about randomness, but about the "typical", and thus "expected", behavior of very large matrices.
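
As a toy example of what "relating the statistics of the coefficients to the statistics of the spectrum" looks like in practice (my own illustration, assuming numpy and matplotlib are available, with all sizes chosen arbitrarily): a large symmetric matrix with i.i.d. normal coefficients, suitably scaled, has an eigenvalue histogram that closely follows Wigner's semicircle law, a purely "macroscopic" description that does not care about any individual coefficient.

```python
import numpy as np
import matplotlib.pyplot as plt  # assumed available, only used for the picture

rng = np.random.default_rng(0)
n = 2000

# Symmetric matrix with i.i.d. standard normal coefficients, scaled so that
# the limiting spectrum lives on the interval [-2, 2].
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2 * n)

eigvals = np.linalg.eigvalsh(W)

# Wigner's semicircle law: density(x) = sqrt(4 - x^2) / (2 * pi) on [-2, 2].
x = np.linspace(-2, 2, 400)
plt.hist(eigvals, bins=60, density=True, alpha=0.5, label="empirical eigenvalues")
plt.plot(x, np.sqrt(4 - x**2) / (2 * np.pi), label="semicircle law")
plt.legend()
plt.show()
```

None of the two thousand eigenvalues is individually predictable, yet their collective histogram is, and that is the kind of statement RMT trades in.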

Random Matrix Theory is relatively new: if you are reasonably young, its fundamental results might have been available by the time you studied probability and linear algebra, but not by the time your teachers studied them... and that is often what makes the difference between a "core" topic and an "exotic" topic. It is a very active field of research, and it is becoming "practical" in many applications far from the ones that gave birth to the field (for many topics in mathematics, there is an inflection point when they stop being just "something mathematically interesting" and become "something that has practical value"; that is the moment when the interest and the rate of development increase rapidly). There are some fundamental results that make it a coherent topic, some interesting mathematical techniques (that I hadn't seen elsewhere, but might very well be more widely used) and at least two "surprises" (for me, that is, but probably similarly so for other data scientists like myself).

The first surprise is that there are well-defined "phase transitions" in the spectral decomposition of random matrices of many types. That means, roughly speaking, that we might be able to know, a priori, the exact value of some parameter (for instance, the ratio between the dimensionality of a data set and the number of cases) at which that decomposition changes regime (for instance, when it becomes unbounded, or when it becomes chaotic), which might in turn mean that some algorithm stops working, or similar things. There are some interesting and quite illuminating applications of this kind of thing to such well-known algorithms as PCA and GMM clustering in this video (which I highly recommend), which shows how RMT is actually quite relevant and applicable.
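
I cannot reproduce the video here, but a rough sketch of one such transition, the "spiked covariance" (BBP-type) transition that is directly relevant to PCA, fits in a few lines of numpy. Everything below (the rank-one signal model, the parameter names, the sample sizes) is my own illustrative assumption; the point is only that the top sample eigenvalue separates from the noise bulk exactly when the signal strength crosses a threshold set by the dimensionality-to-samples ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 1000          # samples and features; gamma = p / n
gamma = p / n

def top_sample_eigenvalue(beta):
    """Top eigenvalue of the sample covariance in a rank-one 'spiked' model."""
    u = np.zeros(p)
    u[0] = 1.0                                   # hidden signal direction
    # Each row is isotropic noise plus a Gaussian factor of strength beta along u.
    X = rng.standard_normal((n, p)) + np.sqrt(beta) * rng.standard_normal((n, 1)) * u
    return np.linalg.eigvalsh(X.T @ X / n)[-1]

bulk_edge = (1 + np.sqrt(gamma)) ** 2            # Marchenko-Pastur upper edge
print(f"noise bulk edge: {bulk_edge:.3f}")
for beta in (0.2, 0.5, 1.5, 3.0):                # sqrt(gamma) ~ 0.707 is the threshold
    lam = top_sample_eigenvalue(beta)
    # Prediction: the top eigenvalue escapes the bulk only when beta > sqrt(gamma).
    predicted = (1 + beta) * (1 + gamma / beta) if beta > np.sqrt(gamma) else bulk_edge
    print(f"beta={beta:.2f}  top eigenvalue={lam:.3f}  predicted ~ {predicted:.3f}")
```

Below the threshold the leading eigenvalue is indistinguishable from pure noise, which is the mechanism by which PCA can silently stop recovering anything useful.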

The second surprise is the "universality" property. Once again roughly speaking, it means that the same limiting distributions appear in many different problems in RMT, and, even more interestingly, that in many relevant cases they depend only on the first two or the first four moments of the distribution of the coefficients of the matrix (meaning that in many cases those details can be "normalized away"). In fact, the universality property plays a role in RMT (and in "large dimensionality" in general) similar to the one the Central Limit Theorem plays in one dimension, so it is very relevant when trying to work out why "things are more regular than they should be" and why "some algorithms work better than one could expect" (which is something we find out all the time in machine learning). Terence Tao explains it much, much better than I do in this video (I'm only trying here to motivate why RMT is significant for machine learning, not to present it with any rigor, of course).
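
A quick, informal way to "see" universality (again my own toy example, with all sizes and bin counts chosen arbitrarily): build the same large symmetric matrix once with Gaussian coefficients and once with ±1 coefficients; after the standard scaling, the two empirical spectra are essentially indistinguishable, even though the coefficient distributions are completely different.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

def spectrum(entries):
    """Symmetrize, scale, and return the eigenvalues of an n x n coefficient matrix."""
    return np.linalg.eigvalsh((entries + entries.T) / np.sqrt(2 * n))

# Same construction, two very different coefficient distributions (both mean 0, variance 1).
gaussian = spectrum(rng.standard_normal((n, n)))
rademacher = spectrum(rng.choice([-1.0, 1.0], size=(n, n)))

# Universality: the two empirical spectral densities are essentially the same.
hist_g, edges = np.histogram(gaussian, bins=40, range=(-2.2, 2.2), density=True)
hist_r, _ = np.histogram(rademacher, bins=40, range=(-2.2, 2.2), density=True)
print("largest gap between the two empirical densities:", float(np.abs(hist_g - hist_r).max()))
```

Only the first two moments of the coefficients matter here, which is the one-line summary of why so many "different" high dimensional problems end up looking alike.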

So there are several reasons to believe that RMT is a useful tool that has a role to play in machine learning. But the real reason I think anyone seriously working on machine learning should learn at least some RMT is that it makes high dimensionality less mysterious. It takes away some of the weirdness: things that didn't make sense before start making sense, and it gives you intuitions about how things work in high dimensional problems. "Intuitions", or, as people who become managers call them, "less wasted time".
