"RMT"?, or how high dimensionality is not so weird and not so cursed

"RMT", or how high dimensionality is not so weird and not so cursed

One of the first things you learn when you start digging a bit deeper into machine learning is that "high dimensionality is weird". And indeed it seems so: things behave in a very different way in low versus high dimensional spaces. Unfortunately, our intuitions (about vector spaces, about algorithms, about numerical errors, etc.) come from low dimensional settings, whereas our data live, most often, in high dimensional spaces; hence the "weirdness".

A common phrase to express this phenomenon is "the curse of dimensionality". That is a very vague concept indeed: I've interviewed many data scientists over the years, and "what is the curse of dimensionality?" is a question that gets very different answers almost every time (I've also been interviewed as a data scientist several times, and it is a question I'm never too sure how to answer, since I have the feeling my interviewer might have a different "view" of the topic than the one I have... but that's a whole different story!). There are indeed many different phenomena that fall under that phrase, and one of the few things they have in common is that they are somehow related to the fact that all distances between data points tend to become equal as the dimensionality of the space grows (which is a rather counter-intuitive fact in itself, but not so hard to rigorously prove given some fairly lax conditions on the nature of the data).
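
To make that last claim a bit less hand-wavy, here is a minimal sketch in Python (my own toy illustration, using only numpy; the sample size and the uniform distribution in a unit cube are arbitrary choices, not a proof): as the dimension grows, the relative spread between the largest and smallest distances from one point to all the others shrinks towards zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 1000 points uniformly in the unit cube and look at how spread out
# the distances from one reference point to all the others are.
for dim in (2, 10, 100, 1000, 10000):
    points = rng.random((1000, dim))
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    relative_spread = (dists.max() - dists.min()) / dists.min()
    print(f"dim={dim:>6}  relative spread of distances: {relative_spread:.3f}")
```

In two dimensions the farthest point is many times farther away than the nearest one; in ten thousand dimensions, "nearest" and "farthest" are almost the same number, which is exactly why distance-based reasoning starts to misbehave.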

Another thing that you learn about machine learning, probably much sooner than high dimensionality concerns, is that "everything is a matrix" and linear algebra is at the heart of everything that you do (as usual, there is an xkcd comic that explains it better). So... high dimensionality, matrices everywhere... that means large matrices... if only there was some way to understand the algebra of large matrices better...

Here comes "Random Matrix Theory", which is a highly undervalued and awfully named branch of mathematics. I've been studying it lately and trying to grasp the key ideas, after I saw it showing up in several interesting papers I read this summer, and the more I learn, the more I feel it should receive much more attention from our community (just like "Stochastic Processes", which is actually closely related, or "Algebraic Geometry", which is also related, or... the list goes on an on. If you thought the list of math topics you need to master to be proficient in machine learning is rather long, think again: it's actually much longer!)

First of all, let me explain why I feel that Random Matrix Theory is awfully named. After all, it studies the properties of... random matrices, doesn't it? Well, yes and no. A more accurate description would be that it studies the "macroscopic" behavior (that is, the statistics) of very large matrices and of the linear algebra done with them. For those matrices, things like singular values or eigenvectors or limits of infinite series or many others (the kind of things that drive the behavior of algorithms) are themselves large collections of values, so you might want to "summarize" them somehow, studying things like their average metrics, possible bounds, their distribution, etc. And to do so, you might want to relate those statistics to the statistics of the coefficients and the type of the matrix (a triangular matrix with Bernoulli-distributed coefficients, a Hermitian matrix with normally distributed coefficients, etc.). That is exactly what RMT does, but those matrices are not "random" in any meaningful way; rather, they are "characterized by a parsimonious statistical description". They are just "typical" of whatever type of matrix we are interested in. So RMT is not about randomness, but about the "typical", and thus "expected", behavior of very large matrices.
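
As a toy example of what "relating the statistics of the coefficients to the statistics of the spectrum" looks like in practice (my own illustration, assuming numpy and matplotlib are available, with all sizes chosen arbitrarily): a large symmetric matrix with i.i.d. normal coefficients, suitably scaled, has an eigenvalue histogram that closely follows Wigner's semicircle law, a purely "macroscopic" description that does not care about any individual coefficient.

```python
import numpy as np
import matplotlib.pyplot as plt  # assumed available, only used for the picture

rng = np.random.default_rng(0)
n = 2000

# Symmetric matrix with i.i.d. standard normal coefficients, scaled so that
# the limiting spectrum lives on the interval [-2, 2].
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2 * n)

eigvals = np.linalg.eigvalsh(W)

# Wigner's semicircle law: density(x) = sqrt(4 - x^2) / (2 * pi) on [-2, 2].
x = np.linspace(-2, 2, 400)
plt.hist(eigvals, bins=60, density=True, alpha=0.5, label="empirical eigenvalues")
plt.plot(x, np.sqrt(4 - x**2) / (2 * np.pi), label="semicircle law")
plt.legend()
plt.show()
```

None of the two thousand eigenvalues is individually predictable, yet their collective histogram is, and that is the kind of statement RMT trades in.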

Random Matrix Theory is relatively new: if you are reasonably young, its fundamental results might have been available by the time you studied probability and linear algebra, but not by the time your teachers studied them... and that is often what makes the difference between a "core" topic and an "exotic" topic. It is a very active field of research, and it is becoming "practical" in many applications far from the ones that gave birth to the field (for many topics in mathematics, there is an inflection point when they stop being just "something mathematically interesting" and become "something that has practical value"; that is the moment when the interest and the rate of development increase rapidly). There are some fundamental results that make it a coherent topic, some interesting mathematical techniques (that I hadn't seen elsewhere, but might very well be more widely used) and at least two "surprises" (for me, that is, but probably similarly so for other data scientists like myself).

The first surprise is that there are well-defined "phase transitions" in the spectral decomposition of random matrices of many types. That means, roughly speaking, that we might be able to know, a priori, the exact value of some parameter (for instance, the ratio between the dimensionality of a data set and the number of cases) at which that decomposition changes regime (for instance, when it becomes unbounded, or when it becomes chaotic), which might in turn mean that some algorithm stops working, or similar things. There are some interesting and quite illuminating applications of this kind of thing to such well-known algorithms as PCA and GMM clustering in this video (which I highly recommend), which shows how RMT is actually quite relevant and applicable.
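
I cannot reproduce the video here, but a rough sketch of one such transition, the "spiked covariance" (BBP-type) transition that is directly relevant to PCA, fits in a few lines of numpy. Everything below (the rank-one signal model, the parameter names, the sample sizes) is my own illustrative assumption; the point is only that the top sample eigenvalue separates from the noise bulk exactly when the signal strength crosses a threshold set by the dimensionality-to-samples ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 1000          # samples and features; gamma = p / n
gamma = p / n

def top_sample_eigenvalue(beta):
    """Top eigenvalue of the sample covariance in a rank-one 'spiked' model."""
    u = np.zeros(p)
    u[0] = 1.0                                   # hidden signal direction
    # Each row is isotropic noise plus a Gaussian factor of strength beta along u.
    X = rng.standard_normal((n, p)) + np.sqrt(beta) * rng.standard_normal((n, 1)) * u
    return np.linalg.eigvalsh(X.T @ X / n)[-1]

bulk_edge = (1 + np.sqrt(gamma)) ** 2            # Marchenko-Pastur upper edge
print(f"noise bulk edge: {bulk_edge:.3f}")
for beta in (0.2, 0.5, 1.5, 3.0):                # sqrt(gamma) ~ 0.707 is the threshold
    lam = top_sample_eigenvalue(beta)
    # Prediction: the top eigenvalue escapes the bulk only when beta > sqrt(gamma).
    predicted = (1 + beta) * (1 + gamma / beta) if beta > np.sqrt(gamma) else bulk_edge
    print(f"beta={beta:.2f}  top eigenvalue={lam:.3f}  predicted ~ {predicted:.3f}")
```

Below the threshold the leading eigenvalue is indistinguishable from pure noise, which is the mechanism by which PCA can silently stop recovering anything useful.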

The second surprise is the "universality" property. Once again roughly speaking, it means that the same limiting distributions appear in many different problems in RMT, and, even more interestingly, that in many relevant cases they depend only on the first two or the first four moments of the distribution of the coefficients of the matrix (meaning that in many cases those details can be "normalized away"). In fact, the universality property plays a role in RMT (and in "large dimensionality" in general) similar to the one the Central Limit Theorem plays in one dimension, so it is very relevant when trying to work out why "things are more regular than they should be" and why "some algorithms work better than one could expect" (which is something we find out all the time in machine learning). Terence Tao explains it much, much better than I do in this video (I'm only trying here to motivate why RMT is significant for machine learning, not to present it with any rigor, of course).
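
A quick, informal way to "see" universality (again my own toy example, with all sizes and bin counts chosen arbitrarily): build the same large symmetric matrix once with Gaussian coefficients and once with ±1 coefficients; after the standard scaling, the two empirical spectra are essentially indistinguishable, even though the coefficient distributions are completely different.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

def spectrum(entries):
    """Symmetrize, scale, and return the eigenvalues of an n x n coefficient matrix."""
    return np.linalg.eigvalsh((entries + entries.T) / np.sqrt(2 * n))

# Same construction, two very different coefficient distributions (both mean 0, variance 1).
gaussian = spectrum(rng.standard_normal((n, n)))
rademacher = spectrum(rng.choice([-1.0, 1.0], size=(n, n)))

# Universality: the two empirical spectral densities are essentially the same.
hist_g, edges = np.histogram(gaussian, bins=40, range=(-2.2, 2.2), density=True)
hist_r, _ = np.histogram(rademacher, bins=40, range=(-2.2, 2.2), density=True)
print("largest gap between the two empirical densities:", float(np.abs(hist_g - hist_r).max()))
```

Only the first two moments of the coefficients matter here, which is the one-line summary of why so many "different" high dimensional problems end up looking alike.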

So there are several reasons to believe that RMT is a useful tool that has a role to play in machine learning. But the real reason I think anyone seriously working on machine learning should learn at least some RMT is that it makes high dimensionality less mysterious. It takes away some of the weirdness: things that didn't make sense before start making sense, and it gives you intuitions about how things work in high dimensional problems. "Intuitions", or, as people who become managers call them, "less wasted time".
