A tribute to Per Martin-Löf
Randomness is a pillar of our world economy: it is the key ingredient of cryptography, and cryptography sustains the security of all online transactions.
This central role of randomness is here to stay: in the July issue of via Nebulosa ("Rough computing"), I explained how the robustness of Dilithium, the leading algorithm chosen by NIST for post-quantum signatures, depends almost entirely on the statistical properties of random samplers.
Randomness is the royal domain of statistics. For centuries now, statistics has provided the foundations and the tools to predict, or at least to cope with, events governed by chance, luck and uncertainty.
Even if randomness tricks us more often than we would admit, the laws of statistics have given us a good understanding of many natural phenomena, and we long ago learnt to turn a few long-standing mysteries into friendly allies.
Still, many important aspects of randomness remain stubbornly out of reach today: among them, some aspects of percolation theory, which I covered in the August issue of my newsletter ("Setting off fusion in Kubernetes!").
Randomness in information theory
In the light of computers and information theory, the problem becomes: how can we define the randomness of a binary string (or 'sequence')?
The answer took quite a long time to find! It required nothing less than the application of Gödel's famous theorem.
From complex to random
One of the most remarkable breakthroughs came in the early sixties, when Kolmogorov, Chaitin and Solomonoff formalized the definition of the complexity of strings.
Martin-Löf was a student of Kolmogorov himself, so he knew this just-invented theory of complexity inside out. It didn't take him long to connect the dots between complexity and randomness, demonstrating that the two notions were, in fact, one and the same.
How random is random?
It is striking that such a simple concept as a random string was so difficult to define, but let's think it over for a couple of seconds... If I show you a binary sequence, how would you tell whether it's random or not? And how would you rate the "quality" of its randomness?
Take for instance this single digit: '0'. Was it randomly generated?
Now what about this sequence: '0100110000110'?
Interestingly, the length of sequences matters. Intuitively, the longer a sequence, the greater our confidence in the verdict.
Martin-Löf successfully provided a formal definition for the randomness of finite sequences of length n (and of infinite ones), even when n is small. The solution he discovered is truly amazing.
Let's gain an understanding of how it works, without being too technical.
Testing randomness
To determine whether a sequence is random or not, the first thing we need is tests: for example, counting the ones in a sequence to ensure there are not "too many" of them is the simplest test one can imagine.
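To make this concrete, here is a minimal sketch of such a counting test in Python; the threshold deciding what counts as "too many" is an arbitrary choice of mine, purely for illustration:

```python
def count_ones_test(sequence: str, max_deviation: int = 2) -> bool:
    """Simplest test imaginable: accept the sequence as 'possibly random'
    only if its number of ones stays close to the expected n/2."""
    n = len(sequence)
    ones = sequence.count("1")
    return abs(ones - n / 2) <= max_deviation

# 8 ones out of 16: perfectly balanced, accepted.
print(count_ones_test("0101010100110101"))  # True
# 15 ones out of 16: rejected as suspicious.
print(count_ones_test("1111111111111110"))  # False
```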
Testing one aspect at a time
But relying on a single test is not enough: it only reveals one aspect of randomness. What we want is a universal test of randomness. For such a test to make sense, it must capture all possible tests one could imagine, now or in two centuries.
Believe it or not, this is precisely what Martin-Löf aimed to achieve: not only did he prove the existence of universal tests, but he also offered a method to construct them from scratch.
To understand his construction scheme, we must first take a closer look at how tests work in the world of binary sequences. Consider the set S_n of all binary sequences of length n. This set contains 2^n items.
Now, we can split S_n in two halves, each containing 2^(n-1) items (figure b). If n is large enough, we can repeat the splitting many, many times (figures c and d).
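This bookkeeping is easy to reproduce; here is a small Python sketch (plain enumeration, nothing specific to Martin-Löf yet):

```python
from itertools import product

n = 4
S_n = ["".join(bits) for bits in product("01", repeat=n)]
print(len(S_n))  # 2**4 = 16 sequences

# First split: two halves of 2**(n-1) items each, e.g. by the leading bit.
first_half = [s for s in S_n if s[0] == "0"]
second_half = [s for s in S_n if s[0] == "1"]
print(len(first_half), len(second_half))  # 8 8

# Splitting again (on the second bit) gives quarters of 2**(n-2) items, and so on.
quarter = [s for s in first_half if s[1] == "0"]
print(len(quarter))  # 4
```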
Accepting and rejecting binary sequences
A test of randomness splits S_n in two parts: the sequences it rejects as non-random, and the sequences it accepts.
Remember that the quality of randomness depends on n? With large n, we can run our tests at a finer grain than with small n. The grain can be any number m between 1 and n.
The balance between rejected and accepted sequences depends on m: if m is small, the resolution is 'coarse', and many sequences will be declared non-random (i.e. rejected); a test at grain m may reject up to 2^(n-m) of the 2^n sequences, that is, a fraction 2^(-m) of them. If m approaches n, the resolution is 'fine', and only a very small number of sequences will be rejected.
In statistics, m corresponds to the level of significance (more precisely, the significance level is 2^(-m)).
Significance
Recall our test-example where we counted the ones? We can define it at each level of significance as follows: at grain m, reject the 2^(n-m) sequences whose count of ones deviates the most from n/2.
Here is how our test-example partitions S_n depending on m (m = 1, 2, ..., n). In the figures below, rejected subsets are shown in green.
Note that, by construction, the smaller subsets of finer resolution are always included in the larger ones of coarser resolution. This is crucially important.
Also note that, as we increase resolution, it becomes obvious that the vast majority of sequences are random.
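We can check both observations numerically. Below is a minimal sketch of our count-the-ones test at every grain m; the tie-breaking among equally unbalanced sequences is my own arbitrary choice, not part of Martin-Löf's definition:

```python
from itertools import product

def rejection_set(n: int, m: int) -> set:
    """Count-the-ones test at grain m: reject the 2**(n - m)
    sequences whose number of ones deviates most from n/2."""
    S_n = ["".join(bits) for bits in product("01", repeat=n)]
    # Rank by imbalance; ties are broken by the (stable) enumeration order.
    ranked = sorted(S_n, key=lambda s: abs(s.count("1") - n / 2), reverse=True)
    return set(ranked[: 2 ** (n - m)])

n = 6
for m in range(1, n + 1):
    print(f"m={m}: {len(rejection_set(n, m))} rejected out of {2**n}")

# The crucial nesting property: finer grains reject subsets of coarser ones.
assert rejection_set(n, 3) <= rejection_set(n, 2) <= rejection_set(n, 1)
```

At the finest grain (m = n), a single sequence out of 2^n is rejected, which matches the intuition that almost all sequences look random.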
If we take another test, we obtain the same kind of nested rejection pattern, because non-randomness itself should not depend on any one particular test. In the next figures, rejected subsets are shown in yellow.
U, the universal test
I wouldn't say that the universal test is built; I would say it is stitched together, the way we assemble a patchwork. Roughly speaking, Martin-Löf proceeds in three steps:
1. enumerate all possible tests (this is where Gödel numbering comes into play);
2. for each level m, take the rejection subset of test number i at the shifted, finer level m + i + 1;
3. merge these subsets into a single rejection subset per level; thanks to the shift, the union stays within the 2^(n-m) budget, and the subsets remain nested.
The Universal test is made of all such nested subsets, for all m. If we illustrate each test with a different color, the universal test forms a colorful patchwork:
You can see from the picture above that all tests are actually embodied in the universal test, each one contributing a larger or smaller fragment of the patchwork (depending on its Gödel rank in the global enumeration...)
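To give a flavor of the stitching, here is a toy sketch in Python. It is schematic on purpose: I enumerate just two hand-written tests where Martin-Löf enumerates all computable ones, but the level-shifting trick that keeps the union within budget is the genuine idea:

```python
from itertools import product

def all_sequences(n):
    return ["".join(bits) for bits in product("01", repeat=n)]

def reject_by(n, m, implausibility):
    """One test at grain m: reject the 2**(n - m) sequences ranked
    least plausible (prefixes of one fixed ranking, hence nested in m)."""
    ranked = sorted(all_sequences(n), key=implausibility, reverse=True)
    return set(ranked[: 2 ** (n - m)])

def longest_run(s):
    """Length of the longest run of identical bits."""
    best = cur = 1
    for previous, current in zip(s, s[1:]):
        cur = cur + 1 if previous == current else 1
        best = max(best, cur)
    return best

# Two toy entries of the (in reality, infinite) enumeration of all tests.
tests = [
    lambda n, m: reject_by(n, m, lambda s: abs(s.count("1") - n / 2)),
    lambda n, m: reject_by(n, m, longest_run),
]

def universal_reject(n, m):
    """The patchwork: test number i contributes its rejection set at the
    shifted grain m + i + 1, so the union stays within the 2**(n - m)
    budget (the sum over i of 2**(n - m - i - 1) is below 2**(n - m))."""
    U = set()
    for i, test in enumerate(tests):
        if m + i + 1 <= n:  # grains beyond n reject nothing
            U |= test(n, m + i + 1)
    return U

n, m = 8, 2
print(len(universal_reject(n, m)) <= 2 ** (n - m))  # True: the budget holds
```

In the real construction, the enumeration covers every computable test, and that is exactly where the Gödel numbering mentioned above earns its keep.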
From finite to infinite
Simple is beautiful! Martin-Löf generalized his proof to infinite sequences with little extra effort. Infinite sequences are less useful for us engineers, but they are of huge importance for gaining future insights into the ever-elusive concept of randomness. And this will help prevent more engineering pitfalls, traps, tricks, dead ends, delusions, design errors, and even, yes, security vulnerabilities...
Final words
If you have read this far, then you should take a look at Martin-Löf's groundbreaking paper: "The definition of random sequences".
I am greatly indebted to the following ENSEEIHT (Toulouse INP) professors, who sparked my long-lasting interest in mathematics applied to IT architecture and security:
Marcel Gandriau, François Rodriguez, Patrick Sallé and Xavier Cregut.