A tribute to Per Martin-Löf
P. Martin-Löf. Made with Procreate, openArt.ai and DALL-E 3


Randomness is a pillar of the world economy: it is the key ingredient of cryptography, and cryptography sustains the security of all online transactions.

This central role of randomness is here to stay: in the July issue of via Nebulosa ("Rough computing"), I explained how the robustness of Dilithium, the leading algorithm chosen by NIST for post-quantum signatures, depends almost entirely on the statistical properties of its random samplers.

Randomness is the royal domain of statistics. For centuries now, statistics has provided the foundations and the tools to predict, or at least cope with, events governed by chance, luck and uncertainty.

Although randomness tricks us more often than we would care to admit, the laws of statistics have given us a good understanding of many natural phenomena, and we long ago learnt to turn a few long-standing mysteries into friendly allies.



Still, many important aspects of randomness remain stubbornly out of reach today: among them, let's mention some aspects of percolation theory, which I covered in the August issue of my newsletter ("Setting off fusion in Kubernetes!").

Randomness in information theory

In the light of computers and information theory, the question becomes: how can we define the randomness of a binary string (or 'sequence')?

The answer took quite a long time to find! It required nothing less than the machinery behind Gödel's famous theorem: Gödel numbering.


From complex to random

One of the most remarkable breakthroughs came in the early sixties, when Kolmogorov, Chaitin and Solomonoff formalized the definition of the complexity of strings.

Martin-Löf was a student of Kolmogorov himself, so he knew this freshly invented theory of complexity intimately. It didn't take him long to connect the dots between complexity and randomness, showing that the two notions are, in fact, the same.
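To get a feel for this connection, here is a minimal sketch in Python (my own illustration, not Martin-Löf's formalism): Kolmogorov complexity itself is uncomputable, so an off-the-shelf compressor serves as a crude proxy. A string that compresses well has a short description, hence low complexity; a string that resists compression is a plausible candidate for randomness.

```python
import os
import zlib

def compressed_size(bits: str) -> int:
    """Crude stand-in for Kolmogorov complexity: the length, in bytes,
    of the string once squeezed by a general-purpose compressor."""
    return len(zlib.compress(bits.encode("ascii"), 9))

patterned = "01" * 500                                           # an obviously regular string
coin_flips = "".join(format(b, "08b") for b in os.urandom(125))  # 1000 unpredictable bits

print(compressed_size(patterned))   # tiny: a short description of the string exists
print(compressed_size(coin_flips))  # several times larger: no pattern to exploit
```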


How random is random?

It is striking that such a simple concept as a random string was so difficult to define, but let's think it over for a couple of seconds... If I show you a binary sequence, how would you tell whether it's random or not? And how would you rate the "quality" of its randomness?

Take for instance this single digit: '0'. Is it randomly generated?

Now what about this sequence: '0100110000110'?

Interestingly, the length of the sequence matters: intuitively, the longer a sequence, the greater our confidence in the verdict.

Martin-Löf successfully provided a formal definition for the randomness of finite (and infinite) sequences, even when those sequences are short. The solution he discovered is truly amazing.

Let's gain an understanding of how it works, without being too technical.


Testing randomness

To determine whether a sequence is random or not, the first thing we need is tests: for example, counting the ones in a sequence to ensure there are not "too many" of them is the simplest test one can imagine.
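As a toy sketch (my own example, with an arbitrary tolerance), here is that counting test in Python:

```python
def ones_test(bits: str, tolerance: float = 0.15) -> bool:
    """Toy frequency test: accept the sequence as 'random-looking' when
    the fraction of '1's stays within `tolerance` of the ideal 1/2."""
    fraction = bits.count("1") / len(bits)
    return abs(fraction - 0.5) <= tolerance

print(ones_test("0100110000110"))  # True: 5 ones out of 13 is close enough to half
print(ones_test("1111111111111"))  # False: far too many ones
```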


Testing one aspect at a time

But relying on a single test is not enough: it only reveals one aspect of randomness. What we want is a universal test of randomness. For such a test to make sense, it must capture all possible tests one can imagine, now or in two centuries.

Believe it or not, this is precisely what Martin-Löf aimed to achieve: not only did he prove the existence of universal tests, but he also offered a method to construct them from scratch.

To understand his construction scheme, we must first take a closer look at how tests work in the world of binary sequences. Consider the set S_n of all binary sequences of length n. This set contains 2^n items.

Figure a: The set of all binary sequences of length n


Now, we can split S_n into two halves, each containing 2^(n-1) items (figure b). If n is large enough, we can repeat the splitting many, many times (figures c and d).

Splitting the set into halves (fig. b), quarters (fig. c), and more (fig. d)
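A quick way to see this structure concretely (a throwaway illustration, nothing more): enumerate S_n and observe that fixing one more leading bit halves the set each time.

```python
from itertools import product

n = 4
S_n = ["".join(bits) for bits in product("01", repeat=n)]  # all 2^n sequences
print(len(S_n))  # 16

# Fixing one more prefix bit halves the set each time: 2^(n-1), 2^(n-2), ...
half = [s for s in S_n if s.startswith("0")]
quarter = [s for s in S_n if s.startswith("00")]
print(len(half), len(quarter))  # 8 4
```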



Accepting and rejecting binary sequences

A test of randomness splits S_n into two parts:

  • rejected sequences, which fail the randomness hypothesis, i.e. non-random sequences
  • accepted sequences, i.e. sequences which look random to this test

Remember that the quality of randomness depends on "n"? With large n, we can run our tests at a finer grain than with small n. The grain can be any number "m" between 1 and n.

The balance between rejected and accepted sequences depends on "m": if m is small, the resolution is 'coarse' and many sequences are rejected as non-random. As m approaches n, the resolution becomes 'fine' and only a very small number of sequences are rejected.

In statistics, m plays the role of the level of significance.
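Martin-Löf's paper makes this trade-off quantitative: the rejected subset at level m contains at most 2^(n-m) of the 2^n sequences, i.e. a fraction of at most 2^(-m). So at m=1, up to half of S_n may be rejected, while at m=10, at most one sequence in a thousand or so (2^(-10) ≈ 0.1%) can be.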


Significance

Recall our test-example where we counted the ones? We can define the levels of significance as follows (a code sketch follows the list):

  • coarse (m=1): we count the number of '1's, which should amount to about 50% for a random sequence
  • less coarse (m=2): we perform the previous test, plus we count the number of '11' blocks, which should amount to about 25%
  • even less coarse (m=4): we perform the two previous tests, plus we count the number of '1111' blocks, which should amount to about 6%
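Here is the promised sketch (a toy rendition with loose, arbitrary tolerances, not a calibrated statistical test):

```python
def block_fraction(bits: str, block: str) -> float:
    """Fraction of (overlapping) windows of `bits` that equal `block`."""
    windows = len(bits) - len(block) + 1
    return sum(bits[i:i + len(block)] == block for i in range(windows)) / windows

# Expected fraction of each block in a random sequence; level m runs
# every check whose rank is <= m.
CHECKS = {1: ("1", 0.50), 2: ("11", 0.25), 4: ("1111", 0.0625)}

def looks_random(bits: str, m: int, tolerance: float = 0.1) -> bool:
    """Accept `bits` when every check of rank <= m stays within `tolerance`
    of its expected fraction."""
    return all(abs(block_fraction(bits, pattern) - expected) <= tolerance
               for rank, (pattern, expected) in CHECKS.items() if rank <= m)

print(looks_random("0110100110010110", m=4))  # True: all three checks pass
```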


Here is how our test-example partitions S_n for m = 1, 2 and 4. In the figures below, rejected subsets are shown in green.

Figures d, e, f: rejecting randomness at ever finer grains


Note that, by construction, the smaller subsets of finer resolution are always included in the larger ones of coarser resolution. This is crucially important.

Also note that, as we increase resolution, it becomes obvious that the vast majority of sequences are random.

If we take another test, the rejection pattern remains consistent, because non-randomness is not tied to any particular test. In the next figures, rejected subsets are shown in yellow.

Figures g, h, i: different test, same partitioning


U, the universal test

I wouldn't say that the universal test is built; I would say it is "knitted", like a patchwork. Roughly speaking, Martin-Löf proceeds in three steps:

  1. Prove that all possible tests can be enumerated.
  2. Consider the rejected subset of any test T at significance level m+c (where c is a constant which depends on T and on the to-be-built universal test): since m+c >= m, this subset is always included in a larger subset at significance level m.
  3. Define the universal rejected subset at level m as the union of all the tests' rejected subsets at level m+c. This is possible because all tests are enumerated: we set c to be the Gödel number of each test in the enumeration.

The universal test is made of all such nested subsets, for all m. If we illustrate each test with a different color, the universal test forms a colorful patchwork:

A Universal test of randomness

You see from the picture above that all tests are actually embedded in the universal test, each one contributing a large or small fragment of the patchwork, depending on its Gödel number in the global enumeration...
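In code, the knitting might look like the sketch below. It is a conceptual toy only: the real construction enumerates every effective test by its Gödel number, which no finite program can hand you, so two hand-made tests stand in for the full enumeration. Each toy test is built so that its rejection set halves at each level, mirroring the 2^(n-m) bound mentioned earlier.

```python
from itertools import product

def all_strings(n):
    return ["".join(b) for b in product("01", repeat=n)]

def ones_test(m, n):
    """Toy test: at level m, reject the 2^(n-m) most unbalanced strings
    (those whose count of '1's strays furthest from n/2)."""
    ranked = sorted(all_strings(n), key=lambda s: -abs(s.count("1") - n / 2))
    return set(ranked[: 2 ** max(n - m, 0)])

def runs_test(m, n):
    """Toy test: at level m, reject the 2^(n-m) strings with the longest
    run of identical consecutive bits."""
    def longest_run(s):
        best = run = 1
        for a, b in zip(s, s[1:]):
            run = run + 1 if a == b else 1
            best = max(best, run)
        return best
    ranked = sorted(all_strings(n), key=lambda s: -longest_run(s))
    return set(ranked[: 2 ** max(n - m, 0)])

def universal_rejected(tests, m, n):
    """Level-m rejection set of the 'universal' test: the union, over each
    test c, of that test's rejections at the deeper level m + c. The sizes
    sum to at most 2^(n-m-1) + 2^(n-m-2) + ... < 2^(n-m), so the patchwork
    respects the same budget as any single test."""
    rejected = set()
    for c, test in enumerate(tests, start=1):
        rejected |= test(m + c, n)  # deeper-ranked tests contribute smaller patches
    return rejected

print(len(universal_rejected([ones_test, runs_test], m=2, n=8)))
```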


From finite to infinite

Simple is beautiful! Martin-Löf generalized his proof to infinite sequences painlessly. Infinite sequences are less useful to us engineers, but they are of huge importance for future insights into the ever-elusive concept of randomness. And such insights will help prevent more engineering pitfalls, traps, dead ends, delusions, design errors and even, yes, security vulnerabilities...

DALL-E's view of random numbers for infinite sequences


Final words

If you have read this far, then you should take a look at Martin-Löf's groundbreaking paper: "The definition of random sequences" (Information and Control, 1966).


I am greatly indebted to the following ENSEEIHT (Toulouse INP) professors, who sparked my long-lasting interest in mathematics applied to IT architecture and security:

Marcel Gandriau, François Rodriguez, Patrick Sallé and Xavier Cregut

