A tribute to Per Martin-Löf
Randomness is a pillar of our world economy: it is the key ingredient of cryptography, and cryptography sustains the security of all online transactions.
This central role of randomness is here to stay: in the July issue of via Nebulosa ("Rough computing"), I explained how the robustness of Dilithium, the leading algorithm chosen by NIST for post-quantum signatures, depends almost entirely on the statistical properties of random samplers.
Randomness is the royal domain of statistics. For centuries now, statistics has provided the foundations and the tools to predict, or at least to cope with, events governed by chance, luck and uncertainty.
Even if randomness tricks us more often than we would admit, the laws of statistics have given us a good understanding of many natural phenomena, and we long ago learnt to turn a few long-standing mysteries into friendly allies.
Still, many important aspects of randomness remain stubbornly out of reach today: among them, some aspects of percolation theory, which I covered in the August issue of my newsletter ("Setting off fusion in Kubernetes!").
Randomness in information theory
In the light of computers and information theory, the problem becomes: how can we define the randomness of a binary string (or 'sequence')?
The answer took quite a long time to find! It required nothing less than the application of Gödel's famous theorem.
From complex to random
One of the most remarkable breakthroughs came in the early sixties, when Kolmogorov, Chaitin and Solomonoff formalized the definition of the complexity of strings.
Martin-Löf was a student of Kolmogorov himself, so he knew this just-invented theory of complexity inside out. It didn't take him long to connect the dots between complexity and randomness, demonstrating that the two notions were, in fact, one and the same.
How random is random?
It is striking that such a simple concept as a random string was so difficult to define, but let's think it over for a couple of seconds... If I show you a binary sequence, how would you tell whether it's random or not? And how would you rate the "quality" of its randomness?
Take for instance this single digit: '0'. Was it randomly generated?
Now what about this sequence: '0100110000110'?
Interestingly, the length of sequences matters. Intuitively, the longer a sequence, the greater our confidence in the verdict.
Martin-Löf successfully provided a formal definition for the randomness of finite sequences of length n (and of infinite ones), even when n is small. The solution he discovered is truly amazing.
Let's gain an understanding of how it works, without being too technical.
Testing randomness
To determine whether a sequence is random or not, the first thing we need is tests: for example, counting the ones in a sequence to ensure there are not "too many" of them is the simplest test one can imagine.
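To make this concrete, here is a minimal sketch of such a counting test in Python; the threshold deciding what counts as "too many" is an arbitrary choice of mine, purely for illustration:

```python
def count_ones_test(sequence: str, max_deviation: int = 2) -> bool:
    """Simplest test imaginable: accept the sequence as 'possibly random'
    only if its number of ones stays close to the expected n/2."""
    n = len(sequence)
    ones = sequence.count("1")
    return abs(ones - n / 2) <= max_deviation

# 8 ones out of 16: perfectly balanced, accepted.
print(count_ones_test("0101010100110101"))  # True
# 15 ones out of 16: rejected as suspicious.
print(count_ones_test("1111111111111110"))  # False
```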
Testing one aspect at a time
But relying on a single test is not enough: it only reveals one aspect of randomness. What we want is a universal test of randomness. For such a test to make sense, it must capture all possible tests one could imagine, now or in two centuries.
Believe it or not, this is precisely what Martin-Löf aimed to achieve: not only did he prove the existence of universal tests, but he also offered a method to construct them from scratch.
To understand his construction scheme, we must first take a closer look at how tests work in the world of binary sequences. Consider the set S_n of all binary sequences of length n. This set contains 2^n items.
Now, we can split S_n in two halves, each containing 2^(n-1) items (figure b). If n is large enough, we can repeat the splitting many, many times (figures c and d).
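This bookkeeping is easy to reproduce; here is a small Python sketch (plain enumeration, nothing specific to Martin-Löf yet):

```python
from itertools import product

n = 4
S_n = ["".join(bits) for bits in product("01", repeat=n)]
print(len(S_n))  # 2**4 = 16 sequences

# First split: two halves of 2**(n-1) items each, e.g. by the leading bit.
first_half = [s for s in S_n if s[0] == "0"]
second_half = [s for s in S_n if s[0] == "1"]
print(len(first_half), len(second_half))  # 8 8

# Splitting again (on the second bit) gives quarters of 2**(n-2) items, and so on.
quarter = [s for s in first_half if s[1] == "0"]
print(len(quarter))  # 4
```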
Accepting and rejecting binary sequences
A test of randomness splits S_n in two parts: the sequences it rejects as non-random, and the sequences it accepts.
Remember that the quality of randomness depends on n? With large n, we can run our tests at a finer grain than with small n. The grain can be any number m between 1 and n.
The balance between rejected and accepted sequences depends on m: if m is small, the resolution is 'coarse', and many sequences will be declared non-random (i.e. rejected); a test at grain m may reject up to 2^(n-m) of the 2^n sequences, that is, a fraction 2^(-m) of them. If m approaches n, the resolution is 'fine', and only a very small number of sequences will be rejected.
In statistics, m corresponds to the level of significance (more precisely, the significance level is 2^(-m)).
Significance
Recall our test-example where we counted the ones? We can define it at each level of significance as follows: at grain m, reject the 2^(n-m) sequences whose count of ones deviates the most from n/2.
Here is how our test-example partitions S_n depending on m (m = 1, 2, ..., n). In the figures below, rejected subsets are shown in green.
Note that, by construction, the smaller subsets of finer resolution are always included in the larger ones of coarser resolution. This is crucially important.
Also note that, as we increase resolution, it becomes obvious that the vast majority of sequences are random.
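We can check both observations numerically. Below is a minimal sketch of our count-the-ones test at every grain m; the tie-breaking among equally unbalanced sequences is my own arbitrary choice, not part of Martin-Löf's definition:

```python
from itertools import product

def rejection_set(n: int, m: int) -> set:
    """Count-the-ones test at grain m: reject the 2**(n - m)
    sequences whose number of ones deviates most from n/2."""
    S_n = ["".join(bits) for bits in product("01", repeat=n)]
    # Rank by imbalance; ties are broken by the (stable) enumeration order.
    ranked = sorted(S_n, key=lambda s: abs(s.count("1") - n / 2), reverse=True)
    return set(ranked[: 2 ** (n - m)])

n = 6
for m in range(1, n + 1):
    print(f"m={m}: {len(rejection_set(n, m))} rejected out of {2**n}")

# The crucial nesting property: finer grains reject subsets of coarser ones.
assert rejection_set(n, 3) <= rejection_set(n, 2) <= rejection_set(n, 1)
```

At the finest grain (m = n), a single sequence out of 2^n is rejected, which matches the intuition that almost all sequences look random.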
If we take another test, we obtain the same kind of nested rejection pattern, because non-randomness itself should not depend on any one particular test. In the next figures, rejected subsets are shown in yellow.
U, the universal test
I wouldn't say that the universal test is built; I would say it is stitched together, the way we assemble a patchwork. Roughly speaking, Martin-Löf proceeds in three steps:
1. enumerate all possible tests (this is where Gödel numbering comes into play);
2. for each level m, take the rejection subset of test number i at the shifted, finer level m + i + 1;
3. merge these subsets into a single rejection subset per level; thanks to the shift, the union stays within the 2^(n-m) budget, and the subsets remain nested.
The Universal test is made of all such nested subsets, for all m. If we illustrate each test with a different color, the universal test forms a colorful patchwork:
You can see from the picture above that all tests are actually embodied in the universal test, each one contributing a larger or smaller fragment of the patchwork (depending on its Gödel rank in the global enumeration...)
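To give a flavor of the stitching, here is a toy sketch in Python. It is schematic on purpose: I enumerate just two hand-written tests where Martin-Löf enumerates all computable ones, but the level-shifting trick that keeps the union within budget is the genuine idea:

```python
from itertools import product

def all_sequences(n):
    return ["".join(bits) for bits in product("01", repeat=n)]

def reject_by(n, m, implausibility):
    """One test at grain m: reject the 2**(n - m) sequences ranked
    least plausible (prefixes of one fixed ranking, hence nested in m)."""
    ranked = sorted(all_sequences(n), key=implausibility, reverse=True)
    return set(ranked[: 2 ** (n - m)])

def longest_run(s):
    """Length of the longest run of identical bits."""
    best = cur = 1
    for previous, current in zip(s, s[1:]):
        cur = cur + 1 if previous == current else 1
        best = max(best, cur)
    return best

# Two toy entries of the (in reality, infinite) enumeration of all tests.
tests = [
    lambda n, m: reject_by(n, m, lambda s: abs(s.count("1") - n / 2)),
    lambda n, m: reject_by(n, m, longest_run),
]

def universal_reject(n, m):
    """The patchwork: test number i contributes its rejection set at the
    shifted grain m + i + 1, so the union stays within the 2**(n - m)
    budget (the sum over i of 2**(n - m - i - 1) is below 2**(n - m))."""
    U = set()
    for i, test in enumerate(tests):
        if m + i + 1 <= n:  # grains beyond n reject nothing
            U |= test(n, m + i + 1)
    return U

n, m = 8, 2
print(len(universal_reject(n, m)) <= 2 ** (n - m))  # True: the budget holds
```

In the real construction, the enumeration covers every computable test, and that is exactly where the Gödel numbering mentioned above earns its keep.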
From finite to infinite
Simple is beautiful! Martin-Löf generalized his proof to infinite sequences with little extra effort. Infinite sequences are less useful for us engineers, but they are of huge importance for gaining future insights into the ever-elusive concept of randomness. And this will help prevent more engineering pitfalls, traps, tricks, dead ends, delusions, design errors, and even, yes, security vulnerabilities...
Final words
If you have read this far, then you should take a look at Martin-Löf's groundbreaking paper: "The definition of random sequences".
I am greatly indebted to the following ENSEEIHT (Toulouse INP) professors, who sparked my long-lasting interest in mathematics applied to IT architecture and security:
Marcel Gandriau, François Rodriguez, Patrick Sallé and Xavier Cregut.