How do we measure randomness?


Encrypted content tends not to have a magic number (apart from when it is detected within a disk partition). If we analyse fragments of both compressed and encrypted files, we see a high degree of randomness. An important method for detecting compressed and encrypted files is therefore to measure the randomness of the bytes in the file. This measure is known as entropy, and was defined by Claude E. Shannon in his 1948 paper, "A Mathematical Theory of Communication". The maximum entropy occurs when all byte values are equally distributed across the file. At that point it is not possible to compress the file any further, as it is indistinguishable from random data.

If we take some hexadecimal data, we can measure how many bits are needed to represent each byte. A value of 8 bits per byte is the maximum entropy, and means that the data cannot be compressed any further. If we try the following sample data:

FF EE DD CC BB AA 99 88 77 66

The results are then:

File size in bytes: 10

Shannon entropy (min bits per byte-character): 3.32192809489

Min possible file size: 33.2192809489  bits
Min possible file size: 4.15241011861  bytes

Efficiency: 41.5241% (where 100% is the maximum)

Frequencies of each byte-character:
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.1]
 
  

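We determine the frequency of occurrence of each byte value and then use the following to determine the entropy (where a value of 8 bits per byte corresponds to completely random data):

H = -(f1 x log2(f1) + f2 x log2(f2) + ... + fn x log2(fn))

For example "00 01 02 03" gives f1=0.25, f2=0.25, f3=0.25 and f4=0.25, which gives:

H = -(4 x 0.25 x log2(0.25)) = 2 bits per byte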

The Python code for this is:

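 import math
 # Entropy is minus the sum of freq * log2(freq); zero frequencies are skipped, as log(0) is undefined
 ent = -sum(freq * math.log(freq, 2) for freq in freqList if freq > 0)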
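As a fuller illustration, the following is a minimal sketch of the whole calculation for a hexadecimal string, from byte frequencies through to the efficiency figure. The function names `byte_frequencies` and `shannon_entropy` are assumptions made for this example, and are not taken from the original script.

```python
import math
from collections import Counter


def byte_frequencies(data: bytes) -> list:
    """Return the relative frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    return [counts.get(value, 0) / len(data) for value in range(256)]


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    # Byte values that never occur contribute nothing to the sum
    return -sum(f * math.log2(f) for f in byte_frequencies(data) if f > 0)


if __name__ == "__main__":
    sample = bytes.fromhex("FF EE DD CC BB AA 99 88 77 66")
    ent = shannon_entropy(sample)
    print("File size in bytes:", len(sample))
    print("Shannon entropy (bits per byte):", ent)
    print("Min possible file size in bits:", ent * len(sample))
    print("Efficiency: %.4f%%" % (ent / 8 * 100))
```

With ten different byte values in ten bytes, each frequency is 0.1, so the entropy is log2(10), which matches the 3.32 bits per byte reported above.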

The following are some examples:

  • 00 FF 00 FF 00 FF 00 FF 00 FF 00
  • 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
  • First 256 bytes of a TrueCrypt volume
  • First 256 bytes of a PKZip file (notice the magic number: 50 4b 03 04)

Encrypting files and file systems

If we measure the Shannon entropy of a TrueCrypt volume, we get the following results:

C:\Python27>python en.py "c:\1.tc"
File size in bytes:
3145728

Shannon entropy (min bits per byte-character):
7.99994457357

Min possible file size assuming max theoretical compression efficiency:
25165649.6435 in bits
3145706.20544 in bytes

We can see that the file size is 3,145,728 bytes and that the minimum number of bits needed for each byte is 7.99994457357, which is extremely close to the maximum of 8 bits per byte. The efficiency is thus 99.999307% (3145706.20544/3145728 x 100%).
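The en.py script itself is not reproduced here, so the following is only a sketch of how a similar report could be produced for a file on disk. It reads the file in chunks so that a large volume does not have to fit in memory. The function name `file_entropy_report`, the `chunk_size` parameter and the dictionary keys are all hypothetical.

```python
import math
import os
import tempfile
from collections import Counter


def file_entropy_report(path, chunk_size=1 << 20):
    """Return size, entropy and theoretical minimum size for the file at path."""
    counts = Counter()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            counts.update(chunk)
    size = sum(counts.values())
    # Shannon entropy in bits per byte; an empty file is given an entropy of 0
    entropy = -sum((n / size) * math.log2(n / size) for n in counts.values()) if size else 0.0
    min_bits = entropy * size
    return {
        "size_bytes": size,
        "entropy_bits_per_byte": entropy,
        "min_bits": min_bits,
        "min_bytes": min_bits / 8,
        "efficiency_percent": entropy / 8 * 100,
    }


if __name__ == "__main__":
    # Demonstration on a temporary file of operating-system random bytes
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1 << 16))
    try:
        for name, value in file_entropy_report(tmp.name).items():
            print(name, value)
    finally:
        os.remove(tmp.name)
```

The efficiency is the same whether it is computed as the entropy divided by 8, or as the minimum size in bytes divided by the file size.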

If we now try a compressed file (DOCX, which derives from the PKZip file format), we get:

File size in bytes:
318724

Shannon entropy (min bits per byte-character):
7.98787618412

Min possible file size assuming max theoretical compression efficiency:
2545927.84891 in bits
318240.981113 in bytes

We now get an efficiency of approximately 99.85% (318240.981113/318724 x 100%) with an entropy of 7.98787618412. A measure of entropy on a DOC file (a format which is neither compressed nor encrypted) gives:

File size in bytes:
62464

Shannon entropy (min bits per byte-character):
4.64286159485

Min possible file size assuming max theoretical compression efficiency:
290011.706661 in bits
36251.4633326 in bytes

This gives an efficiency of 58.03577%. Thus a typical characteristic is that encrypted content results in the highest levels of Shannon entropy, followed closely by compressed file formats. An efficiency of over 98% (an entropy of more than 7.84 bits per byte) is likely to identify compressed or encrypted content.
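The 98% rule of thumb can be sketched as a simple threshold test. The function names `efficiency_percent` and `looks_compressed_or_encrypted` are assumptions for this example. The test only flags high-entropy content, and cannot by itself tell compressed data from encrypted data.

```python
def efficiency_percent(entropy_bits_per_byte: float) -> float:
    """Express an entropy value as a percentage of the 8 bits per byte maximum."""
    return entropy_bits_per_byte / 8.0 * 100.0


def looks_compressed_or_encrypted(entropy_bits_per_byte: float,
                                  threshold_percent: float = 98.0) -> bool:
    """Return True when the efficiency is above the threshold (98% by default)."""
    return efficiency_percent(entropy_bits_per_byte) > threshold_percent


if __name__ == "__main__":
    # Entropy values measured above for the three sample files
    samples = {
        "TrueCrypt volume": 7.99994457357,
        "DOCX file": 7.98787618412,
        "DOC file": 4.64286159485,
    }
    for name, entropy in samples.items():
        print(name, "%.5f%%" % efficiency_percent(entropy),
              looks_compressed_or_encrypted(entropy))
```

Applied to the three measurements above, the TrueCrypt volume and the DOCX file are flagged, while the DOC file is not.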

Conclusions

In our world, we increasingly use encryption to protect data. So how can we tell we are dealing with encrypted content? Well, we measure its entropy.

