Code Optimization

Code Optimization

Computing resources in 2021 cost a lot less than once. Clock cycles and RAM are now quite cheap. However, this is not a good reason to waste them. Let's try to give a small demonstration of how an implementation of a classic algorithm, the "Sieve of Eratosthenes", can be optimized in performance and memory occupation.

To encourage this optimization, we'll put quite strict constraints. The chosen target architecture is the old Commodore 64 Home Computer, the programming language will be BASIC V2.

Eratosthenes of Cyrene lived in the 'cradle of Civilization' that was the Mediterranean sea, between years 276 and 194 B.C.

He was the Third Librarian in the Library of Alexandria (source WikiPedia), and among other things, not only 200 years before the birth of Christ he had no doubt that the earth had a spherical shape. But he had also calculated its diameter, with an error of less than 2%, practically putting a pole on the ground and measuring angles and shadows. And thus laying the foundations of spherical geometry and cartography.

He also devised an iterative algorithm to find prime numbers.

A prime number is a positive integer number greater than one, divisible exactly (with remainder zero) only by one and by itself. Knowing this, a first intuitive algorithm for finding this primes is to take all integers greater than one, one at a time. Taking the number 'i', we divide it by all the integers, from 2 to i-1 (we could stop at the square root of 'i'). If none of these divisions has zero reminder, the number is prime. Otherwise, just a division giving the remainder zero, we understand that the number has a divisor, so it cannot be prime, and we can move on to analyze the next 'i'.

This algorithm has the advantage of requiring very little memory for calculations. The downside is that as 'i' grows, it requires more and more divisions, becoming slower and slower.

Eratosthenes instead applied what today we would call "divergent thinking" to solve the same problem. Instead of looking for primes, let's look for the "non-primes" and take them away. At the end, only prime numbers will remain.?

We start with a sieve containing all integer numbers, from 2 to a fixed upper limit 'maxn'. This sieve is magical and, by shaking it, is capable of dropping only numbers that surely cannot be prime. For example, the multiples of a number. One sifted after the other, at the end only prime numbers would remain inside this magic sieve.?

Said so it seems a very simple and obvious solution. Instead it is not at all.

In pseudo-coding, we could write the algorithm like this:

Initially in the sieve we have all the integers between 2 and 'maxn'

  • Step 1. take the smallest number contained in the sieve. This is definitely prime.
  • Step 2. With a shake at the sieve, we'll remove all prime's multiples from the sieve
  • Step 3. If there are still numbers to be examined in the sieve, we go back to step 1. Otherwise, if we have examined the last number present, end of execution.

Let's take as an example a sieve containing all the numbers from 2 to 20

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Step 1: let's take the smallest number: it's 2. It is prime.

Step 2: let's remove its multiples.

They remain in the sieve

2 3 5 7 9 11 13 15 17 19

Again

Step 1: We take the smallest number. It's 3. It is prime

Step 2: let's remove its multiples

2 3 5 7 11 13 17 19

In this case we observe that, after just 2 passes, only prime numbers remain in the sieve.

The advantage of this algorithm is precisely that of being much faster than the previous one, requiring no complex and therefore slow operations. For example, the divisions of the previous algorithm, necessary to calculate the remainder, are not needed here. The disadvantage of the sieve is that we need a memory cell for each number initially inside the sieve, to store its status during execution (number in or out of the sieve).

Let's see a possible implementation in BASIC V2 (Commodore 64). For those interested in practicing the proposed listings, you can use the excellent VICE emulator (https://vice-emu.sourceforge.io/index.html#download). The listings can be copied and pasted directly to the emulator, and run with the BASIC command 'RUN'. To clear the memory and load a new listing you can use the 'NEW' command, or reset the emulator or the real C64 (if you are lucky enough to have one).

Eratosthenes v0

5 rem init maxn limit and the flag's array

6 rem nums(x) = 0 -> x in the sieve

7 rem nums(x) = 1 -> x out of the sieve

10 maxn=100:dim nums(maxn)

12 rem init timer and the prime numbers counter

15 ti$="000000":np=0

17 rem examine all numbers, from '2' to 'maxn'

20 for i=2 to maxn

25 rem if the number 'i' is out of the sieve, jump to the next one

30 if nums(i) <> 0 then 90

35 rem else 'i' is prime. increment primes counter

36 rem print prime 'i' and prepare to generate multiples

40 np=np+1:print i,:k=1

45 rem calculate multiple as i*k (k=1,2,..)

46 rem until the multiple is <= maxn

47 rem then check next 'i'

50 multiplo=i*k

60 if multiplo>maxn then 90

70 rem put multiple out of the sieve

80 nums(multiplo)=1

82 rem next multiple to be excluded?

85 k=k+1:goto 50

90 next i

95 rem print prime numbers counter and calculation time

100 print:print "contaprimi: ";np

110 print "ti ";ti$

REM comments should help you understand how the code is structured, which follows exactly the algorithm described above.

Let's see how to do a first optimization to reduce execution time, eliminating some unnecessary operations.

First we focus on the odd numbers only. Excluding '2' (which we can simply print at the beginning), all the other prime numbers are certainly odd. We then examine all the odd numbers starting from '3', up to 'maxn', using the 'step 2' option in the FOR loop.

If the examined number 'i' is a prime number, the calculation of its multiples can start from i * i ('k' this time will start from 'i' instead of 1, to calculate the multiples). All multiples of 'i', calculated with k <i, have certainly all already been excluded in the previous steps of the algorithm.

Furthermore, we can exclude all even 'k' from the calculation of multiples, considering only the odd ones.

eratosthenes v0 opt

10 maxn=100:dim nums(maxn):print 2,

15 ti$="000000":np=1

20 for i=3 to maxn step 2

30 if nums(i) <> 0 then 90

40 np=np+1:print i,:k=i

50 multiplo=i*k

60 if multiplo>maxn then 90

80 nums(multiplo)=1

85 k=k+2:goto 50

90 next i

100 print:print "contaprimi: ";np

110 print "ti ";ti$

We removed the REM comments from the code, which is identical to the previous one except for the optimizations described above (in lines 20, 40 and 85). Running it after the changes, we will notice a noticeable performance increase over the previous version.

After eliminating unnecessary calculations, subsequent optimizations can therefore be directed towards memory occupation, expecting a slightly lower execution speed.

The nums() array currently occupies 5 bytes for each cell, just to store a '1' or a '0'. Furthermore, the first 3 cells are unused (nums (0) .. nums (2)). Ditto for even cells, which are never written or read in the latest version of the program. A waste of useless space.

If we used an array of integers nums%() instead, we would have a single cell that occupies 2 bytes instead of 5 (so less than half the space). In fact, the integer numbers on the C64 occupy only 16 bits (2 bytes) of memory. If we then used every single bit within the single integer as a 1/0 flag to store the information "the number examined is inside / outside the sieve?", with each cell of the array we could have 16 flags available (from bit0 to bit15). If we excluded the odd numbers here too, we would double the numbers that can be examined, using the same amount of memory.

If we do the math, in the first version we used 500 bytes to manage 100 numbers (100 * 5). Now, occupying the same space of 500 bytes, we could manage all numbers from 3 up to about 8,000. 80 times more than before.

Having said all this, given 'maxn' the highest number inside the sieve, an array of INT(maxn / 32) +1) boxes will suffice. Each box, as described above, will be used as sixteen 1/0 flags that allow us to tell if an odd number> = 3 is in or out of the sieve. In the first cell of the array, for example, bit0 will represent number 1, bit1 number 3, bit2 number 5. Bit14 number 29 and bit15 number 31.

To select the single bit within the cell, we will use bit masks containing powers of two, and the logical "bitwise" operators AND and OR.

The mask for selecting bit0 will be 1, for bit1 it will be 2, for bit2 it will be 4, for bit 14 it will be 16384 and for bit15 it will be -32768

The sign '-' for the mask of bit15 is used because in BASIC V2 the integer numbers on 16 bits are signed integers, represented in memory using the two's complement, and the number having the most significant bit (bit15) to 1 corresponds precisely to the whole number -32768

0000 0000 0000 0001 -> 1

0000 0000 0000 0010 -> 2

0000 0000 0000 0100 -> 4

0100 0000 0000 0000 -> 16384

1000 0000 0000 0000 -> -32768

If we call the mask 'mb', the logical AND operation ((nums%(0) and mb) <> 0) allows us to understand if the bit of nums%(0) cell selected using mask mb is 1 or 0 (outside or inside the sieve).

This is because (Boolean algebra):

X and 0 = 0

X and 1 = X

The mask resets all the bits in the same positions as its zeros, except the one selected by the only '1' present.

How do we calculate the array index and the position of the bit corresponding to the 'i' number we are examining?

the index will be INT((i / 16) / 2), or INT(i / 32)

nums%(0) will then represent all the odd integers from 1 to 31, nums%(1) those from 33 to 63 and so on.

The position of the bit (0..15) will be identified by calculating INT(i / 2) mod 16, the remainder (modulus) of the division between INT(i / 2) and 16.?

BASIC V2 does not have the 'mod' operation to calculate the division's remainder, but 16 is a power of two (16 -> 2^4 -> 0001 0000), and the remainder of division by 16?corresponds to the 4 least significant bits of the dividend.?

Applying a AND mask 2^4 -1 (0000 0000 0000 1111 -> 15) to the dividend, we obtain the remainder we are looking for.

Once we have found the position of the bit we are interested in, we can generate the 'mb' mask by calculating the corresponding power of 2:

2^position

position -> mask

0?????0000 0000 0000 0001 -> 2^0 -> 1

1?????0000 0000 0000 0010 -> 2^1 -> 2

2?????0000 0000 0000 0100 -> 2^2 -> 4

3?????0000 0000 0000 1000 -> 2^3 -> 8

etc.

We just have to remember to correct the case 2^15 -> 32768, changing its sign, as stated above.

There are only 16 masks, relatively few. We can pre-calculate them once and put in an array bm%(15), so that we don't have to recalculate 'bm' each time using the (slow) exponentiation operator.

With the same formulas we calculate the array index and the bit corresponding to the multiple to be put out of the sieve in the innermost cycle. In this case, to put the corresponding bit to 1 just put the array element in OR with the correct mask.

nums%(index) = nums%(index) OR mb

Here too the Boolean algebra helps us:

X or 0 = X

X or 1 = 1

The mask forces the bit of the first operand selected from its '1's to 1, leaving all the other bits unchanged.

eratost opt v2?

10 maxn=100:dim nums%(int((maxn+3)/32)+1):print 2,

12 dim bm%(15):for i=0 to 14:bm%(i)=2^i:next

13 bm%(15)=-32768

15 ti$="000000":np=1

20 for i=3 to maxn step 2

30 if (nums%(int(i/32)) and bm%(int(i/2) and 15)) <> 0 then 90

40 np=np+1:print i,:k=i

50 multiplo=i*k

60 if multiplo>maxn then 90

70 l=int(multiplo/32):nums%(l)=nums%(l) or bm%(int(multiplo/2) and 15)

80 multiplo=multiplo+k+k:goto 60

90 next

100 print:print "contaprimi: ";np

110 print "ti ";ti$

We have added in this version in lines 12 and 13 the initialization of the array that contains the bit masks. There are also two other small optimizations in line 70. First we calculate the array index once and put it in the variable 'l', thus saving a division. Furthermore, the multiples subsequent to the first are calculated by adding the value 2 * k to the 'multiple', instead of increasing k by 2 and then recalculating multiple = i * k. In the formula we then use 'k + k' instead of '2 * k', because the sum is slightly faster than multiplication.

The last optimization involves eliminating at least the divisions from line 30. To achieve this, we will use two nested loops to iteract on the the array (from 0 to its 'dar' size) and the individual bits (0..15) of its elements. These two loops will replace the outer FOR loop on 'i'. In this way we will no longer have to calculate the array index and the position of the bit by obtaining them from 'i', because we would already have index and position ready in the two variables of the FOR loops, which we will call 'num' and 'b'. The examined number 'i' instead will be calculated starting from the two previous variables, using the formula:

i = (num * 32) + b + b + 1

Also in this case we use 'b + b' instead of '2 * b' to do it a little faster.

eratost v3 final

110 maxn=100:dar=int((maxn+3)/32):dim nums%(dar):nums%(0)=1:print 2,

112 dim bm%(15):for i=0 to 14:bm%(i)=2^i:next

113 bm%(15)=-32768

115 ti$="000000":np=1

120 for num=0 to dar

122 for b=0 to 15

130 if (nums%(num) and bm%(b)) <> 0 then 190

140 np=np+1:i=(num*32)+b+b+1

142 if i>maxn then 200

145 print i,:k=i

150 multiplo=i*k

160 if multiplo>maxn then 190

170 l=int(multiplo/32):nums%(l)=nums%(l) or bm%(int(multiplo/2) and 15)

180 multiplo=multiplo+k+k:goto 160

190 next b:next num

200 print:print "contaprimi: ";np

210 print "ti ";ti$

In this last release we have also renumbered the source code lines. We note that several formulas have been greatly simplified.

We have also added an if () to line 142 to be able to exit immediately as soon as 'i' exceeds the maxn preset value, without waiting for the end of the two FOR loops.

Have a lot of fun.

要查看或添加评论,请登录

Massimo Sanna的更多文章

社区洞察

其他会员也浏览了