BigNum Arithmetic - Quantization of LLMs, Part-2
We saw in Part-1 that most central processing units (CPUs) use two's complement to represent integers. In this representation, the first bit signifies the sign of the number, while the remaining bits encode the magnitude directly if the number is positive, or its complement (plus one) if it is negative. Two's complement also has the advantage of a single, unambiguous representation of zero.
However, it is important to remember that the hardware works with a fixed number of bits to represent numbers.
This raises a question: how does Python handle enormous numbers without any issue?
For instance, the result of computing 2 to the power of 9999 in Python far exceeds what a 64-bit integer can hold, yet Python evaluates it seamlessly.
Python uses BigNum arithmetic. As the table above demonstrates, the number 5 requires only one digit when written in base 10 but three digits when written in base 2: the smaller the base, the more digits are needed to represent the same number.
Python goes in the opposite direction: it stores a number as an array of digits in base 2^30, a much larger base, so far fewer digits are needed to store very large numbers.
Note: CPython, the Python interpreter, provides this functionality; it is not inherently present in the CPU or GPU. Consequently, when we want fast operations through hardware acceleration such as CUDA, we must stick to a fixed-width number format, typically 32 bits.
Let us understand the mechanism of BigNum Arithmetic in more detail.
BigNum Arithmetic
Internally, CPython represents an integer using a C struct. The following shows this representation:
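A sketch of that struct, based on CPython's Include/longintrepr.h (the pre-3.12 layout; newer versions rename some of these fields):

```c
/* CPython's integer object (simplified) */
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
```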
PyObject_VAR_HEAD is a macro that expands into a PyVarObject, which has the following structure:
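Roughly (a sketch of the definition in Include/object.h):

```c
typedef struct {
    PyObject ob_base;     /* the fixed-size object header */
    Py_ssize_t ob_size;   /* number of items in the variable-size part */
} PyVarObject;
```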
This tells us that an integer, much like a tuple or a list, is a variable-length object, which gives us a first hint of how it can accommodate extremely large values. After macro expansion, _longobject can roughly be seen as:
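A rough sketch of that expansion:

```c
struct _longobject {
    PyObject ob_base;
    Py_ssize_t ob_size;
    digit ob_digit[1];
};
```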
After additional macro expansions, the struct appears in a simplified form as follows:
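Flattening the object header as well gives, in sketch form:

```c
struct _longobject {
    Py_ssize_t ob_refcnt;   /* reference count, used for memory management */
    PyTypeObject *ob_type;  /* type of the object (here: int) */
    Py_ssize_t ob_size;     /* number of digits in use; its sign is the sign of the integer */
    uint32_t ob_digit[1];   /* the digits, least-significant first, in base 2^30 */
};
```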
What are ob_refcnt and ob_type?
The field ob_refcnt is the reference count used by Python's memory management (garbage collection), and ob_type identifies the type of the object (here, an integer). Neither field is particularly important when discussing how the integer value itself is represented.
The representation of the integer value relies on the two remaining fields: ob_digit and ob_size.
What is ob_digit?
The elements of the array ob_digit are of type digit, which is typedef'ed from uint32_t, and the array is declared with a length of 1. As an array, ob_digit is essentially a digit * (a pointer to digit), so the structure can be dynamically allocated with as many digits as needed. This is what enables Python to represent and manipulate extremely large integers.
Typically, in lower-level programming languages such as C, integers are restricted to a fixed width such as 64 bits, whereas Python uses arbitrary-precision integers. As of Python 3, every integer is stored as a BigNum and is constrained only by the memory available on the host system.
What is ob_size?
The field ob_size holds the number of elements of ob_digit that are actually in use. Python sometimes allocates more memory for the array than is strictly required, so it relies on ob_size to know exactly how many digits the number occupies.
Summary
Why do we use base 2^30?
Although each element of ob_digit is a 32-bit unsigned integer, CPython does not use all 32 bits of it for digit data; only 30 bits are used. The unused bits leave headroom so that carries and intermediate results of digit-by-digit arithmetic still fit comfortably in native machine integers. Numbers are therefore stored in base 2^30, and every digit in the array has a value between 0 and 2^30-1, which is 1073741823. The ob_digit field of the structure holds this array of digits.
So Python converts the number into base 2^30 and calls each element of the array a digit, where each digit lies in the range [0, 1073741823].
To eliminate unnecessary computation, CPython also has a "fast path" for integers whose absolute value fits in a single 30-bit digit (i.e., is smaller than 2^30). Such integers are stored as an array of one element and, where possible, are treated as fixed 32-bit integers.
Furthermore, Python stores the digit array in little-endian order: the least-significant digit comes first (at the lowest index). Please refer to the examples below for more clarification on this ordering.
Note that, unlike the classical two's-complement approach, the sign of the integer is not stored in the digits themselves; it is carried by the ob_size field, which also stores the number of digits in ob_digit. So if you want to flip the sign of an integer whose array has size 2, you set ob_size to -2.
This technique of representing integer values as a sequence of digits, stored in a string or an array, is called BigNum arithmetic.
Let us look at a few examples to get a better sense of how CPython implements BigNum arithmetic internally.
Example-1 (large positive number)
How do we represent the number 1152921504606846976 in Python?
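The number 1152921504606846976 is 2^60, which in base 2^30 is 1 × (2^30)^2. So ob_digit = [0, 0, 1] (least-significant digit first) and ob_size = 3. A minimal C sketch of this decomposition (the variable names are mine, not CPython's):

```c
#include <stdio.h>
#include <stdint.h>

/* Decompose a 64-bit value into base-2^30 "digits", least-significant first.
   This mirrors the layout of CPython's ob_digit array; it is an illustrative
   sketch, not CPython source code. */
int main(void) {
    uint64_t n = 1152921504606846976ULL;   /* 2^60 */
    uint32_t digits[3] = {0};              /* a 64-bit value needs at most 3 digits */
    int ndigits = 0;

    while (n != 0) {
        digits[ndigits++] = (uint32_t)(n & ((1u << 30) - 1));  /* keep the low 30 bits */
        n >>= 30;                                              /* move to the next digit */
    }

    for (int i = 0; i < ndigits; i++)
        printf("ob_digit[%d] = %u\n", i, digits[i]);
    printf("ob_size = %d\n", ndigits);     /* prints 3 for this number */
    return 0;
}
```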
Example-2 (large negative number)
How do we represent the number -1152921504606846976 in Python?
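The magnitude is the same as in Example-1, so the digits do not change: ob_digit = [0, 0, 1]. Only the sign of ob_size flips, giving ob_size = -3.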
Example-3: How do we represent a super large number like 2^9999 in Python?
We can represent this super large number as follows:
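Since 9999 = 30 × 333 + 9, we have 2^9999 = (2^30)^333 × 2^9. The number therefore occupies 334 digits in base 2^30: ob_digit holds 333 zeros followed by a final digit of 2^9 = 512, and ob_size = 334.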
1. Floating-Point Number Representation in CPU (or GPU)
A floating-point number (a computer approximation of a real number) can represent a very large value (e.g., 1.32×10^99) or a very small value (e.g., 1.32×10^-99). It can also represent a very large negative number (e.g., -1.32×10^99), a very small negative number (e.g., -1.32×10^-99), and zero.
The floating-point representation of a binary number is similar to scientific notation for decimals. Much like you can represent 23.356 as 2.3356 × 10^1,
you can represent the binary number 10111.001 as 1.0111001 × 2^4.
A floating-point number is typically expressed in scientific notation, with a fraction (F) and an exponent (E) of a certain radix (r), in the form F × r^E.
The floating-point representation of a number is not unique. For example, the number 55.77 can be represented as 5.577×10^1, 0.5577×10^2, 0.05577×10^3, and so on.
The fractional part can be normalized. In the normalized form, there is only a single non-zero digit before the radix point.
For example, the normalized form of 55.77 is 5.577 × 10^1, and the normalized form of the binary number 10111.001 is 1.0111001 × 2^4.
It should be emphasized that floating-point numbers lose precision when they are encoded with a fixed number of bits (such as 32 or 64). This is because there are infinitely many real numbers, even within a narrow range such as 0.0 to 0.1.
On the other hand, an n-bit binary pattern can represent only 2^n distinct values. Hence, not all real numbers can be represented; the nearest representable value is used instead, resulting in a loss of accuracy.
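For instance, 0.1 has no exact binary representation, so the nearest representable value is stored instead. A small C illustration (assuming an IEEE 754 single-precision float):

```c
#include <stdio.h>

int main(void) {
    /* 0.1 cannot be represented exactly in binary; the nearest 32-bit float is stored. */
    float x = 0.1f;
    printf("%.20f\n", x);   /* prints roughly 0.10000000149011611938 */
    return 0;
}
```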
Formal Definition of Floating-Point Number
Aside from the special cases of zero and subnormal numbers, the 'significand' (a.k.a. fraction or mantissa) is always kept in normalized form, 1.F in binary, that is, a value in the range [1, 2).
Whenever we store a normalized floating-point number, the leading 1 is assumed and not stored; we keep only the fractional part F. This is called the "hidden bit" representation, and it gives one additional bit of precision.
32-bit floating-point number in IEEE 754
In the 32-bit single-precision floating-point representation, the most significant bit is the sign bit S, the next 8 bits are the exponent E (stored with a bias of 127), and the remaining 23 bits are the fraction F.
Normalized Form
Let's illustrate with an example.
Example-1 Suppose that the 32-bit pattern is:
1 10000001 01100000000000000000000
We can see that:
S = 1
E = 10000001
F = 01100000000000000000000
In the normalized form, the actual fraction is normalized with an implicit leading 1 in the form of 1.F
In this example, the actual fraction is 1.01100000000000000000000.
We can compute the value with the compact formula N = (-1)^S × 1.F × 2^(E-127). Here E = 10000001 (binary) = 129 and 1.011 (binary) = 1.375, so N = (-1)^1 × 1.375 × 2^(129-127) = -1.375 × 4 = -5.5.
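If you want to check this on a machine, the following small C sketch rebuilds the bit pattern and lets the hardware decode it (it assumes float is an IEEE 754 single-precision type, which is the case on virtually all modern platforms):

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    /* S = 1, E = 10000001 (129), F = 011 followed by 20 zeros */
    uint32_t bits = (1u << 31) | (129u << 23) | (0x3u << 20);
    float f;
    memcpy(&f, &bits, sizeof f);   /* reinterpret the 32-bit pattern as a float */
    printf("%f\n", f);             /* prints -5.500000 */
    return 0;
}
```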
De-Normalized Form
The normalized form has a significant limitation: because of the implicit leading 1 in the fraction, it cannot represent the number zero. Verify this for yourself!
The de-normalized (subnormal) form exists to represent zero as well as numbers smaller than the normalized form allows. When E = 0, the number is in de-normalized form: the fraction gets an implicit leading 0 instead of 1, and the actual exponent is fixed at -126. As a result, the number zero is indicated by E = 0 and F = 0.
We can also represent very small positive and negative numbers in de-normalized form with E=0.
Example-2 Suppose S=1, E=0, and F = 011 0000 0000 0000 0000 0000. Compute the number represented.
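Here the implicit leading bit is 0, so the significand is 0.011 (binary) = 0.375, and the actual exponent is fixed at -126. Hence N = (-1)^1 × 0.375 × 2^(-126) ≈ -4.41 × 10^-39.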
Summary
In summary, the value N is calculated as follows: for normalized numbers (1 ≤ E ≤ 254), N = (-1)^S × 1.F × 2^(E-127); for de-normalized numbers (E = 0), N = (-1)^S × 0.F × 2^(-126).
Corner Cases in IEEE-754
1. Zero
In the explanation provided earlier regarding floating point numbers, it was mentioned that a leading 1 is always assumed. This holds true for the majority of floating point numbers, with the exception of zero. When storing zero as a floating point number, all zeros are stored for both the exponent and the fractional part. It is important to note that there can be both +0 and -0 depending on the sign bit.
2. Infinity
If a floating-point calculation yields a value that exceeds the range representable in floating point, the result is classified as infinity. Infinity is represented by setting all ones in the exponent and all zeros in the fractional part. +∞ and -∞ are distinguished by the sign bit.
3. NaN
NaN (Not a Number) is used to represent the result of arithmetic operations that do not yield a valid number, such as 0/0 or the square root of a negative number. In this representation, the exponent is filled with all ones and the fractional part is non-zero.
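The following short C sketch shows how infinity and NaN arise in practice (assuming IEEE 754 arithmetic, the default on modern hardware):

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    double big = 1e308;
    double zero = 0.0;

    double inf_val = big * 10.0;    /* overflows the double range, becomes +infinity */
    double nan_val = zero / zero;   /* invalid operation, becomes NaN */

    printf("%f %f\n", inf_val, nan_val);                        /* typically prints: inf nan */
    printf("isinf=%d isnan=%d\n", isinf(inf_val) != 0, isnan(nan_val) != 0);  /* 1 1 */
    return 0;
}
```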
To be continued in Part-3, where we will explore the actual quantization mechanism and the different types of quantization techniques used.