Number System - Quantization of LLMs, Part-1

Akash K.

Published: May 14, 2024

Large Language Models (LLMs) have significantly advanced in recent years, becoming increasingly user-friendly and versatile in various applications. Nevertheless, as the intelligence and complexity of LLMs have expanded, so too has the number of parameters, including weights and activations, which determine their ability to learn from and analyze data.

Hence, the larger an LLM, the more memory it requires.

This necessitates running LLMs on high-spec hardware with enough GPUs, which limits deployment options and makes LLM-based solutions harder to adopt. Fortunately, machine learning researchers are working on a range of solutions to tackle the challenge of growing model sizes, with quantization being a prominent one.

Why do we need Quantization?

Before we delve into the concept of quantization, let us first understand why we need it in the first place.

Quantization aims to address the following challenges:

Challenge-1: Most contemporary deep neural networks consist of millions or even billions of parameters, all of which have to be stored.

Consider the following examples:

Ex 1. The smallest LLaMA-2 model consists of 7 billion parameters. Assuming each parameter is 32 bits (4 bytes), we would require 28 GB of storage space just to store these parameters on disk.

Ex 2. The smallest LLaMA-3 model consists of 8 billion parameters. Assuming each parameter is 32 bits, we would require 32 GB of storage space just to store these parameters on disk.

Ex 3. GPT-4 is reported to have in excess of 1 trillion parameters; rumors claim that it has 1.76 trillion. Assuming each parameter is 32 bits, we would require 7.04 TB of storage space just to store these parameters on disk.
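The arithmetic behind these three examples can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the helper name `storage_bytes` is made up for illustration, a gigabyte is taken as 10^9 bytes to match the figures above, and file-format overhead is ignored.

```python
def storage_bytes(num_params: int, bits_per_param: int = 32) -> int:
    """Raw bytes needed to hold the parameters (no file-format overhead)."""
    return num_params * bits_per_param // 8

GB = 10**9  # decimal gigabyte, matching the figures in the examples

models = [
    ("LLaMA-2 7B", 7 * 10**9),
    ("LLaMA-3 8B", 8 * 10**9),
    ("GPT-4 (rumored 1.76T)", 1_760_000_000_000),
]

for name, params in models:
    at_32 = storage_bytes(params, 32) / GB
    at_8 = storage_bytes(params, 8) / GB
    print(f"{name}: {at_32:,.0f} GB at 32 bits, {at_8:,.0f} GB at 8 bits")
```

The 8-bit column hints at why a smaller data type is attractive: the same parameter count needs a quarter of the space.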

Challenge-2: Consequently, larger models cannot easily be loaded on a standard PC or a smartphone. For inference on a CPU, the model must be loaded into RAM; for inference on a GPU, it must be loaded into the GPU's memory.

Challenge-3: Similar to humans, computers are generally slower at floating-point operations than at integer operations. Consider the calculation of 4 × 8 and compare it to 1.17 × 2.389. Which one can be computed more quickly?

Answer: 4 × 8

How to tackle these challenges?

Quantization addresses these challenges. Quantizing large language models (LLMs) is a crucial method for reducing their size and memory usage while preserving most of their quality.

So what exactly is Quantization?

Quantization

Quantization, in an abstract sense, is the process of constraining an input from a continuous or otherwise large set of values to a discrete set.

Mapping of Continuous Signals to Discrete Signals

Palettization (a loose form of image compression)
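The abstract definition can be made concrete with a toy example. The sketch below snaps values from the continuous range [0, 1] onto a small set of evenly spaced levels. It only illustrates the general idea of mapping a large set to a discrete one; it is not the quantization scheme used for LLMs, and the function name `quantize` is hypothetical.

```python
def quantize(x: float, levels: int) -> float:
    """Snap x in [0.0, 1.0] to the nearest of `levels` evenly spaced values."""
    step = 1.0 / (levels - 1)
    return round(x / step) * step

samples = [0.03, 0.26, 0.49, 0.74, 0.98]

# With 5 levels the only possible outputs are 0.0, 0.25, 0.5, 0.75 and 1.0.
print([quantize(s, levels=5) for s in samples])
```

Every input lands on one of five values, so each result could be stored as an index from 0 to 4 instead of a full floating-point number.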

But, how does this relate to LLMs?

To see how this abstract notion of quantization relates to LLMs, let us first review some fundamental concepts of number systems.

Numeric Data Types

Let's examine how numbers are represented in hardware at the CPU or GPU level. Computers use a fixed number of bits to represent each kind of data, such as numbers, characters, or pixel colors.

How is Numeric Data Represented in Modern Computing Systems?

Human beings utilize the decimal (base 10) and duodecimal (base 12) number systems to perform counting and measurements, likely due to our possession of 10 fingers and two prominent toes.

Conversely, computers rely on the binary (base 2) number system, as they are built from binary digital components, known as transistors, which operate in two distinct states: on and off. If current passes through the transistor, the computer reads "1"; if current is absent, it reads "0".

Decimal Number system (Base 10)

The decimal number system has ten symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, called digits. It uses positional notation. That is, the least-significant digit (right-most digit) is of the order of 10^0 (units or ones), the second right-most digit is of the order of 10^1 (tens), the third right-most digit is of the order of 10^2 (hundreds), and so on, where ^ denotes exponent.

For example, 735 = 7×10^2 + 3×10^1 + 5×10^0.

Binary Number System (Base 2)

The Binary Number System is a numerical system that utilizes two symbols, "0" and "1", to represent different numbers. It uses positional notation. The term "binary" is derived from the word "bi," meaning two. As a result, this numerical system is referred to as the Binary Number System. A binary digit is called a bit.

What is the decimal equivalent of binary number 10110?
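One way to work this out is to apply positional notation with base 2: each bit is multiplied by a power of two that depends on its position. The helper `binary_to_decimal` below is an illustrative name, not a library function; Python's built-in `int(text, 2)` performs the same conversion.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum each bit times its positional weight 2^i, counting i from the right."""
    total = 0
    for i, bit in enumerate(reversed(bits)):
        total += int(bit) * 2**i
    return total

# 10110 = 1*16 + 0*8 + 1*4 + 1*2 + 0*1
print(binary_to_decimal("10110"))  # 22
print(int("10110", 2))             # 22, the built-in conversion agrees
```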

There are various types of number systems. The four major ones are:

Binary Number System (Number system with Base 2)
Octal Number System (Number system with Base 8)
Decimal Number System (Number system with Base 10)
Hexadecimal Number System (Number system with Base 16)

As discussed, computers use a fixed number of bits to represent various types of data, such as numbers, characters, or pixel colors. A bit sequence consisting of n bits (also called an n-bit string or n-bit storage location) can represent at most 2^n unique entities.

For example, a 3-bit memory location can hold one of these eight binary patterns: 000, 001, 010, 011, 100, 101, 110, or 111.

Hence, it can represent at most 8 distinct entities.

You could use them to represent numbers 0 to 7, numbers 8881 to 8888, characters 'A' to 'H', or up to 8 kinds of fruits like apple, orange, banana; or up to 8 kinds of animals like lion, tiger, etc.
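The 2^n rule is easy to check by enumerating the patterns. The following sketch lists every 3-bit pattern and then, as one arbitrary choice of meaning, assigns the characters 'A' to 'H' to them. The function name `bit_patterns` is an assumption made for this example.

```python
from itertools import product

def bit_patterns(n: int) -> list[str]:
    """Return all 2^n bit strings of length n, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

patterns = bit_patterns(3)
print(len(patterns), patterns)

# The same 8 patterns can stand for anything we agree on, e.g. 'A'..'H'.
as_letters = {p: chr(ord("A") + i) for i, p in enumerate(patterns)}
print(as_letters["000"], as_letters["111"])  # A H
```

The bits themselves carry no meaning; the interpretation is a convention chosen by the programmer, which is exactly what the integer schemes in the following sections are.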

Typically, numbers are represented in groups of 8 bits (byte), 16 bits (short), 32 bits (int), or 64 bits (long).

1. Integer representation in CPU (or GPU)

Integers are whole numbers or fixed-point numbers with the radix point fixed after the least-significant bit. Computers use a fixed number of bits to represent an integer. The commonly-used bit-lengths for integers are 8-bit, 16-bit, 32-bit or 64-bit.

In addition to bit-lengths, there exist two distinct representation schemes for integers.

Unsigned Integers: can represent zero and positive integers.
Signed Integers: can represent zero, positive and negative integers.

Three representation schemes have been proposed for signed integers:

Sign-Magnitude representation
1's Complement representation
2's Complement representation

As a programmer, it is your responsibility to determine the bit-length and representation scheme for the integers based on the specific requirements of your application. In the case of needing a counter to track a small quantity ranging from 0 to 200, you could opt for the 8-bit unsigned integer scheme since it does not involve negative numbers.
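The counter example can be checked with Python's `int.to_bytes`, which raises `OverflowError` when a value does not fit in the requested width. The helper `fits` is a hypothetical name for this sketch; it assumes the bit-length is a multiple of 8 and that the signed case uses 2's complement, which is what `int.to_bytes` implements.

```python
def fits(value: int, n_bits: int, signed: bool) -> bool:
    """Return True if value is representable in n_bits under the chosen scheme."""
    try:
        value.to_bytes(n_bits // 8, byteorder="big", signed=signed)
        return True
    except OverflowError:
        return False

print(fits(200, 8, signed=False))   # True: unsigned 8-bit covers 0..255
print(fits(200, 8, signed=True))    # False: signed 8-bit tops out at 127
print(fits(200, 16, signed=True))   # True: 16 bits leave plenty of room
```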

Let us try to understand these representations in detail.

1.1 Unsigned Integer (n-bit)

Unsigned integers have the ability to represent zero and positive integers, excluding negative integers. The interpretation of an unsigned integer's value is based on "the magnitude of its underlying binary pattern".

Example 1: Suppose that n=8 and the binary pattern is 01000001, the value of this unsigned integer is 65.

Binary to Decimal Conversion for Unsigned Representation

Example 2: Suppose that n=16 and the binary pattern is 0000000000000000, the value of this unsigned integer is 0.
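Both examples can be reproduced in Python, where `int(bits, 2)` reads a bit string as a plain magnitude and `format(value, "08b")` goes the other way. The function name `unsigned_value` is chosen here only for illustration.

```python
def unsigned_value(bits: str) -> int:
    """Interpret an n-bit pattern as an unsigned integer (its plain magnitude)."""
    return int(bits, 2)

print(unsigned_value("01000001"))   # 65
print(unsigned_value("0" * 16))     # 0
print(unsigned_value("1" * 8))      # 255, the largest 8-bit unsigned value

# Going the other way: write 65 as an 8-bit pattern, padded with zeros.
print(format(65, "08b"))            # 01000001
```

An n-bit unsigned integer therefore covers the range 0 to 2^n - 1.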

1.2 Signed Integers

Signed integers can represent zero, positive integers, as well as negative integers. Three representation schemes are available for signed integers:

Sign-Magnitude representation
1's Complement representation
2's Complement representation

In each of the aforementioned three schemes, the sign bit, also known as the most-significant bit (msb), is utilized to indicate the sign of the integer. A value of 0 represents a positive integer, while a value of 1 represents a negative integer. Nevertheless, the interpretation of the integer's magnitude varies across the different schemes.

1. Sign-Magnitude Representation

In sign-magnitude representation:

The sign bit, denoted as the most-significant bit (msb), has a value of 0 for positive integers and 1 for negative integers
The magnitude (absolute value) of the integer is represented by the remaining n-1 bits. This value is viewed as "the magnitude of the (n-1)-bit binary pattern"

Example 1: Suppose that n=8 and the binary representation is 01000001.

Sign bit is 0 ⇒ positive

Absolute value of the remaining 7 bits is 1000001 = 65. Hence, the integer is +65.

Binary to Decimal Conversion for Sign-Magnitude Representation

Example 2: Suppose that n=8 and the binary representation is 00000000.

Sign bit is 0 ⇒ positive

Absolute value is 0000000 = 0. Hence, the integer is +0. Note the + sign here.

Example 3: Suppose that n=8 and the binary representation is 10000000.

Sign bit is 1 ⇒ negative

Absolute value is 0000000 = 0. Hence, the integer is -0. Note the - sign here.

So from Example 2 and Example 3 we can infer that, in sign-magnitude representation, the binary patterns 00000000 and 10000000 both represent zero.

Two Binary Representations of 0 in Sign-Magnitude representation

Range of sign-magnitude n-bit integers: -(2^(n-1) - 1) to +(2^(n-1) - 1). For n=8, this is -127 to +127.

The drawbacks of sign-magnitude representation are:

There are two ways to represent the number zero: '00000000' and '10000000'. This can potentially cause inefficiency and confusion.
Positive and negative integers need to be processed separately
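The decoding rule for sign-magnitude can be written as a short function. This is a minimal sketch operating on bit strings rather than real hardware registers, and `sign_magnitude_value` is a name invented for the example. Python integers have no negative zero, so both zero patterns decode to the same `0`.

```python
def sign_magnitude_value(bits: str) -> int:
    """Decode an n-bit sign-magnitude pattern given as a string of 0s and 1s."""
    sign = -1 if bits[0] == "1" else 1      # most-significant bit is the sign
    magnitude = int(bits[1:], 2)            # remaining n-1 bits are the magnitude
    return sign * magnitude

print(sign_magnitude_value("01000001"))  # 65
print(sign_magnitude_value("00000000"))  # 0  (the pattern for +0)
print(sign_magnitude_value("10000000"))  # 0  (the pattern for -0)
print(sign_magnitude_value("11111111"))  # -127, the most negative 8-bit value
```

Two different patterns decoding to zero is the first drawback listed above: any equality test on raw bits has to treat them as a special case.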

2. 1's Complement Representation

In 1's complement representation:

The sign bit, denoted as the most-significant bit (msb), has a value of 0 for positive integers and 1 for negative integers
The remaining n-1 bits represent the magnitude of the integer, as follows:

for positive integers, the absolute value of the integer is equal to "the magnitude of the (n-1)-bit binary pattern"
for negative integers, the absolute value of the integer is equal to "the magnitude of the complement (inverse) of the (n-1)-bit binary pattern" (hence called 1's complement)

Example 1: Suppose that n=8 and the binary representation is 01000001.

Sign bit is 0 ⇒ positive
Absolute value is 1000001 = 65. Hence, the integer is +65.

Binary to Decimal Conversion for 1's Complement Representation

Example 2: Suppose that n=8 and the binary representation is 10000001.

Sign bit is 1 ⇒ negative
Absolute value is the complement of 0000001, i.e., 1111110 = 126. Hence, the integer is -126.

Binary to Decimal Conversion for 1's Complement Representation (for -ve sign bit)

Example 3: Suppose that n=8 and the binary representation is 00000000.

Sign bit is 0 ⇒ positive
Absolute value is 0000000 = 0. Hence, the integer is +0.

Example 4: Suppose that n=8 and the binary representation is 11111111.

Sign bit is 1 ⇒ negative
Absolute value is the complement of 1111111, i.e., 0000000 = 0. Hence, the integer is -0.

The following figure illustrates the visual working of Example-3 and Example-4.

Two Binary Representations of 0 in 1's Complement Representation

Range of 1's complement representation for an n-bit integer: -(2^(n-1) - 1) to +(2^(n-1) - 1). For n=8, this is -127 to +127.

Let us visualize the range of 1's complement representation for n=8.

Once more, the disadvantages are:

There are two ways to represent the number zero: '00000000' and '11111111'. This can potentially cause inefficiency and confusion.
Positive and negative integers need to be processed separately
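The 1's complement rule differs from sign-magnitude only in how negative patterns are read: the remaining bits are inverted before taking the magnitude. The function below is a minimal sketch on bit strings, and the name `ones_complement_value` is an assumption of this example.

```python
def ones_complement_value(bits: str) -> int:
    """Decode an n-bit 1's complement pattern given as a string of 0s and 1s."""
    if bits[0] == "0":
        return int(bits[1:], 2)             # positive: read the magnitude directly
    inverted = "".join("1" if b == "0" else "0" for b in bits[1:])
    return -int(inverted, 2)                # negative: invert, then read the magnitude

print(ones_complement_value("01000001"))  # 65
print(ones_complement_value("10000001"))  # -126
print(ones_complement_value("00000000"))  # 0  (the pattern for +0)
print(ones_complement_value("11111111"))  # 0  (the pattern for -0)
```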

3. 2's Complement Representation

In 2's complement representation:

The sign bit, denoted as the most-significant bit (msb), has a value of 0 for positive integers and 1 for negative integers
The remaining n-1 bits represent the magnitude of the integer, as follows:

for positive integers, the absolute value of the integer is equal to "the magnitude of the (n-1)-bit binary pattern"
for negative integers, the absolute value of the integer is equal to "the magnitude of the complement of the (n-1)-bit binary pattern, plus one" (hence called 2's complement)

Example 1: Suppose that n=8 and the binary representation is 01000001.

Sign bit is 0 ⇒ positive
Absolute value is 1000001 = 65. Hence, the integer is +65.

Binary to Decimal Conversion for 2's Complement Representation (for +ve sign bit)

Example 2: Suppose that n=8 and the binary representation is 10000001.

Sign bit is 1 ⇒ negative
Absolute value is the complement of 0000001 plus 1, i.e., 1111110 + 1 = 1111111 = 127. Hence, the integer is -127.

Example 3: Suppose that n=8 and the binary representation is 00000000.

Sign bit is 0 ⇒ positive
Absolute value is 0000000 = 0. Hence, the integer is +0.

Binary to Decimal Conversion for 2's Complement Representation (for +0)

Example 4: Suppose that n=8 and the binary representation is 11111111.

Sign bit is 1 ⇒ negative
Absolute value is the complement of 1111111 plus 1, i.e., 0000000 + 1 = 0000001 = 1. Hence, the integer is -1.

Binary to Decimal Conversion for 2's Complement Representation
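The four examples follow the same recipe, which can be sketched as a small decoder: for a negative pattern, invert the remaining bits, add one, and negate. The function name `twos_complement_value` is made up for this illustration. As a cross-check, Python's `int.from_bytes(..., signed=True)` also interprets bytes as 2's complement.

```python
def twos_complement_value(bits: str) -> int:
    """Decode an n-bit 2's complement pattern given as a string of 0s and 1s."""
    if bits[0] == "0":
        return int(bits[1:], 2)             # positive: read the magnitude directly
    inverted = "".join("1" if b == "0" else "0" for b in bits[1:])
    return -(int(inverted, 2) + 1)          # negative: invert, add one, negate

for pattern in ("01000001", "10000001", "00000000", "11111111", "10000000"):
    print(pattern, twos_complement_value(pattern))

# Cross-check one 8-bit pattern against Python's built-in interpretation.
raw = bytes([int("10000001", 2)])
print(int.from_bytes(raw, byteorder="big", signed=True))  # -127
```

Note that 10000000 decodes to -128, a value with no positive counterpart in 8 bits, and that only 00000000 decodes to zero.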

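Let us visualize the range of 2's complement representation for n=8.

Range of 2's complement representation for an n-bit integer: -2^(n-1) to +(2^(n-1) - 1). For n=8, this is -128 to +127.

Table for Range of 2's Complement Representation (n-bit, where n=8, 16, 32, 64)

Bits (n)   Minimum                       Maximum
8          -128                          127
16         -32,768                       32,767
32         -2,147,483,648                2,147,483,647
64         -9,223,372,036,854,775,808    9,223,372,036,854,775,807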

Modern Computing Systems Use 2's Complement Representation for Signed Integers

Computers use 2's complement in representing signed integers. This is because:

In 2's complement, there is a single representation for the number zero, as opposed to the two representations found in sign-magnitude and 1's complement.
Positive and negative integers can be combined in addition and subtraction operations. Subtraction can be performed by applying the logic of addition.
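The second point can be demonstrated by computing a - b as a + (-b) on 8-bit patterns, where -b is obtained by inverting the bits of b and adding one. This is a minimal sketch that imitates an 8-bit register by masking with 0xFF; the names `negate` and `subtract` are hypothetical.

```python
MASK = 0xFF  # keep only the low 8 bits, as an 8-bit register would

def negate(x: int) -> int:
    """2's complement negation: invert all bits, add 1, keep 8 bits."""
    return (~x + 1) & MASK

def subtract(a: int, b: int) -> int:
    """Compute a - b using only negation and addition on 8-bit patterns."""
    return (a + negate(b)) & MASK

result = subtract(65, 5)
print(format(result, "08b"), result)      # 00111100 60

# A negative result comes back as its 2's complement bit pattern.
pattern = subtract(5, 65)
print(format(pattern, "08b"), pattern - 256)  # 11000100 -60
```

The same adder handles both cases, with no separate code path for the sign, which is the practical advantage over sign-magnitude and 1's complement.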

To be continued... in Part-2
