Introduction to Weight Quantization

One well-known drawback of large language models (LLMs) is their high computational overhead. A model's size is typically calculated by multiplying its number of parameters by the precision of its values (the data type). However, the weights can be quantized, a technique that stores information in lower-precision data types in order to conserve memory.

We distinguish two main families of weight quantization techniques in the literature:

  • Post-Training Quantization (PTQ) is a straightforward technique that converts the weights of an already-trained model to a lower precision without any retraining. Despite its ease of implementation, PTQ comes with a potential degradation in performance.
  • Quantization-Aware Training (QAT) improves model performance by incorporating the weight conversion procedure in the pre-training or fine-tuning phase. However, QAT requires representative training data and is computationally expensive.

Background on Floating Point Representation

The choice of data type dictates the quantity of computational resources required, affecting the speed and efficiency of the model. In deep learning applications, balancing precision and computational performance becomes a vital exercise as higher precision often implies greater computational demands.

Among various data types, floating point numbers are predominantly employed in deep learning due to their ability to represent a wide range of values with high precision. Typically, a floating point number uses n bits to store a numerical value. These n bits are further partitioned into three distinct components:

  1. Sign: The sign bit indicates the positive or negative nature of the number. It uses one bit where 0 indicates a positive number and 1 signals a negative number.
  2. Exponent: The exponent is a segment of bits that represents the power to which the base (usually 2 in binary representation) is raised. The exponent can also be positive or negative, allowing the number to represent very large or very small values.
  3. Significand/Mantissa: The remaining bits are used to store the significand, also referred to as the mantissa. This represents the significant digits of the number. The precision of the number heavily depends on the length of the significand.

This design allows floating point numbers to cover a wide range of values with varying levels of precision. For normalized numbers, the formula used for this representation is:

value = (-1)^sign × 1.mantissa × 2^(exponent − bias)

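To make this concrete, here is a minimal Python sketch (standard library only) that unpacks the three components of an FP32 value; the helper name fp32_components is just illustrative.

```python
import struct

def fp32_components(x: float):
    # Reinterpret the 32-bit float as an unsigned integer to expose its raw bits
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31               # 1 sign bit
    exponent = (bits >> 23) & 0xFF  # 8 exponent bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF      # 23 significand bits, with an implicit leading 1
    # Reconstruct a normalized value: (-1)^sign * 1.mantissa * 2^(exponent - bias)
    value = (-1) ** sign * (1 + mantissa / 2**23) * 2 ** (exponent - 127)
    return sign, exponent, mantissa, value

print(fp32_components(0.1))  # (0, 123, 5033165, 0.10000000149011612)
```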
To understand this better, let’s delve into some of the most commonly used data types in deep learning: float32 (FP32), float16 (FP16), and bfloat16 (BF16):

  • FP32 uses 32 bits to represent a number: one bit for the sign, eight for the exponent, and the remaining 23 for the significand. While it provides a high degree of precision, the downside of FP32 is its high computational and memory footprint.
  • FP16 uses 16 bits to store a number: one is used for the sign, five for the exponent, and ten for the significand. Although this makes it more memory-efficient and accelerates computations, the reduced range and precision can introduce numerical instability, potentially impacting model accuracy.
  • BF16 is also a 16-bit format but with one bit for the sign, eight for the exponent, and seven for the significand. BF16 expands the representable range compared to FP16, thus decreasing underflow and overflow risks. Despite a reduction in precision due to fewer significand bits, BF16 typically does not significantly impact model performance and is a useful compromise for deep learning tasks. The snippet below compares the three formats in practice.
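As a quick comparison (a sketch assuming PyTorch is installed), the code casts the same reference value to FP32, FP16, and BF16 and prints the stored value alongside the bytes used per element:

```python
import torch

x = torch.tensor(1 / 3, dtype=torch.float64)  # reference value in double precision

for dtype in (torch.float32, torch.float16, torch.bfloat16):
    y = x.to(dtype)
    # element_size() returns the number of bytes per element: 4 for FP32, 2 for FP16/BF16
    print(f"{str(dtype):>14}: value = {y.item():.10f}, bytes per element = {y.element_size()}")
```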

Naïve 8-bit Quantization

In this section, we will implement two quantization techniques: a symmetric one with absolute maximum (absmax) quantization and an asymmetric one with zero-point quantization. In both cases, the goal is to map an FP32 tensor X (original weights) to an INT8 tensor X_quant (quantized weights).

With absmax quantization, the original number is divided by the absolute maximum value of the tensor and multiplied by a scaling factor (127) to map inputs into the range [-127, 127]. To retrieve the original FP32 values, the INT8 number is divided by the quantization factor, acknowledging some loss of precision due to rounding.

For instance, let’s say we have an absolute maximum value of 3.2. A weight of 0.1 would be quantized to round(0.1 × 127/3.2) = 4. If we want to dequantize it, we would get 4 × 3.2/127 = 0.1008, which implies an error of 0.0008.
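The following is a minimal sketch of absmax quantization and dequantization, assuming PyTorch is available (the function names absmax_quantize and absmax_dequantize are illustrative, not an established API):

```python
import torch

def absmax_quantize(X: torch.Tensor):
    # Scale so that the largest-magnitude value maps to 127
    scale = 127 / torch.max(torch.abs(X))
    # Round to the nearest integer and store as INT8
    X_quant = (scale * X).round().to(torch.int8)
    return X_quant, scale

def absmax_dequantize(X_quant: torch.Tensor, scale: torch.Tensor):
    # Recover an approximation of the original weights (rounding error remains)
    return X_quant.to(torch.float32) / scale

# Reproduce the worked example: absolute maximum of 3.2, weight of 0.1
X = torch.tensor([0.1, -3.2, 1.5])
X_quant, scale = absmax_quantize(X)
print(X_quant)                            # tensor([4, -127, 60], dtype=torch.int8)
print(absmax_dequantize(X_quant, scale))  # tensor([0.1008, -3.2000, 1.5118])
```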
