Binary Quantization
Rohan Paul
I build & write AI stuff. → Join 46K+ others on my X / Twitter. AI Engineer and Entrepreneur (Ex Investment Banking).
The buzz surrounding Binary Quantization has been impossible to ignore, especially if you've been keeping tabs on recent discussions in tech circles.
The concept itself isn't new, but what's reignited interest is Cohere's announcement of int8 and binary embedding support in their embed v3 model.
First, let's quickly look at why we need embeddings.
Embeddings are one of the most versatile tools in natural language processing, supporting a wide variety of settings and use cases. In essence, embeddings are numerical representations of more complex objects, like text, images, audio, etc. Specifically, the objects are represented as n-dimensional vectors.
After transforming the complex objects, you can determine their similarity by calculating the similarity of the respective embeddings! This is crucial for many use cases: it serves as the backbone for recommendation systems, retrieval, one-shot or few-shot learning, outlier detection, similarity search, paraphrase detection, clustering, classification, and much more.
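As a toy illustration, here's how comparing two embeddings by cosine similarity might look in Python (the vectors below are made up, not real model output):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy float32 embeddings standing in for real model output
query_emb = np.array([0.056, -0.128, -0.029, 0.047, 0.135], dtype=np.float32)
doc_emb = np.array([0.061, -0.110, -0.034, 0.052, 0.140], dtype=np.float32)

print(cosine_similarity(query_emb, doc_emb))  # close to 1.0 -> very similar
```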
Binary Quantization for embeddings
Unlike quantization in models where you reduce the precision of weights, quantization for embeddings refers to a post-processing step for the embeddings themselves. In particular, binary quantization refers to the conversion of the float32 values in an embedding to 1-bit values, resulting in a 32x reduction in memory and storage usage.
Binary quantization example
Vector embeddings are usually generated by embedding models, such as Cohere's embed v3, and a single vector embedding takes the following form:
[0.056, -0.128, -0.029, 0.047, …, 0.135]
To quantize float32 embeddings to binary, we simply threshold the normalized embeddings at 0. Since these values are small numbers centered around zero, each one can be turned into a single bit:
1: if the value is greater than or equal to 0.
0: if the value is smaller than 0.
So you get something like this:
[1, 0, 0, …, 1]
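In code, this thresholding is a one-liner. A minimal NumPy sketch (the input vector is illustrative):

```python
import numpy as np

float_emb = np.array([0.056, -0.128, -0.029, 0.047, 0.135], dtype=np.float32)

# Threshold at 0: values >= 0 become 1, values < 0 become 0
binary_emb = (float_emb >= 0).astype(np.uint8)
print(binary_emb)  # [1 0 0 1 1]
```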
Here's an example of binary quantization with Sentence Transformers.
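A minimal sketch using the `quantize_embeddings` helper available in recent sentence-transformers releases (the model name is just an example; any embedding model works):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

# Example model; swap in whichever embedding model you use
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

docs = [
    "Binary quantization shrinks embeddings 32x.",
    "The Hamming distance compares binary vectors.",
]

# Float32 embeddings straight from the model
embeddings = model.encode(docs)

# "ubinary" thresholds at 0 and packs the bits into uint8, so a
# 1024-dimensional vector becomes 128 bytes instead of 4096
binary_embeddings = quantize_embeddings(embeddings, precision="ubinary")

print(embeddings.shape, embeddings.dtype)                # (2, 1024) float32
print(binary_embeddings.shape, binary_embeddings.dtype)  # (2, 128) uint8
```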
So why does binary quantization reduce vector embedding size so much?
It's kind of like turning a colored image into a black and white image.
By converting the floating point numbers, which are stored in 32 bits, into a single bit, you only need 1/32nd of memory space to store a binarized vector. This can lead to increased search speed and reduced storage costs.
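To make the arithmetic concrete: a 1024-dimensional float32 embedding occupies 1024 × 4 = 4096 bytes, while the same vector packed one bit per dimension occupies 1024 / 8 = 128 bytes. A quick NumPy sketch:

```python
import numpy as np

dim = 1024
float_emb = np.random.randn(dim).astype(np.float32)

# Threshold at 0, then pack the resulting bits 8-per-byte
packed = np.packbits((float_emb >= 0).astype(np.uint8))

print(float_emb.nbytes)  # 4096 bytes
print(packed.nbytes)     # 128 bytes -> a 32x reduction
```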
And because vector embeddings are usually high-dimensional, you can still get meaningful similarity measures for vector search.
Now the question is: how do you calculate the similarity of vectors that have been binarized?
We can use the Hamming Distance to efficiently perform retrieval with these binary embeddings. This is simply the number of positions at which the bits of two binary embeddings differ. The lower the Hamming Distance, the closer the embeddings, and thus the more relevant the document. A huge advantage of the Hamming Distance is that it can be calculated in as little as 2 CPU cycles (an XOR followed by a popcount), allowing for blazingly fast performance.
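The same computation is easy to sketch in Python with an XOR plus a bit count; real vector databases lean on the hardware popcount instruction instead (the vectors below are random stand-ins):

```python
import numpy as np

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two bit-packed uint8 vectors.

    XOR sets a bit wherever the inputs disagree; unpacking lets us
    count those set bits.
    """
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Two bit-packed 1024-dimensional binary embeddings, random for illustration
rng = np.random.default_rng(0)
a = np.packbits(rng.integers(0, 2, 1024).astype(np.uint8))
b = np.packbits(rng.integers(0, 2, 1024).astype(np.uint8))

print(hamming_distance(a, b))  # ~512 expected for unrelated random vectors
```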
Why Binary Quantization (BQ) is particularly suitable for high-dimensional vectors
Simply because, in higher dimensional space, even with BQ, the vector can retain a high degree of information.
First, the basics: the number of elements in a single vector is that vector's dimensionality. Each element of a vector represents a coordinate in a particular dimension, so a vector with n elements is said to inhabit an n-dimensional space.
When we refer to a vector's dimensionality, we are essentially describing how many degrees of freedom or independent directions of information it contains. For example, a 3-dimensional vector might represent a point in 3D space with coordinates along the X, Y, and Z axes.
In high-dimensional spaces, vectors possess a large number of elements. Despite each element being aggressively quantized to a single bit, the overall vector retains substantial aggregate information. The high dimensionality ensures that, even in binary form, the relationships and structures inherent to the data can be preserved to a useful extent.
This rests on the assumption that the essential information of the vector is distributed across its many dimensions, allowing the binary-reduced vector to approximate the original's informational content in aggregate, despite the severe reduction in precision per dimension.
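This intuition is easy to check with a small simulation. The sketch below uses random Gaussian vectors as stand-ins for real embeddings (the dimensions and counts are arbitrary) and measures how well sign-agreement, the binary-domain similarity, tracks the float32 cosine similarity as dimensionality grows:

```python
import numpy as np

rng = np.random.default_rng(42)

for dim in (16, 256, 4096):
    # A query plus documents at varying degrees of similarity to it
    q = rng.standard_normal(dim)
    alphas = rng.uniform(0, 1, (1000, 1))
    docs = alphas * q + rng.standard_normal((1000, dim))

    # Float32 cosine similarity of each document to the query
    cos = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))

    # Binary-domain similarity: fraction of coordinates with matching sign
    ham_sim = np.mean(np.sign(docs) == np.sign(q), axis=1)

    # How well the binary similarity tracks the float similarity
    print(dim, np.corrcoef(cos, ham_sim)[0, 1])
```

In this toy setup the correlation climbs toward 1.0 as the dimensionality grows: the more dimensions there are, the more faithfully the binarized vectors preserve the original similarity structure.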
What are the drawbacks?
Firstly, the adoption of binary quantization impacts the accuracy and precision of your search results. Although you can still retrieve relevant outcomes, the nuance and detail provided by higher-resolution data can be lost, leading to less precise results.
Furthermore, binary quantization is a one-way street—once you've converted your data into binary form, there's no turning back. This process is a form of lossy compression, meaning once the data has undergone quantization, the original, detailed information is irretrievably lost.
Is there any way to reduce the lossiness of BQ?
Partly, yes. For example, Weaviate compensates for this by overfetching vectors from the index and then rescoring them in the uncompressed space. This technique has been found to compensate, to some extent, for the lossiness of BQ.
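Weaviate's internals aside, the overfetch-and-rescore pattern itself is straightforward to sketch: fetch several times more candidates than needed using cheap Hamming distances, then re-rank that shortlist with the original float32 vectors. A minimal NumPy version (brute-force search stands in for a real index, and all names and parameters are illustrative):

```python
import numpy as np

def search_with_rescoring(query, corpus_float, corpus_bits,
                          top_k=10, overfetch=4):
    """Hamming-distance shortlist, then float32 rescoring of the shortlist."""
    # Stage 1: cheap binary search, fetching more candidates than needed
    q_bits = np.packbits((query >= 0).astype(np.uint8))
    dists = np.unpackbits(np.bitwise_xor(corpus_bits, q_bits), axis=1).sum(axis=1)
    candidates = np.argsort(dists)[: top_k * overfetch]

    # Stage 2: exact rescoring in the uncompressed float32 space
    scores = corpus_float[candidates] @ query
    return candidates[np.argsort(-scores)[:top_k]]

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 1024)).astype(np.float32)
corpus_bits = np.packbits((corpus >= 0).astype(np.uint8), axis=1)
query = rng.standard_normal(1024).astype(np.float32)

print(search_with_rescoring(query, corpus, corpus_bits))  # top-10 doc indices
```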