Vector Compression: A Comparative Analysis
Co-authored by Ashish Vajrapu and Rahul Pentamsetty
Vector compression is a technique for reducing the size of vectors while retaining as much of their original information as possible. For example, imagine you have a large, high-quality photo that you want to send over the internet quickly. By compressing the photo, you reduce the file size, making it faster to send while maintaining good quality. Similarly, vector compression condenses large sets of numbers (vectors) while preserving the most crucial information, making them easier to process and store without requiring significant space.
We explored various techniques for implementing vector compression, and here are our findings:
Narrowed Data Types:
Narrowing data types means storing the numbers within vector embeddings using smaller primitive data types, trading some precision for space. Typically, embeddings with 1536 dimensions are stored using the Float32 data type, which consumes 4 bytes per dimension. By switching to Float16 at 1536 dimensions, or to Int8 at 1024 dimensions, we can significantly reduce the size of these vectors.
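The storage math is easy to see in a minimal NumPy sketch (the array values here are synthetic stand-ins, not real embeddings):

```python
import numpy as np

# Synthetic stand-in for a 1536-dimensional embedding stored as Float32.
rng = np.random.default_rng(0)
embedding_f32 = rng.standard_normal(1536).astype(np.float32)

# Narrowing to Float16 halves the storage at the same dimensionality.
embedding_f16 = embedding_f32.astype(np.float16)

print(embedding_f32.nbytes)  # 1536 dims * 4 bytes = 6144 bytes
print(embedding_f16.nbytes)  # 1536 dims * 2 bytes = 3072 bytes
```

Dropping to Int8 would shrink each dimension to a single byte, which is why it pairs well with the lower-dimensional 1024 embeddings mentioned above.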
Scalar Quantization Compression:
Scalar quantization compresses data by mapping continuous or large sets of values to smaller sets of discrete values. This method effectively reduces the memory and storage requirements by decreasing the number of bits required to represent each value.
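One common scheme is uniform min/max quantization to 8-bit codes. The sketch below is illustrative (the function names are our own, and real vector stores may use a different variant), but it shows the core idea of mapping continuous values onto a small discrete set:

```python
import numpy as np

def scalar_quantize(vec: np.ndarray, bits: int = 8):
    """Map continuous Float32 values onto 2**bits discrete levels."""
    lo, hi = float(vec.min()), float(vec.max())
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((vec - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Recover an approximation of the original values from the codes."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(1)
vec = rng.standard_normal(1536).astype(np.float32)
codes, lo, scale = scalar_quantize(vec)

print(codes.nbytes)  # 1536 bytes, vs. 6144 for the Float32 original
# Each reconstructed value is off by at most about half a quantization step.
print(float(np.max(np.abs(vec - dequantize(codes, lo, scale)))) <= scale)
```

Storing one byte per dimension plus two floats of metadata (`lo` and `scale`) gives close to a 4x reduction over Float32, at the cost of a bounded reconstruction error.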
Dataset and Implementation:
For our analysis, we used a dataset of approximately 900 pages from a Microsoft SQL Server administration PDF. We implemented four different vector stores to compare the compression techniques: Float32 embeddings with no compression, scalar quantization, Float16 narrowing, and Int8 narrowing.
The following table outlines the retrieval and uploading times along with the index sizes for each method:
The retrieval times for the narrowed data types and scalar quantization are significantly lower than for the original, uncompressed Float32 embeddings, with reductions close to 50%. The index sizes for the narrowed data types (Int8 and Float16) are also much smaller than those of both the original and the scalar-quantized embeddings.
During our evaluation, the uncompressed embeddings, scalar quantization, and the narrowed Float16 type all retrieved similar documents, while the narrowed Int8 type showed some variation in the documents retrieved. This difference indicates that although smaller data types significantly enhance efficiency and reduce storage needs, they can slightly compromise the accuracy and quality of the output. Depending on the specific use case and the trade-off between efficiency and accuracy, different compression techniques can be evaluated to find the best fit for the scenario.
This comparison supports a strategic choice of vector compression based on the relative priorities of retrieval speed, storage cost, and retrieval quality, enabling more informed decisions when deploying these techniques.