Beyond GenAI: What Is A Vector Database, And Why Do You Need One?

The recent surge of Large Language Models (LLMs) such as GPT, Gemini, Claude, and Dolly has ushered in a new wave of generative AI (GenAI) innovation across the tech sector. Entrepreneurs and product developers are exploring ways to embed these LLMs into their products or to build entirely new applications from scratch. However, developing AI applications powered by LLMs is a far cry from the straightforward interaction offered by platforms like ChatGPT, and it comes with complexities that require careful navigation. This article sheds light on some of the common issues encountered when productizing LLMs and deploying LLM-backed applications, and proposes viable solutions to them.

Integrating LLMs into products involves tackling issues ranging from managing intra-conversation memory in chatbots to enhancing long-term memory with vector databases for advanced question answering and semantic search. Developers must also navigate the intricacies of output formatting to optimize token usage, implement caching strategies for LLM responses to scale effectively, and deploy LLMs in on-premise environments. This discussion ventures into the technicalities of these issues, offering insights into practical implementation strategies.

Vector databases have seen a surge in popularity, driven by the demands of generative AI. The underlying concept of vector embeddings, which distill complex information into a more manageable form, has been a cornerstone of data processing techniques for years, particularly in fields like image classification. In these applications, neural networks extract "vector embeddings" that encapsulate key features of images. Similarly, for text-based models, vector embeddings help capture the nuanced relationships between words, enabling a deeper understanding of language. These embeddings, once created, can be efficiently stored in vector databases for subsequent retrieval and analysis.


LLM Embeddings Explained

LLMs are trained on vast corpora of text, learning word representations based on contextual usage. This training process yields high-dimensional vector representations for words, encapsulating semantic meanings. Such embeddings allow LLMs to cluster words with similar meanings closer in vector space, enhancing the model's ability to generalize across various language tasks. These foundational embeddings are crucial for initializing the first layer of LLMs like BERT, equipping them with the capability to discern intricate word relationships.

Transforming Text into Vector Embeddings

Taking the sentence "I want to adopt a puppy" as an example, each word is converted into a vector representation via pre-trained word embeddings. These vectors are then processed through the neural network architecture of the language model, allowing the model to interpret the collective meaning of the sentence. The final output is a comprehensive vector embedding of the entire sentence, representing its semantic significance.

1. Each word in the sentence is mapped to its corresponding vector representation using the pre-trained word embeddings. For example, the word “adopt” may map to a 300-dimensional vector, “puppy” to another 300-dimensional vector, and so on.

2. The sequence of word vectors is then passed through the neural network architecture of the language model.

3. As the word vectors pass through the model, they interact with each other and get transformed by mathematical functions. This allows the model to interpret the meaning of the full sequence.

4. The output of the model is a new vector that represents the embedding for the full input sentence. This sentence embedding encodes the semantic meaning of the entire sequence of words.
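The four steps above can be sketched in toy form. Everything specific here is an assumption for illustration: the six-word vocabulary, the made-up 4-dimensional vectors, and the helper name `embed_sentence`. In particular, a plain average (mean pooling) stands in for the neural network of steps 2 and 3, which in a real model is a learned transformation over hundreds of dimensions.

```python
# Toy sketch: word vectors -> one sentence embedding.
# The vectors are invented, and mean pooling is a stand-in for the
# neural network a real embedding model would apply.

TOY_WORD_VECTORS = {
    "i":     [0.1, 0.0, 0.2, 0.1],
    "want":  [0.3, 0.1, 0.0, 0.2],
    "to":    [0.0, 0.1, 0.1, 0.0],
    "adopt": [0.5, 0.4, 0.1, 0.3],
    "a":     [0.0, 0.0, 0.1, 0.1],
    "puppy": [0.2, 0.9, 0.3, 0.4],
}
UNKNOWN = [0.0, 0.0, 0.0, 0.0]  # fallback for words outside the toy vocabulary


def embed_sentence(sentence):
    """Step 1: look up each word. Steps 2-4: average the word vectors."""
    tokens = sentence.lower().split()
    if not tokens:
        return list(UNKNOWN)
    word_vectors = [TOY_WORD_VECTORS.get(token, UNKNOWN) for token in tokens]
    dims = len(UNKNOWN)
    return [sum(vec[d] for vec in word_vectors) / len(word_vectors)
            for d in range(dims)]


sentence_embedding = embed_sentence("I want to adopt a puppy")
print(sentence_embedding)
```

The result is a single vector with the same number of dimensions as the word vectors, which is the kind of object a vector database stores.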

Many closed-source models, like text-embedding-ada-002 from OpenAI and the embeddings model from Cohere, allow developers to convert raw text into vector embeddings. It’s important to note that the models used to generate vector embeddings are NOT the same models used for text generation.

Embeddings vs Text Generation

  • For NLP, embeddings are trained on a language modeling objective. This means they are trained to predict surrounding words/context, not to generate text.
  • Embedding models are typically encoder-only models without decoders. They output an embedding, not generated text.
  • Generation models like GPT-3 have a decoder component trained explicitly for text generation.

Vector Databases: Enabling Efficient Search

The essence of vector databases lies in their ability to store and search through vector embeddings effectively. The performance, scalability, and cost-effectiveness of these databases are influenced by their core technologies. They support various metrics for measuring the similarity or distance between vectors, facilitating efficient search operations.

Calculating the distance between vectors

Most vector databases support three main distance metrics:

  • Euclidean distance: the straight-line distance between two points in the vector space; the smaller the distance, the closer the vectors
  • Cosine similarity: the cosine of the angle between two vectors; the larger the cosine, the closer the vectors
  • Dot product: the product of the cosine similarity and the magnitudes (lengths) of the vectors; the larger the dot product, the closer the vectors
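The three metrics can be written out in a few lines of plain Python. This is a minimal sketch for intuition; the function names and the example vectors are illustrative, and a real vector database computes these with optimized native code.

```python
import math


def dot(a, b):
    """Dot product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice as long
c = [0.0, 1.0]  # perpendicular to a

print(euclidean_distance(a, b), cosine_similarity(a, b), dot(a, b))
print(euclidean_distance(a, c), cosine_similarity(a, c), dot(a, c))
```

Note that `a` and `b` point in the same direction, so their cosine similarity is at its maximum even though their Euclidean distance is not zero. Cosine similarity ignores vector length, while Euclidean distance and dot product do not.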


Indexing in Vector Databases

Even though vector databases can contain metadata in the form of JSON objects, the primary type of data is vectors. Vector databases optimize operations to make reading and writing vectors as fast as possible. In a vector database, indexing and search algorithms are two distinct concepts, and both contribute to overall performance. In many situations, choosing a vector index involves a trade-off between accuracy (precision/recall) and speed/throughput. Two primary factors help organize an index:

1. The underlying data structure

2. The level of compression

Hash-based Indexing

Locality-Sensitive Hashing (LSH) uses hash functions to bucket similar vectors into a hash table. Query vectors are hashed with the same hash function and compared only with the vectors already present in the matching bucket.

This method is much faster than doing an exhaustive search across the entire dataset because each bucket holds far fewer vectors than the dataset as a whole. While this technique is quite fast, the downside is that it is not very accurate. LSH is an approximate method, so a better hash function will result in a better approximation, but the result is not guaranteed to be the exact answer.
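One common way to build such a hash function is with random hyperplanes: each hyperplane contributes one bit, depending on which side of it the vector falls. The sketch below assumes this variant of LSH; the function names, the number of hyperplanes, and the sample vectors are all illustrative choices, not taken from any particular database.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_planes, dims, seed=0):
    """Draw random hyperplanes (as normal vectors) through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(num_planes)]


def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its positive side."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def build_table(vectors, hyperplanes):
    """Bucket vector indices by their hash key."""
    table = defaultdict(list)
    for index, vector in enumerate(vectors):
        table[lsh_key(vector, hyperplanes)].append(index)
    return table


def candidates(query, table, hyperplanes):
    """Return only the indices that share the query's bucket."""
    return table.get(lsh_key(query, hyperplanes), [])


vectors = [[1.0, 0.1], [0.9, 0.2], [-1.0, -0.1], [-0.8, -0.3]]
hyperplanes = make_hyperplanes(num_planes=4, dims=2)
table = build_table(vectors, hyperplanes)
print(candidates([1.0, 0.15], table, hyperplanes))
```

Only the vectors in the returned bucket are compared with the query. A true neighbor that lands in a different bucket is missed, which is the accuracy cost described above.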

Tree-based Indexing

Tree-based indexing allows for fast searches by using a data structure such as a binary tree. The tree is built so that similar vectors are grouped in the same subtree. spotify/annoy (Approximate Nearest Neighbors Oh Yeah) uses a forest of binary trees to perform approximate nearest neighbor search. Annoy performs well with high-dimensional data, where an exact nearest neighbor search can be expensive. The downside of this method is that it can take a significant amount of time to build the index. Whenever a new data point is received, the indices cannot be restructured on the fly. The entire index has to be rebuilt from scratch.
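To make the idea concrete, here is a minimal sketch of a single random-split tree in the spirit of Annoy. It is not Annoy's implementation: the split rule (pick two random points and send each vector to whichever is closer), the dictionary-based nodes, and names such as `build_tree` and `query_tree` are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (enough for comparisons)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(points, ids, leaf_size, rng):
    """Recursively split the ids until each leaf holds at most leaf_size."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    p, q = rng.sample(ids, 2)  # two random pivots define the split
    left = [i for i in ids
            if sq_dist(points[i], points[p]) <= sq_dist(points[i], points[q])]
    left_set = set(left)
    right = [i for i in ids if i not in left_set]
    if not left or not right:  # identical pivots: cannot split further
        return {"leaf": ids}
    return {
        "p": p,
        "q": q,
        "left": build_tree(points, left, leaf_size, rng),
        "right": build_tree(points, right, leaf_size, rng),
    }


def query_tree(tree, points, query):
    """Walk down to one leaf, then scan only that leaf."""
    node = tree
    while "leaf" not in node:
        closer_to_p = (sq_dist(query, points[node["p"]])
                       <= sq_dist(query, points[node["q"]]))
        node = node["left"] if closer_to_p else node["right"]
    return min(node["leaf"], key=lambda i: sq_dist(points[i], query))


rng = random.Random(42)
points = [[rng.random() for _ in range(8)] for _ in range(200)]
tree = build_tree(points, list(range(len(points))), leaf_size=10, rng=rng)
print(query_tree(tree, points, points[17]))
```

A single tree can miss the true nearest neighbor when it sits just across a split, which is why a forest of independently built trees is used: candidates from several trees are pooled before the final comparison.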

Graph-based Indexing

Similar to tree-based indexing, graph-based indexing groups similar data points by connecting them with an edge. Graph-based indexing is useful when trying to search for vectors in a high-dimensional space. HNSW (Hierarchical Navigable Small World) is a popular graph-based index that is designed to provide a balance between search speed and accuracy.

HNSW creates a layered graph with the topmost layer containing the fewest points and the bottom layer containing the most points. When an input query comes in, the topmost layer is searched via ANN. The graph is traversed downward layer by layer. At each layer, the ANN algorithm is run to find the closest point to the input query. Once the bottom layer is hit, the nearest point to the input query is returned.

Graph-based indexing is very efficient because it allows one to search through a high-dimensional space by narrowing down the location at each layer. However, re-indexing can be challenging because the entire graph may need to be recreated.
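The layer-by-layer descent can be illustrated with a heavily simplified sketch. It is not the HNSW algorithm itself: the graph here is built by brute force (each point linked to its nearest neighbors within a layer, and each higher layer a random half of the one below), and the search keeps a single current point instead of a candidate list. The names `build_layers` and `greedy_search` and all parameter values are assumptions for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (enough for comparisons)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_layers(points, num_layers=3, neighbors=4, seed=0):
    """Brute-force stand-in for HNSW construction.

    Returns a list of adjacency dicts, sparsest (top) layer first.
    Every point in a layer also exists in all layers below it.
    """
    rng = random.Random(seed)
    ids = list(range(len(points)))
    layers = []
    for _ in range(num_layers):
        graph = {}
        for i in ids:
            others = sorted((j for j in ids if j != i),
                            key=lambda j: sq_dist(points[i], points[j]))
            graph[i] = others[:neighbors]
        layers.append(graph)
        ids = rng.sample(ids, max(1, len(ids) // 2))  # thin out for next layer
    return layers[::-1]


def greedy_search(layers, points, query):
    """Descend layer by layer, moving to any neighbor closer to the query."""
    current = next(iter(layers[0]))  # entry point in the top layer
    for graph in layers:
        improved = True
        while improved:
            improved = False
            for neighbor in graph[current]:
                if sq_dist(points[neighbor], query) < sq_dist(points[current], query):
                    current, improved = neighbor, True
    return current


rng = random.Random(1)
points = [[rng.random(), rng.random()] for _ in range(60)]
layers = build_layers(points)
query = [0.5, 0.5]
print(greedy_search(layers, points, query))
```

The sparse top layer lets the search cover large distances in a few hops, and each lower layer refines the position. The point returned is a local optimum in the bottom layer, so it is an approximate rather than a guaranteed nearest neighbor.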

Navigating the Challenges of Hybrid Search

  • Keyword Search: This is the age-old method we’re most familiar with. Input a word or a phrase, and this search homes in on those exact terms or closely related ones in the database or document collection. BM25 is a widely used algorithm for text search that calculates a score based on the term frequency and inverse document frequency of each term in the query. HNSW, as we saw in the previous section, is an algorithm for approximate nearest neighbor search that constructs a small-world graph of interconnected nodes. By combining these two algorithms, we can perform a hybrid search that draws on the strengths of both.

  • Vector Search: Unlike its counterpart, vector search isn’t content with mere words. It ventures into the realm of understanding, aiming to discern the query’s underlying context or meaning. This ensures that even if your words don’t match a document exactly, if the meaning is relevant, it’ll be fetched.

One of the biggest challenges of hybrid search is balancing the weights of the two algorithms. In other words, we need to decide how much weight to give the BM25 score and how much to give the vector similarity score returned by the HNSW search when combining them. This can be tricky, as the optimal weights may vary depending on the data and the specific search scenario.
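One simple way to express this balance is a weighted sum controlled by a single parameter. The sketch below assumes that approach; the min-max normalization step, the parameter name `alpha`, and the example scores are illustrative assumptions, and other fusion schemes exist.

```python
def min_max_normalize(scores):
    """Rescale scores to the range 0..1 so the two signals are comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 0.0 for doc in scores}
    return {doc: (score - low) / (high - low) for doc, score in scores.items()}


def hybrid_scores(keyword_scores, vector_scores, alpha=0.5):
    """alpha=1.0 uses only keyword scores, alpha=0.0 only vector scores."""
    keyword = min_max_normalize(keyword_scores)
    vector = min_max_normalize(vector_scores)
    docs = set(keyword) | set(vector)
    return {
        doc: alpha * keyword.get(doc, 0.0) + (1 - alpha) * vector.get(doc, 0.0)
        for doc in docs
    }


# Made-up scores for three documents.
bm25 = {"doc_a": 12.0, "doc_b": 4.0, "doc_c": 0.0}
similarity = {"doc_a": 0.2, "doc_b": 0.9, "doc_c": 0.6}

combined = hybrid_scores(bm25, similarity, alpha=0.5)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

Normalizing first matters because BM25 scores are unbounded while cosine similarities are not, so a raw sum would let one signal dominate regardless of the weight. In practice the weight is usually tuned against a set of queries with known relevant results.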

However, when done correctly, hybrid search can lead to significant improvements in search accuracy and efficiency. For example, in e-commerce applications, hybrid search can be used to combine text search with visual search, enabling users to find products that match both their textual and visual queries. In scientific applications, hybrid search can be used to combine text search with similarity search on high-dimensional data, enabling researchers to find relevant documents based on both their textual content and their data.

The research shows that hybrid search performs better on relevance compared to standalone keyword and semantic search.

Scalability

Scalability describes a system’s ability to grow. Figure out if there’s a limit to the number of vector embeddings the DB provider can support and how you can scale.

Most vector DBs allow you to scale both horizontally and vertically. Vertical scaling means adding resources to the existing system (scaling up), while horizontal scaling involves adding additional servers (scaling out). Each option has its pros and cons and needs to be evaluated case by case, but both require manual actions.

Ideally, the system scales automatically and you don’t need to worry about how to scale at all because it’s all taken care of.

Cost-efficiency

A great system delivers satisfactory speed and accuracy at a reasonable price, not only for a small application but also when you scale to billions of embeddings. Estimate the number of embeddings you want to scale to, and ask your vendor what speed, accuracy, and price they can offer with your embedding number.
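A back-of-the-envelope storage estimate is a useful starting point for that conversation. The sketch below assumes vectors are stored uncompressed as 32-bit floats and uses 1,536 dimensions, the output size of text-embedding-ada-002. It counts raw vector data only, so index structures, metadata, and replication come on top.

```python
def raw_vector_bytes(num_embeddings, dimensions, bytes_per_value=4):
    """Raw storage for uncompressed vectors (4 bytes per float32 value)."""
    return num_embeddings * dimensions * bytes_per_value


one_million = raw_vector_bytes(1_000_000, 1536)
one_billion = raw_vector_bytes(1_000_000_000, 1536)

print(f"{one_million / 1e9:.2f} GB")   # 6.14 GB
print(f"{one_billion / 1e12:.2f} TB")  # 6.14 TB
```

Going from a million to a billion embeddings moves the raw data from something that fits in memory on a single machine to several terabytes, which is where compression and the scaling options above start to drive the price.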
