Qdrant
Qdrant is an open-source vector database and vector similarity search engine, also available as a fully managed cloud service, that allows users to:
- Store, search, and manage vector embeddings
- Add payloads to vectors to refine searches and provide useful information to users
Qdrant offers a production-ready service with an API and is designed for high performance at massive scale. Vector databases have become the go-to place for storing and indexing representations of structured and unstructured data: the vector embeddings generated by embedding models. Vector stores are now an integral part of building applications on deep learning models, especially Large Language Models. In this ever-evolving landscape, Qdrant is a relatively recent and feature-packed entrant.
Embeddings
Vector embeddings are a means of expressing data in numerical form, that is, as a numerical vector of points in an n-dimensional space, regardless of the type of data: text, photos, audio, video, and so on. Embeddings enable us to group related data together. Certain models can transform inputs into such vectors; a well-known example is Word2Vec, an embedding model created by Google that translates words into vectors. Large Language Models likewise rely on embedding models that convert their input text into vector representations.
What Embeddings Are Used For
One advantage of translating words into vectors is that it makes them comparable. A computer cannot compare two words directly, but given their vector embeddings as numerical input, it can. Words with similar embeddings can then be grouped together: the terms King, Queen, Prince, and Princess will appear in one cluster because they are related to one another.
In this sense, embeddings help us locate words related to a given term. The same idea applies to sentences: given an input sentence, we can retrieve related sentences from the supplied data. This serves as the foundation for numerous use cases, including chatbots, sentence similarity, anomaly detection, and semantic search. The chatbots we develop to answer questions based on a PDF or document we provide make use of this notion of embeddings, and generative Large Language Model applications use the same method to obtain content related to the queries supplied to them.
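The grouping idea described above can be sketched with cosine similarity on toy vectors. The three-dimensional "embeddings" below are made-up illustrative values, not the output of a real model; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (illustrative values, not from a real model)
king = [0.9, 0.8, 0.1]
queen = [0.9, 0.7, 0.2]
banana = [0.1, 0.2, 0.9]

print(cosine_similarity(king, queen))   # high: related words sit close together
print(cosine_similarity(king, banana))  # low: unrelated words are far apart
```

With real embeddings, the same comparison is what lets a semantic search engine rank documents by relatedness to a query.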
Know the Qdrant Terminology
To get a smooth start with Qdrant, it is good practice to become familiar with the terminology and the main components of the Qdrant vector database.
Collections
Collections are named sets of points, where each point contains a vector, an ID, and an optional payload. Vectors in the same collection must share the same dimensionality and are compared with a single chosen distance metric.
Distance Metrics
Distance metrics measure how close vectors are to each other; the metric is selected when a collection is created. Qdrant provides the following distance metrics: Dot, Cosine, and Euclidean.
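Qdrant computes these metrics internally, but their definitions are simple enough to sketch in plain Python. This is only a conceptual illustration of what each metric measures, not Qdrant's implementation:

```python
from math import sqrt

def dot(a, b):
    # Dot product: larger means more similar (sensitive to vector magnitude)
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity: dot product of the normalized vectors, range [-1, 1]
    return dot(a, b) / (sqrt(dot(a, a)) * sqrt(dot(b, b)))

def euclidean(a, b):
    # Euclidean distance: smaller means more similar
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(dot(a, b))        # 28.0
print(cosine(a, b))     # 1.0 (b points in exactly the same direction as a)
print(euclidean(a, b))  # ~3.74
```

Note that Dot and Cosine are similarity scores (higher is closer), while Euclidean is a distance (lower is closer); Cosine ignores magnitude, which is why it is a common default for text embeddings.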
Points
Points are the fundamental entity within Qdrant. A point consists of a vector embedding, an ID, and an associated payload:
- id: a unique identifier for each vector embedding
- vector: a high-dimensional representation of data, which can come from structured or unstructured sources such as images, text, documents, PDFs, videos, or audio
- payload: an optional JSON object containing data associated with a vector. This can be considered similar to metadata, and we can use it to filter the search process
Storage
Qdrant provides two storage options:
- In-Memory Storage: Stores all vectors in RAM, maximizing speed because disk access is needed only for persistence tasks.
- Memmap Storage: Creates a virtual address space linked to a file on disk, balancing speed and persistence requirements.