Journey To Database World: Part 10 (Vector Database - Qdrant As Example)
Saiful Islam Rasel
Senior Engineer, SDE @ bKash | Ex: AsthaIT | Sports Programmer | Problem Solver | FinTech | Microservice | Java | Spring-boot | C# | .NET | PostgreSQL | DynamoDB | JavaScript | TypeScript | React.js | Next.js | Angular
Story:
In a university, there was a magical library called The Vector Vault. Instead of books, it stored colorful stars that captured the essence of each book, like its meaning or feel.
Students didn’t ask for exact titles, they described what they wanted, like “a story about friendship” or showed something similar. The librarian, used a magical tool to compare stars and find the closest matches, whether it was about stories, pictures, or songs.
But the library wasn’t perfect. It couldn’t fetch exact titles or page numbers, it was only great at finding things like what you wanted.
That’s what a vector database does: it helps find meaningful connections in data, perfect for discovering similarities, but not for precise details.
What is a Vector?
A vector is a mathematical representation of an entity or object in the form of a numerical array (a list of numbers). These numbers, called dimensions, capture the features or attributes of the object in a way that preserves its relationships or similarity to other objects. For example text, audio, video, image etc. data can be expressed as a array of numbers like [0.12, 0.45, 0.88, 0.34].
Key Characteristics of a Vector in a Vector Database:
Use of vector:
Vectors are used because they allow for: Similarity Measurements, Dimensionality Reduction, Machine Learning Compatibility.
What is a Vector Database?
A vector database is a special type of database designed to store and manage data as vectors. Vectors are numeric representations of data, often generated by machine learning models to represent things like text, images, or audio in a way that captures their meaning or similarity. These databases excel at finding similar vectors, which makes them great for applications like searching or comparing complex data. Example: Qdrant, Pinecone etc.
Use Cases of Vector Databases
Benefits of Vector Databases
Drawbacks of Vector Databases
When to Use a Vector Database
When Not to Use a Vector Database
领英推荐
Traditional Vs Vector Database
Data Type: Structured (rows/columns) Vs High-dimensional vectors
Query Type: Exact match, range, aggregation Vs Similarity search
Use Cases: Structured, relational data Vs AI-driven tasks, embeddings
Indexing: B-trees, hash indexes Vs HNSW, PQ, IVF
Scalability: General-purpose Vs Optimized for large vectors
Performance: CRUD and analytics Vs Similarity search
AI Integration: External tools required Vs Built-in for ML workflows
Qdrant as a Vector Database
Qdrant is a high-performance, open-source vector database built specifically for similarity search and machine learning applications. It is designed for real-time retrieval of the nearest neighbors of a query vector. Qdrant provides scalable, fault-tolerant infrastructure with support for large-scale datasets and real-time analytics.
Key Features of Qdrant:
The diagram above represents a high-level overview of some of the main components of Qdrant. Here are the terminologies you should get familiar with.
Collections: A collection is a named set of points (vectors with a payload) among which you can search. The vector of each point within the same collection must have the same dimensionality and be compared by a single metric. Named vectors can be used to have multiple vectors in a single point, each of which can have their own dimensionality and metric requirements.
Distance Metrics: These are used to measure similarities among vectors and they must be selected at the same time you are creating a collection. The choice of metric depends on the way the vectors were obtained and, in particular, on the neural network that will be used to encode new queries.
Points: The points are the central entity that Qdrant operates with and they consist of a vector and an optional id and payload.
Storage: Qdrant can use one of two options for storage
Clients: the programming languages you can use to connect to Qdrant.
Query Operations: If you are interested further then you can check their official docs here.
Summary:
Vector databases are specialized tools for managing and searching unstructured data represented as vectors. They shine in AI-powered applications like recommendation systems, semantic search, and personalization. While they offer speed and scalability, their complexity and specific use cases mean they aren’t a fit for every scenario. Qdrant make it easier to leverage vector databases in modern applications, especially when dealing with large-scale machine learning models.
Previous Parts: