Journey To Database World: Part 10 (Vector Database - Qdrant As Example)

Saiful Islam Rasel

Senior Engineer, SDE @ bKash | Ex: AsthaIT | Sports Programmer | Problem Solver | FinTech | Microservice | Java | Spring-boot | C# | .NET | PostgreSQL | DynamoDB | JavaScript | TypeScript | React.js | Next.js | Angular

Published: January 12, 2025

Story:

In a university, there was a magical library called The Vector Vault. Instead of books, it stored colorful stars that captured the essence of each book, like its meaning or feel.

Students didn’t ask for exact titles. They described what they wanted, like “a story about friendship”, or showed something similar. The librarian used a magical tool to compare stars and find the closest matches, whether the request was about stories, pictures, or songs.

But the library wasn’t perfect. It couldn’t fetch exact titles or page numbers; it was only great at finding things like what you wanted.

That’s what a vector database does: it helps find meaningful connections in data, perfect for discovering similarities, but not for precise details.

What is a Vector?

A vector is a mathematical representation of an entity or object in the form of a numerical array (a list of numbers). Each position in the array is called a dimension, and together the values capture the features or attributes of the object in a way that preserves its relationships or similarity to other objects. For example, text, audio, video, and image data can each be expressed as an array of numbers like [0.12, 0.45, 0.88, 0.34].

Key Characteristics of a Vector in a Vector Database:

High-Dimensional Data: A vector usually has many dimensions, and entities are compared by calculating distances (e.g., cosine similarity, Euclidean distance) between their vectors.
Numerical Representation: Complex data can be abstracted into smaller, dense numerical representations that are computationally efficient.
Embedding of Data: Vectors (embeddings) are the format that most machine learning and AI models use for input, processing, and predictions.

Uses of Vectors:

Vectors are used because they allow for: Similarity Measurements, Dimensionality Reduction, Machine Learning Compatibility.
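To make the similarity measurements concrete, here is a minimal sketch of cosine similarity and Euclidean distance in plain Python. The three vectors are made-up values for illustration only; real embeddings come from a machine learning model and usually have hundreds of dimensions.

```python
import math

def dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance between a and b: 0.0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical embeddings: book_a and book_b are meant to be similar,
# book_c is meant to be about something else.
book_a = [0.12, 0.45, 0.88, 0.34]
book_b = [0.10, 0.50, 0.80, 0.30]
book_c = [0.90, 0.05, 0.10, 0.70]

print("cosine a~b:", cosine_similarity(book_a, book_b))
print("cosine a~c:", cosine_similarity(book_a, book_c))
print("euclid a~b:", euclidean_distance(book_a, book_b))
print("euclid a~c:", euclidean_distance(book_a, book_c))
```

With these values, book_b scores a higher cosine similarity and a smaller Euclidean distance to book_a than book_c does, which is what "similar" means in vector terms.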

What is a Vector Database?

A vector database is a special type of database designed to store and manage data as vectors. Vectors are numeric representations of data, often generated by machine learning models to represent things like text, images, or audio in a way that captures their meaning or similarity. These databases excel at finding similar vectors, which makes them great for applications like searching or comparing complex data. Examples include Qdrant and Pinecone.

Figure: In a simple vector database, documents that sit close together (here, those in the upper right) are likely similar to each other.
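The core operation, finding the stored vectors closest to a query vector, can be sketched as a brute-force scan. This is only an illustration with made-up document names and 3-dimensional vectors; real vector databases avoid scanning every vector by using approximate indexes such as HNSW.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical store: document name -> embedding (values invented for the demo).
documents = {
    "friendship story": [0.90, 0.80, 0.10],
    "tale of two friends": [0.85, 0.75, 0.20],
    "tax law handbook": [0.10, 0.20, 0.95],
}

def most_similar(query, store, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# A query embedding that is assumed to mean "a story about friendship".
query = [0.88, 0.82, 0.12]
print(most_similar(query, documents))
```

The two friendship-related documents are returned and the tax handbook is left out, even though the query never mentions an exact title.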

Use Cases of Vector Databases

Recommendation Systems: Suggesting similar products, movies, or content based on user preferences.
Semantic Search: Finding relevant information by meaning instead of exact keywords.
Image and Video Search: Searching for images or videos based on visual similarity.
Natural Language Processing (NLP): Tasks like question answering, summarization, or chatbot responses.
Anomaly Detection: Identifying unusual patterns in data for cybersecurity, fraud detection, or system monitoring.
Personalization: Tailoring user experiences based on past behaviors or preferences.

Benefits of Vector Databases

Fast Similarity Searches: Optimized for comparing vectors to find the most similar ones.
Scalability: Can handle large amounts of vector data efficiently.
Flexibility: Works with unstructured data like text, images, or audio.
AI Integration: Perfect for use with machine learning models to enhance search and recommendation systems.

Drawbacks of Vector Databases

Complexity: Requires knowledge of machine learning and vector embeddings to use effectively.
Specialized Use: Not a replacement for traditional databases; suitable only for specific tasks.
Resource Intensive: Can demand significant computational power for storage and search.
Limited Ecosystem: Smaller community and fewer tools compared to traditional databases.

When to Use a Vector Database

You need similarity search for unstructured data like text, images, or audio.
AI is a core part of your system, such as recommendation engines or NLP applications.
You work with large-scale, unstructured data that cannot be handled well by traditional databases.

When Not to Use a Vector Database

Your data is structured and relational, like rows and columns in a financial system.
Your application doesn’t involve machine learning models or vector embeddings.
You need simple key-value or transactional operations, which are better suited to traditional databases.

Traditional Vs Vector Database

Aspect | Traditional Database | Vector Database
Data Type | Structured (rows/columns) | High-dimensional vectors
Query Type | Exact match, range, aggregation | Similarity search
Use Cases | Structured, relational data | AI-driven tasks, embeddings
Indexing | B-trees, hash indexes | HNSW, PQ, IVF
Scalability | General-purpose | Optimized for large vectors
Performance | CRUD and analytics | Similarity search
AI Integration | External tools required | Built-in for ML workflows

Qdrant as a Vector Database

Qdrant is a high-performance, open-source vector database built specifically for similarity search and machine learning applications. It is designed for real-time retrieval of the nearest neighbors of a query vector. Qdrant provides scalable, fault-tolerant infrastructure with support for large-scale datasets and real-time analytics.

Key Features of Qdrant:

Vector Search: Efficient nearest neighbor search using advanced indexing techniques like HNSW (Hierarchical Navigable Small World).
Hybrid Search: Combines traditional filters (like metadata) with vector similarity.
Payload Storage: Supports additional metadata (payload) for each vector.
Dynamic Updates: Supports real-time updates to vectors and payloads.
Multi-Tenant Support: Multiple collections can be managed in a single Qdrant instance.
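The hybrid search idea, a metadata filter combined with vector similarity, can be sketched in a few lines of plain Python. This is a toy model and not Qdrant's API: the `hybrid_search` function, the `genre` payload field, and the vectors are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Each point has an id, a vector, and a payload (metadata), mirroring the feature list.
points = [
    {"id": 1, "vector": [0.9, 0.1], "payload": {"genre": "fiction"}},
    {"id": 2, "vector": [0.8, 0.3], "payload": {"genre": "science"}},
    {"id": 3, "vector": [0.2, 0.9], "payload": {"genre": "fiction"}},
]

def hybrid_search(query, points, must_match, k=1):
    """Keep points whose payload matches every key/value in must_match,
    then rank the survivors by cosine similarity to the query."""
    candidates = [
        p for p in points
        if all(p["payload"].get(key) == value for key, value in must_match.items())
    ]
    candidates.sort(key=lambda p: cosine_similarity(query, p["vector"]), reverse=True)
    return [p["id"] for p in candidates[:k]]

query = [0.85, 0.25]
print(hybrid_search(query, points, {}))                    # no filter
print(hybrid_search(query, points, {"genre": "fiction"}))  # filter, then rank
```

Without a filter the closest point is the science one (id 2); restricting the payload to fiction changes the answer to id 1, which shows how the filter and the similarity ranking work together.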

The diagram above represents a high-level overview of some of the main components of Qdrant. Here are the terms you should be familiar with.

Collections: A collection is a named set of points (vectors with a payload) among which you can search. The vectors of all points within the same collection must have the same dimensionality and be compared by a single metric. Named vectors can be used to have multiple vectors in a single point, each of which can have its own dimensionality and metric requirements.

Distance Metrics: These are used to measure similarities among vectors, and they must be selected when you create a collection. The choice of metric depends on the way the vectors were obtained and, in particular, on the neural network that will be used to encode new queries.
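Why the metric has to be chosen deliberately can be seen in a small sketch where two metrics disagree about the nearest neighbor. The vectors below are invented for the demonstration: candidate "a" points in exactly the same direction as the query but is much longer, while candidate "b" is close in space but points in a slightly different direction.

```python
import math

def cosine_similarity(a, b):
    """Compares direction only; vector length does not matter."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def euclidean_distance(a, b):
    """Compares position in space; vector length matters."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 1.0]
candidates = {
    "a": [3.0, 3.0],  # same direction as the query, but far away
    "b": [1.0, 0.8],  # nearby, but slightly different direction
}

# Highest cosine similarity wins; smallest Euclidean distance wins.
best_by_cosine = max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))
best_by_euclid = min(candidates, key=lambda name: euclidean_distance(query, candidates[name]))

print("nearest by cosine:   ", best_by_cosine)
print("nearest by Euclidean:", best_by_euclid)
```

Cosine similarity picks "a" and Euclidean distance picks "b", so the metric should match how the embedding model was trained to compare vectors.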

Points: The points are the central entity that Qdrant operates with and they consist of a vector and an optional id and payload.

id: a unique identifier for your vectors.
Vector: a high-dimensional representation of data, for example, an image, a sound, a document, a video, etc.
Payload: A payload is a JSON object with additional data you can add to a vector.
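The relationship between collections and points can be modeled with two small classes. This is a toy in-memory model for illustration, not Qdrant's client API: the `Point` and `Collection` classes, the `upsert` method, and the field names are assumptions, and the id is made required here only so it can serve as a dictionary key.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Point:
    id: int                      # unique identifier for the vector
    vector: list[float]          # the embedding itself
    payload: dict[str, Any] = field(default_factory=dict)  # optional JSON-like metadata

@dataclass
class Collection:
    name: str
    size: int        # dimensionality every vector in this collection must have
    distance: str    # the single metric used for the whole collection
    points: dict[int, Point] = field(default_factory=dict)

    def upsert(self, point: Point) -> None:
        """Insert the point, or overwrite the existing point with the same id."""
        if len(point.vector) != self.size:
            raise ValueError(
                f"expected {self.size} dimensions, got {len(point.vector)}"
            )
        self.points[point.id] = point

library = Collection(name="books", size=4, distance="cosine")
library.upsert(Point(id=1, vector=[0.12, 0.45, 0.88, 0.34], payload={"title": "A story"}))

try:
    library.upsert(Point(id=2, vector=[0.1, 0.2]))  # wrong dimensionality
except ValueError as err:
    print("rejected:", err)
```

Rejecting the 2-dimensional vector reflects the rule that all vectors in one collection share the same dimensionality.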

Storage: Qdrant can use one of two options for storage:

In-memory storage: Stores all vectors in RAM. It has the highest speed, since disk access is required only for persistence.
Memmap storage: Creates a virtual address space associated with the file on disk.
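The memory-mapping idea can be illustrated with Python's standard `mmap` module. This sketch shows the general operating-system technique only and makes no claim about how Qdrant lays out its files: the file name, the fixed 4-dimensional float32 record format, and the `read_vector` helper are assumptions for the demo.

```python
import mmap
import os
import struct
import tempfile

DIM = 4
RECORD = struct.Struct(f"<{DIM}f")  # one vector = DIM little-endian float32 values

vectors = [
    [0.12, 0.45, 0.88, 0.34],
    [0.5, 0.25, 0.75, 1.0],
]

# Write the vectors back to back into a file on disk.
path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
with open(path, "wb") as f:
    for vec in vectors:
        f.write(RECORD.pack(*vec))

def read_vector(path, index):
    """Map the file into the process's address space and read one vector.
    The operating system loads only the pages that are actually touched."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return list(RECORD.unpack_from(mapped, index * RECORD.size))

second = read_vector(path, 1)
print(second)
```

The file is addressed as if it were memory, so a single vector can be read by offset without loading the whole file into RAM.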

Clients: the programming languages you can use to connect to Qdrant.

Query Operations: For more detail on the available query operations, see the official Qdrant documentation.

Summary:

Vector databases are specialized tools for managing and searching unstructured data represented as vectors. They shine in AI-powered applications like recommendation systems, semantic search, and personalization. While they offer speed and scalability, their complexity and specific use cases mean they aren’t a fit for every scenario. Qdrant makes it easier to leverage vector databases in modern applications, especially when dealing with large-scale machine learning models.

