Vector Database

Ishika Garg

Consultant - AI/ML Developer | Genpact

Published: July 4, 2024

In the world of databases, we’re all familiar with traditional databases such as an RDBMS. But have you heard about vector databases? Unlike an RDBMS, which returns exact matches based on specific conditions, a vector database finds the most similar items based on their semantic or contextual meaning. Let’s explore vector databases, as they are incredibly important if you’re working with machine learning.

A vector database is designed to handle high-dimensional data efficiently, making it perfect for large language models (LLMs). This is crucial for AI and machine learning, where understanding context and similarity is key.

The vector representations encode facts and commonsense concepts that may not be directly expressed in the LLM’s training data. For example, vector(“King”) - vector(“Man”) + vector(“Woman”) lands close to vector(“Queen”) in the vector space.
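To make the vector arithmetic concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and their axes are invented purely for illustration (real embeddings come from a trained model and have hundreds of dimensions), and the `analogy` helper is a hypothetical name, not part of any library.

```python
import math

# Toy "embeddings" invented for illustration only.
# Hypothetical axes: [royalty, male, female]
toy_vectors = {
    "king":  [1.0, 1.0, 0.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "queen": [1.0, 0.0, 1.0],
}

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c, vectors):
    """Return the word closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three input words so they cannot be returned as the answer.
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

result = analogy("king", "man", "woman", toy_vectors)
print(result)  # queen
```

With real embeddings the result is only approximately equal to the target word's vector, which is why the lookup uses nearest-neighbour search rather than an exact match.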

How we can use a vector database –

1. First, we use an embedding model to generate vector embeddings for the content.
2. These vector embeddings are then inserted into the vector database.
3. When a user or application issues a query, we use the same embedding model to create an embedding for the query. This embedding is then used to search the database for similar vector embeddings.
4. Finally, the content associated with these similar embeddings is forwarded to the LLM for further processing.
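The steps above can be sketched end to end in a few lines of Python. Everything here is a stand-in chosen for illustration: `embed` is a toy word-count function rather than a real embedding model, `ToyVectorStore` is a hypothetical in-memory class rather than a real vector database, and the final step only builds the prompt text that would be sent to an LLM.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary; a real embedding model needs no such list.
VOCAB = ["cat", "dog", "pet", "stock", "market", "price"]

def embed(text):
    """Stand-in for an embedding model: counts vocabulary words in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

class ToyVectorStore:
    """In-memory stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self.items = []  # list of (vector, original text)

    def insert(self, text):
        # Steps 1 and 2: embed the content and store the vector.
        self.items.append((embed(text), text))

    def search(self, query, k=1):
        # Step 3: embed the query with the same function, rank by similarity.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = ToyVectorStore()
for doc in ["the cat is a pet", "the dog is a pet", "the stock market price fell"]:
    store.insert(doc)

question = "which pet is a dog"
context = store.search(question, k=2)
print(context)  # ['the dog is a pet', 'the cat is a pet']

# Step 4: the retrieved content would be passed to the LLM inside a prompt.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
```

A production vector database replaces the brute-force `sorted` call with an approximate nearest-neighbour index, which is what keeps latency low on large collections.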

Here are a few similarity measures –

1. Cosine Similarity: Cosine similarity measures the cosine of the angle between two vectors in a vector space. It ranges from -1 to 1, where 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they are diametrically opposed.

2. Euclidean Distance: Euclidean distance measures the straight-line distance between two vectors in a vector space. It ranges from 0 to infinity, where 0 represents identical vectors and larger values represent increasingly dissimilar vectors.

3. Jaccard Similarity: Jaccard similarity measures the similarity between two sets (or binary vectors). It is the number of elements they share divided by the total number of distinct elements across both, and it ranges from 0 (nothing shared) to 1 (identical sets).
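The three measures are short enough to write out directly. The following is a minimal sketch using only the Python standard library; the function names are my own, and the Jaccard version assumes the inputs are given as sets.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (range -1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two vectors (range 0 to infinity)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union (range 0 to 1)."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)

print(cosine_similarity([2, 0], [5, 0]))    # 1.0  (same direction)
print(cosine_similarity([1, 0], [0, 1]))    # 0.0  (orthogonal)
print(cosine_similarity([1, 0], [-1, 0]))   # -1.0 (opposite)
print(euclidean_distance([0, 0], [3, 4]))   # 5.0
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

Note that cosine similarity ignores vector length (`[2, 0]` and `[5, 0]` score 1.0), while Euclidean distance does not, so the two can rank the same candidates differently unless the vectors are normalized.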

The following are some vector databases and vector search libraries –

FAISS (Facebook AI Similarity Search) — Developed by Facebook AI, FAISS is a library designed to efficiently search and manage large collections of high-dimensional vectors, making it ideal for tasks such as image and text similarity search.
Pinecone — Pinecone is a managed vector database service that offers real-time vector similarity search.
Chroma — Chroma is a vector database that focuses on providing a flexible and scalable solution for storing and querying vector embeddings.

Fact—

Many venture capitalists are investing in vector databases because they have realized that a successful LLM application needs a robust vector database with very low latency that can handle a wide range of tasks for customers.

References —

https://www.pinecone.io/learn/vector-database/

Finally —

Hopefully, you enjoyed reading it. This was an introduction to Vector Store. Buckle up, because our next blog is going to be EPIC!

Got questions? Don’t be shy! Hit me up on LinkedIn. Coffee’s on me (virtually, of course).

Please feel free to contact me: Medium, LinkedIn.