Vector Database

In the world of databases, most of us are familiar with traditional systems such as relational databases (RDBMS). But have you heard about vector databases? Unlike an RDBMS, which returns exact matches for specific conditions, a vector database finds the most similar items based on their semantic or contextual meaning. Let’s explore vector databases, because they are incredibly important if you’re working with machine learning.


A vector database is designed to store and search high-dimensional data efficiently, which makes it a good fit for applications built on large language models (LLMs). This is crucial for AI and machine learning, where understanding context and similarity is key.

Vector representations (embeddings) capture relationships between concepts, including ones that may not be stated explicitly in the training data. For example, vector(“King”) - vector(“Man”) + vector(“Woman”) lands close to vector(“Queen”) in the vector space.
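
To make the arithmetic concrete, here is a minimal sketch using hand-picked 2-D toy vectors. The values, the `toy_vectors` table, and the helper names are all assumptions made up for illustration; real embeddings have hundreds of dimensions and are learned by a model, so the match is approximate rather than exact.

```python
# Toy 2-D "embeddings" chosen by hand for illustration only (not real model output).
# Dimension 0 loosely means "royalty"; dimension 1 loosely means "gender".
toy_vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
}


def add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    """Element-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]


def squared_distance(a, b):
    """Squared straight-line distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(target, vectors, exclude=()):
    """Return the word whose vector is closest to `target`, skipping `exclude`."""
    candidates = [word for word in vectors if word not in exclude]
    return min(candidates, key=lambda word: squared_distance(vectors[word], target))


# vector("king") - vector("man") + vector("woman")
result = add(sub(toy_vectors["king"], toy_vectors["man"]), toy_vectors["woman"])

# Skip the three input words, as is commonly done in analogy tests.
closest = nearest_word(result, toy_vectors, exclude={"king", "man", "woman"})
print(closest)  # queen
```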

How we can use a vector database:

  1. First, we use an embedding model to generate vector embeddings for the content.
  2. These vector embeddings are then inserted into the vector database.
  3. When a user or application issues a query, we use the same embedding model to create an embedding for the query. This embedding is then used to search the database for similar vector embeddings.
  4. Finally, the matching results (typically the content associated with those embeddings) are forwarded to the LLM for further processing.
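
The four steps above can be sketched in plain Python. Everything here is a hypothetical stand-in: `embed` is a toy word-count function rather than a real embedding model, `ToyVectorStore` is an in-memory list rather than a real vector database, and the final prompt string only hints at what would be sent to an LLM.

```python
import math
from collections import Counter

# Assumption: a tiny fixed vocabulary stands in for a learned embedding space.
VOCABULARY = ["vector", "database", "similarity", "search",
              "sql", "table", "row", "embedding"]


def embed(text):
    """Toy embedding: how often each vocabulary word appears in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCABULARY]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """In-memory stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self._items = []  # list of (embedding, content) pairs

    def insert(self, content):
        # Steps 1 and 2: embed the content and store the embedding with it.
        self._items.append((embed(content), content))

    def search(self, query, top_k=2):
        # Step 3: embed the query with the same function, then rank by similarity.
        query_embedding = embed(query)
        scored = [(cosine_similarity(query_embedding, emb), content)
                  for emb, content in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [content for _, content in scored[:top_k]]


docs = [
    "a vector database supports similarity search over embedding data",
    "a sql table stores each record as a row",
    "embedding similarity powers semantic search",
]
store = ToyVectorStore()
for doc in docs:
    store.insert(doc)

query = "how does similarity search work in a vector database"
matches = store.search(query, top_k=2)

# Step 4: the retrieved content would be passed to the LLM as context.
prompt = "Context:\n" + "\n".join(matches) + "\n\nQuestion: " + query
print(prompt)
```

This sketch compares the query against every stored vector. That is fine for three documents, but production systems generally rely on approximate nearest-neighbor indexes to keep latency low at scale.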

(Figure: Model Workflow)

Here are a few similarity measures:

  1. Cosine Similarity: Cosine similarity measures the cosine of the angle between two vectors in a vector space. It ranges from -1 to 1, where 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions.

(Figure: Cosine Similarity)

  2. Euclidean Distance: Euclidean distance measures the straight-line distance between two vectors in a vector space. It ranges from 0 to infinity, where 0 represents identical vectors and larger values represent increasingly dissimilar vectors.

(Figure: Euclidean Distance)

  3. Jaccard Similarity: Jaccard similarity measures the similarity between two sets (or binary vectors). It is the number of elements they share divided by the total number of distinct elements across both, so it ranges from 0 (nothing shared) to 1 (identical sets).

(Figure: Jaccard Similarity)
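
As a rough sketch, the three measures above can be written in a few lines of standard-library Python. The function names are illustrative choices rather than any library's API, and the handling of zero vectors and empty sets is a convention picked for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined for a zero-length vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between vectors a and b, in the range [0, inf)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union, in [0, 1]."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention chosen here: two empty sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


print(cosine_similarity([1, 0], [3, 0]))    # same direction -> 1.0
print(cosine_similarity([1, 0], [0, 1]))    # orthogonal -> 0.0
print(cosine_similarity([1, 0], [-1, 0]))   # opposite -> -1.0
print(euclidean_distance([0, 0], [3, 4]))   # 3-4-5 triangle -> 5.0
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))  # 2 shared of 4 -> 0.5
```

Note that cosine similarity ignores vector length (`[1, 0]` and `[3, 0]` score 1.0), while Euclidean distance does not, which is one reason the choice of measure depends on how the embeddings were produced.
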

The following are some popular vector databases and libraries:

  1. FAISS (Facebook AI Similarity Search): Developed by Facebook AI (now Meta AI), FAISS is strictly a library rather than a full database. It is designed to efficiently index and search large collections of high-dimensional vectors, which makes it well suited to tasks such as image and text similarity search.
  2. Pinecone: Pinecone is a managed vector database service that offers real-time vector similarity search.
  3. Chroma: Chroma is a vector database that focuses on providing a flexible and scalable solution for storing and querying vector embeddings.

Fact—

Many venture capitalists are investing in vector databases because they have realized that building a successful LLM-based product requires a robust, low-latency vector database that can handle a wide range of customer workloads.

References —

  1. https://www.pinecone.io/learn/vector-database/

Finally —

Hopefully, you enjoyed reading this introduction to vector databases. Buckle up, because our next blog is going to be EPIC!

Got questions? Don’t be shy! Hit me up on LinkedIn. Coffee’s on me (virtually, of course).

Please feel free to contact me on Medium or LinkedIn.


