
The Rising Star of ML Ops: VectorDB - Why They're Outperforming SQL & NoSQL for Embedding Storage

Abhi Mahule

Tech leader with expertise in building high performance engineering teams at fast growth startups. 2x founder | 1 IPO | Ex-Roku | Ex-Capital One | Holder of O-1A extraordinary ability visa

Published: July 19, 2023

Why VectorDB?

As part of our journey at Vyrill, we're always learning and exploring new things because of our AI-driven focus. One of the exciting things we've come across is VectorDBs. This interesting technology popped up as we were working on a task, and I thought it would be great to share what we've learned with all of you.

Our goal was to better manage ML model embeddings and integrate them with the search results from our dataset. But we found that typical SQL and NoSQL databases just weren't the right fit for storing these numerical vector representations.

Understanding Embeddings

Before diving into VectorDBs, let's demystify embeddings in ML:

Embeddings in machine learning are like a special type of dictionary that helps a computer turn complex data, like words or categories, into lists of numbers (vectors) it can work with. Items with similar meaning end up with similar vectors, which lets a computer grasp the relationships or similarities between data elements.

Word Embeddings: Words are converted into numbers, allowing machines to understand the similarity between words like 'cat' and 'kitten'.

Entity Embeddings: Categories are translated into numbers, enabling differentiation of types like movies or foods.

Graph Embeddings: Relationships within a network are quantified so a computer can understand social network mappings.

Image Embeddings: Images are converted into numbers, enabling machines to perceive the similarity between two images.
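To make "similarity between embeddings" concrete, here is a minimal sketch using cosine similarity, one common way to compare two vectors. The 4-dimensional vectors below are invented for illustration only; real embeddings come from a trained model and typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, made up for this example (not from any real model).
embeddings = {
    "cat":    [0.90, 0.80, 0.10, 0.00],
    "kitten": [0.85, 0.90, 0.20, 0.05],
    "car":    [0.10, 0.00, 0.90, 0.80],
}

cat_vs_kitten = cosine_similarity(embeddings["cat"], embeddings["kitten"])
cat_vs_car = cosine_similarity(embeddings["cat"], embeddings["car"])

print(f"cat vs kitten: {cat_vs_kitten:.3f}")
print(f"cat vs car:    {cat_vs_car:.3f}")
```

With these made-up numbers, 'cat' scores much closer to 'kitten' than to 'car', which is the property a similarity search relies on.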

Embeddings, particularly word embeddings, play a huge role in applications like LangChain and ChatGPT. They help these AI models work with language by turning words into numbers.

SQL & NoSQL: Why Not?

Why weren't SQL or NoSQL databases suitable for storing embeddings? Although SQL databases excel with structured data and NoSQL databases with unstructured data, neither was designed around the unique characteristics and volumes that come with embeddings. Their indexes are built for exact-match and range lookups, not for finding the nearest neighbors of a high-dimensional vector, which is the core operation in many AI applications. Without a dedicated vector index, they lack the speed and efficiency to calculate vector similarities on the fly and to scale to very large vector datasets.
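To see where the cost comes from, consider what one similarity query takes without any vector index: the query has to be compared against every stored vector. The sketch below does exactly that in plain Python on synthetic data. The function names and the dataset are hypothetical and only serve to illustrate the linear scan.

```python
import heapq
import math
import random

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_top_k(query, vectors, k):
    """Return the k closest (distance, index) pairs by scanning every vector.

    Work grows with (number of vectors) x (dimensions) for each query.
    """
    scored = ((l2_distance(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)

# Synthetic data: 2,000 random 64-dimensional vectors.
random.seed(0)
dim = 64
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2000)]

# The query is a slightly perturbed copy of vector 42, so 42 should rank first.
query = [x + random.gauss(0, 0.01) for x in vectors[42]]

top = brute_force_top_k(query, vectors, k=3)
for distance, index in top:
    print(f"index={index} distance={distance:.3f}")
```

The scan gives exact results, but every query touches all 2,000 vectors. At millions of vectors and many queries per second, that is the workload VectorDBs are built to avoid.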

Enter the Game Changer: VectorDB

VectorDBs are emerging stars in the ML Ops universe, specifically designed to store and query vector data like AI embeddings. They accommodate vast vector data volumes and allow fast approximate vector searches, optimizing the storage and retrieval of vector data.

VectorDBs are harnessed for various use cases:

- Semantic search: finding documents with similar meaning

- Product recommendations: identifying similar users/items

- Anomaly detection: pinpointing outliers in data

- Document categorization: classifying documents by topic

- Pattern recognition: matching inputs to trained examples

- Forecasting: predicting future data points based on vectors
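As one illustration of how these use cases reduce to nearest-neighbor queries, here is a minimal sketch of the anomaly detection case: a point whose nearest neighbors are all far away is treated as an outlier. The scoring rule (mean distance to the k nearest neighbors) is one simple choice assumed for this example, and the data is synthetic.

```python
import math
import random

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_anomaly_score(index, vectors, k):
    """Mean distance from vectors[index] to its k nearest other vectors."""
    distances = sorted(
        l2_distance(vectors[index], v) for j, v in enumerate(vectors) if j != index
    )
    return sum(distances[:k]) / k

# Synthetic data: 200 points around the origin plus one planted outlier.
random.seed(1)
normal_points = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
outlier = [25.0, 25.0]
vectors = normal_points + [outlier]

scores = [knn_anomaly_score(i, vectors, k=5) for i in range(len(vectors))]
most_anomalous = max(range(len(vectors)), key=scores.__getitem__)

print(f"most anomalous index: {most_anomalous}, score: {scores[most_anomalous]:.2f}")
```

In a real system the neighbor lookup inside `knn_anomaly_score` would be a query against the VectorDB rather than a full scan.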


Peering Under the Hood of VectorDB: A Simplified & Technical Guide

Let's envision VectorDBs as large libraries where books symbolize your data. Librarians (database algorithms) break down books into smaller chapters (subvectors), encoding them compactly while retaining their essence.

When a reader (a query) seeks a chapter, the librarians use an efficient cataloging system (indexing) for quick access, sometimes even employing electronic sorting (GPU optimizations).

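In technical lingo, VectorDBs index and query vector data efficiently. Vectors are often compressed using methods like product quantization: each vector is split into small subvectors, and each subvector is replaced by the ID of its nearest cluster centroid, so a long list of floats shrinks to a handful of small integers.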
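Product quantization can be sketched in a few dozen lines. The version below is a toy, pure-Python illustration on synthetic data, not how any particular VectorDB implements it: the k-means routine, the parameter values, and all function names are assumptions made for this example.

```python
import random

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: squared_l2(point, centroids[c]))

def kmeans(points, k, iterations, rng):
    """Plain Lloyd's k-means; returns k centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def split(vector, num_subvectors):
    """Cut a vector into num_subvectors equal-length chunks."""
    size = len(vector) // num_subvectors
    return [vector[i * size:(i + 1) * size] for i in range(num_subvectors)]

def train_codebooks(vectors, num_subvectors, num_centroids, rng):
    """Learn one codebook (a list of centroids) per subvector position."""
    columns = zip(*(split(v, num_subvectors) for v in vectors))
    return [kmeans(list(col), num_centroids, 10, rng) for col in columns]

def encode(vector, codebooks):
    """Replace each subvector with the ID of its nearest centroid."""
    subvectors = split(vector, len(codebooks))
    return [nearest(sub, book) for sub, book in zip(subvectors, codebooks)]

def decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen centroids."""
    approx = []
    for code, book in zip(codes, codebooks):
        approx.extend(book[code])
    return approx

# Synthetic data: 500 random 8-dimensional vectors, 4 subvectors, 16 centroids each.
rng = random.Random(0)
dim, num_subvectors, num_centroids = 8, 4, 16
vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(500)]

codebooks = train_codebooks(vectors, num_subvectors, num_centroids, rng)
codes = encode(vectors[0], codebooks)
approx = decode(codes, codebooks)

print("codes:", codes)
print("squared reconstruction error:", round(squared_l2(vectors[0], approx), 3))
```

With these settings each vector is stored as 4 centroid IDs (each in the range 0 to 15) instead of 8 floats. The reconstruction is lossy, which is one reason searches over such compressed vectors are approximate.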

These vectors are indexed using data structures built for nearest-neighbor search, so that vectors similar to a query can be located quickly without scanning the whole dataset. Some VectorDBs optimize index building for GPUs to speed up searches.
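One example of such a data structure is an inverted file (IVF) index: vectors are grouped into cells around centroids, and a query scans only the few cells closest to it. The sketch below is a toy version under stated assumptions: the centroids are simply sampled from the data instead of being trained with k-means, the data is synthetic, and the names (`build_ivf`, `ivf_search`, `nprobe`) are chosen for this example.

```python
import random

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_cells(query, centroids):
    """Centroid indices ordered from closest to farthest from the query."""
    return sorted(range(len(centroids)), key=lambda c: squared_l2(query, centroids[c]))

def build_ivf(vectors, centroids):
    """Assign every vector ID to the cell of its nearest centroid."""
    cells = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        cells[nearest_cells(v, centroids)[0]].append(i)
    return cells

def ivf_search(query, vectors, centroids, cells, k, nprobe):
    """Scan only the nprobe closest cells; return (top-k IDs, vectors scanned)."""
    probed = nearest_cells(query, centroids)[:nprobe]
    candidates = [i for c in probed for i in cells[c]]
    candidates.sort(key=lambda i: squared_l2(query, vectors[i]))
    return candidates[:k], len(candidates)

# Synthetic data: 2,000 random 16-dimensional vectors in 20 cells.
rng = random.Random(0)
dim = 16
vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2000)]
centroids = rng.sample(vectors, 20)  # assumption: untrained centroids, for brevity
cells = build_ivf(vectors, centroids)

# The query is a slightly perturbed copy of vector 7.
query = [x + rng.gauss(0, 0.001) for x in vectors[7]]

ids_few, scanned_few = ivf_search(query, vectors, centroids, cells, k=3, nprobe=3)
ids_all, scanned_all = ivf_search(query, vectors, centroids, cells, k=3, nprobe=len(centroids))

print(f"nprobe=3:  scanned {scanned_few} vectors, top hit {ids_few[0]}")
print(f"nprobe=20: scanned {scanned_all} vectors, top hit {ids_all[0]}")
```

Probing fewer cells means fewer distance computations but a chance of missing a true neighbor that sits in an unprobed cell. That trade-off between speed and recall is what "approximate" search refers to.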

By combining intelligent data encoding, advanced indexing, and computation optimizations, VectorDBs facilitate rapid searches, even across sizable vector datasets.

Figure: System architecture using Pinecone, a popular VectorDB

In summary

What are VectorDBs?

- Databases for storing and querying vector data like AI embeddings
- Allow fast approximate vector searches
- Optimize storage and retrieval of vector data

Use cases:

- Semantic search - find docs with similar meaning
- Product recommendations - similar users/items
- Anomaly detection - identify outliers in data
- Document categorization - classify docs by topic
- Pattern recognition - match inputs to trained examples
- Forecasting - predict future points based on vectors

How they differ from SQL & NoSQL:

- Built specifically for vector data
- Calculate vector similarities on the fly
- Scale to handle large vector datasets
- Blazing fast response times

--------------------------------------------------------------------

Follow me, Abhi Mahule, for more enlightening posts on AI and startup culture. Stay tuned!

--------------------------------------------------------------------

#AI #ML #VectorDB #llm #database
