
Vector Databases - A List

Andy Palmer

Entrepreneur and Seed Investor

Published: April 12, 2024

Vector databases are all the rage because they are so useful for LLM/GenAI applications. I figured I’d publish a list, which some of my friends have been building, of the vector database options. There is an exceptional resource here as well. There is also a great description by Andy Pavlo from OtterTune, in the post here, that emphasizes how existing DBs are rapidly implementing the capabilities one would expect from a purpose-built vector database.

If you have additions, removals, or changes, don’t hesitate to add them in the comments.

Faiss — (https://faiss.ai/#)
Milvus — (https://milvus.io/)
Weaviate — (https://weaviate.io/)
Chroma — (https://www.trychroma.com/)
qdrant — (https://qdrant.tech/)
Vespa — (https://vespa.ai/)
Pinecone — (https://www.pinecone.io/)
Elasticsearch — (https://www.elastic.co/elasticsearch/)
Pgvector — (https://github.com/pgvector/pgvector)
opensearch — (https://opensearch.org/)
Redis — (https://redis.io/)
Singlestore — (https://www.singlestore.com/)
JVector — embedded — (https://github.com/jbellis/jvector)
Astra — DBaaS- (https://www.datastax.com/products/datastax-astra)

A) Open source products

Faiss — (https://faiss.ai/#)

Faiss is fully open source.

Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python. Some of the most useful algorithms are implemented on the GPU. It is developed by Facebook AI Research.
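To make "similarity search over dense vectors" concrete, here is a minimal pure-Python sketch of exact (brute-force) k-nearest-neighbor search using Euclidean distance. It is not Faiss code and does not use the Faiss API; the function names and the toy vectors are assumptions made up for illustration. Libraries like Faiss exist because this linear scan gets slow as the collection grows.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(vectors, query, k):
    """Return the k (distance, index) pairs closest to `query`, nearest first.

    Every stored vector is scored, so the result is exact but each query
    costs O(n * d) for n vectors of dimension d.
    """
    scored = ((l2_distance(v, query), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)


# Toy 2-dimensional "embeddings" (hypothetical data); real embeddings
# usually have hundreds or thousands of dimensions.
vectors = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 3.0],
    [5.0, 5.0],
]

results = knn_search(vectors, query=[0.9, 0.1], k=2)
for distance, index in results:
    print(index, round(distance, 3))
```

The two nearest vectors come back in order of increasing distance. An approximate index trades a little of that exactness for much less work per query.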

Milvus — (https://milvus.io/)

Open Source and Managed version
Deploy on EKS — https://milvus.io/docs/eks.md
Deploy on EC2 — https://milvus.io/docs/aws.md

Milvus was created in 2019 with a singular goal: store, index, and manage massive embedding vectors generated by deep neural networks and other machine learning (ML) models.

As a database designed specifically to handle queries over input vectors, it can index vectors at trillion scale. Unlike existing relational databases, which mainly deal with structured data following a predefined pattern, Milvus is designed from the ground up to handle embedding vectors converted from unstructured data.

As the Internet grew and evolved, unstructured data became more and more common, including emails, papers, IoT sensor data, Facebook photos, protein structures, and much more. For computers to understand and process unstructured data, it is converted into vectors using embedding techniques. Milvus stores and indexes these vectors, and it can analyze the correlation between two vectors by calculating their similarity distance. If two embedding vectors are very similar, the original data sources are similar as well.
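As a rough illustration of "calculating their similarity distance", the sketch below computes cosine similarity, one common metric for comparing embeddings. It is plain Python, not the Milvus API, and the three embeddings are hand-written hypothetical values rather than output from a real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for illustration only.
embeddings = {
    "photo_of_cat": [0.9, 0.1, 0.0],
    "photo_of_kitten": [0.8, 0.2, 0.1],
    "sensor_log": [0.0, 0.1, 0.9],
}

cat_vs_kitten = cosine_similarity(
    embeddings["photo_of_cat"], embeddings["photo_of_kitten"]
)
cat_vs_sensor = cosine_similarity(
    embeddings["photo_of_cat"], embeddings["sensor_log"]
)
print(round(cat_vs_kitten, 3), round(cat_vs_sensor, 3))
```

The two photo embeddings point in nearly the same direction and score close to 1, while the photo and the sensor log score close to 0.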

Weaviate — (https://weaviate.io/)

Open Source and Managed version

Weaviate is an open-source vector database. It allows you to store data objects and vector embeddings from your favorite ML-models, and scale seamlessly into billions of data objects.

Chroma — (https://www.trychroma.com/)

Open Source

Chroma is a database for building AI applications with embeddings. It comes with everything you need to get started built in, and runs on your machine. A hosted version is coming soon!

qdrant — (https://qdrant.tech/)

Open Source and Managed version

Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!

Vespa — (https://vespa.ai/)

Open Source and Managed version

Vespa is a fully featured search engine and vector database. It supports vector search (ANN), lexical search, and search in structured data, all in the same query. Integrated machine-learned model inference allows you to apply AI to make sense of your data in real time. Together with Vespa’s proven scaling and high availability, this empowers you to create production ready search applications at any scale, and with any combination of features.
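One way to picture "vector search, lexical search, and search in structured data, all in the same query" is the toy ranking function below. This is only a sketch under my own assumptions, not Vespa's query language or ranking framework: the document fields, the term-overlap lexical score, and the `alpha` blending weight are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def lexical_score(text, terms):
    """Fraction of query terms that appear as words in the document text."""
    words = set(text.lower().split())
    return sum(1 for t in terms if t.lower() in words) / len(terms)


def hybrid_search(docs, terms, query_vector, min_year, alpha=0.5):
    """Filter on a structured field, then rank by blended lexical + vector score."""
    ranked = []
    for doc in docs:
        if doc["year"] < min_year:  # structured-data filter
            continue
        score = alpha * lexical_score(doc["text"], terms) + (
            1 - alpha
        ) * cosine_similarity(doc["vector"], query_vector)
        ranked.append((score, doc["id"]))
    return sorted(ranked, reverse=True)  # best score first


# Hypothetical documents with a structured field, text, and an embedding.
docs = [
    {"id": "a", "year": 2024, "text": "vector search with ANN indexes", "vector": [0.9, 0.1]},
    {"id": "b", "year": 2024, "text": "cooking pasta at home", "vector": [0.1, 0.9]},
    {"id": "c", "year": 2019, "text": "vector search basics", "vector": [1.0, 0.0]},
]

hits = hybrid_search(docs, terms=["vector", "search"], query_vector=[1.0, 0.0], min_year=2023)
for score, doc_id in hits:
    print(doc_id, round(score, 3))
```

Document "c" matches the text and the vector best but is dropped by the year filter, which is the point of combining all three signals in one query.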

Pgvector — (https://github.com/pgvector/pgvector)

Open Source

An open-source extension for PostgreSQL that allows you to store and query vector embeddings within your database. pgvector is easy to use and can be installed with a single command.

opensearch — (https://opensearch.org/)

Open Source

A community-driven, open source fork of Elasticsearch and Kibana, created following the license change in early 2021. It includes vector database functionality that allows you to store and index vectors and metadata, and to perform vector similarity search using k-NN indexes.

B) Proprietary products

Elasticsearch — (https://www.elastic.co/elasticsearch/)

Paid/Licensed

A distributed search and analytics engine that supports various types of data. One of the data types that Elasticsearch supports is vector fields, which store dense vectors of numeric values. In version 7.10, Elasticsearch added support for indexing vectors into a specialized data structure to support fast kNN retrieval through the kNN search API. In version 8.0, Elasticsearch added support for native natural language processing (NLP) with vector fields.

Pinecone — (https://www.pinecone.io/)

Paid/Licensed

Pinecone makes it easy to provide long-term memory for high-performance AI applications. It’s a managed, cloud-native vector database with a simple API and no infrastructure hassles. Pinecone serves fresh, filtered query results with low latency at the scale of billions of vectors.

Redis — (https://redis.io/)

Paid/Licensed

Redis Enterprise manages vectors in an index data structure to enable intelligent similarity search that balances search speed and search quality. Choose from two popular techniques, FLAT (a brute-force approach) and HNSW (a faster, approximate approach), based on your data and use cases.
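The speed versus quality trade-off between the two techniques can be sketched in a few lines. The code below is not Redis code and is not HNSW itself: it compares a FLAT-style full scan with a greedy walk over a single-layer nearest-neighbor graph, a much simplified stand-in for the graph idea behind HNSW. The data, the graph degree `m`, and all function names are assumptions for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(vectors, query):
    """FLAT-style search: score every vector, return the index of the closest."""
    return min(range(len(vectors)), key=lambda i: dist(vectors[i], query))


def build_neighbor_graph(vectors, m):
    """Link each vector to its m nearest neighbors (brute force, fine for a toy)."""
    graph = {}
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: dist(v, vectors[j]))
        graph[i] = others[:m]
    return graph


def greedy_graph_search(vectors, graph, query, entry=0):
    """Hop to the neighbor closest to the query until no neighbor improves.

    Returns (index, number_of_vectors_scored). The walk can stop at a local
    minimum, which is why the result is only approximate.
    """
    current, current_d = entry, dist(vectors[entry], query)
    scored = {entry}
    while True:
        best, best_d = current, current_d
        for j in graph[current]:
            if j not in scored:
                scored.add(j)
                d = dist(vectors[j], query)
                if d < best_d:
                    best, best_d = j, d
        if best == current:
            return current, len(scored)
        current, current_d = best, best_d


random.seed(0)  # fixed seed so the toy data is reproducible
vectors = [[random.random(), random.random()] for _ in range(200)]
query = [0.5, 0.5]

graph = build_neighbor_graph(vectors, m=8)
exact_index = flat_search(vectors, query)
approx_index, scored_count = greedy_graph_search(vectors, graph, query)

print("exact:", exact_index, "approx:", approx_index,
      "scored:", scored_count, "of", len(vectors))
```

The full scan always finds the true nearest vector but scores all of them. The graph walk scores only the vectors it passes and may settle on a near neighbor instead of the nearest one. Real HNSW adds multiple layers and a candidate list to make that miss much less likely.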

Singlestore — (https://www.singlestore.com/)

Paid/Licensed

SingleStoreDB unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications. Built for developers and architects, SingleStoreDB is based on a distributed SQL architecture, delivering 10–100 millisecond performance on complex queries, all while ensuring your business can effortlessly scale.

SingleStoreDB also offers built-in vector database and full-text search capabilities.

Colin Mahony Christopher Ahlberg

Kevin Ortiz (He/Him)

Talent Specialist and Future Web Developer

6 months ago

Thanks for sharing this list, Andy! When choosing a vector database, scalability is key. As your data grows, the database needs to handle increasing volumes efficiently. Vector databases like Milvus and Pinecone are built with scalability in mind, using distributed systems and sharding to manage large datasets effectively. While performance and integration are also important, scalability ensures the database can grow with your needs over time. For more insights on these vector databases, check out this article by my colleague Jatin Malhotra: https://www.scalablepath.com/back-end/vector-databases

May Tong

Passionate about partnerships. Social impact warrior. AI, ML, Cloud, Data, SaaS, etc. ex- VMWare, Pagerduty, HYCU, EA

11 months ago

Don't forget to back up your vector databases. HYCU offers integrations for Pinecone and Redis that can back them up in 1-Click. Request an integration for your favorite vector database. Www.hycu.com
