Vector Databases - A List
Vector databases are all the rage because they are so useful for LLM/GenAI applications. I figured I’d publish a list of vector database options that some of my friends have been building. There is an exceptional resource here as well. There is also a great description by Andy Pavlo from OtterTune in a post here, which emphasizes how existing databases are rapidly implementing the capabilities one would expect from a purpose-built vector database.
If you have additions, removals, or changes, don’t hesitate to add them in the comments.
A) Open source products
Faiss — (https://faiss.ai/#)
Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python. Some of the most useful algorithms are implemented on the GPU. It is developed by Facebook AI Research.
Milvus — (https://milvus.io/)
Milvus was created in 2019 with a singular goal: store, index, and manage massive embedding vectors generated by deep neural networks and other machine learning (ML) models.
As a database specifically designed to handle queries over input vectors, it is capable of indexing vectors at trillion scale. Unlike existing relational databases, which mainly deal with structured data following a pre-defined pattern, Milvus is designed from the bottom up to handle embedding vectors converted from unstructured data.
As the Internet grew and evolved, unstructured data became more and more common, including emails, papers, IoT sensor data, Facebook photos, protein structures, and much more. For computers to understand and process unstructured data, it is converted into vectors using embedding techniques. Milvus stores and indexes these vectors, and it can analyze the correlation between two vectors by calculating their similarity distance. If two embedding vectors are very similar, the original data sources are similar as well.
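To make the idea of a "similarity distance" concrete, here is a minimal sketch in plain Python of two common measures, Euclidean distance and cosine similarity. It is not Milvus code, and the three tiny vectors are made-up stand-ins for real embeddings, which usually have hundreds of dimensions.

```python
import math


def euclidean_distance(a, b):
    """Straight-line (L2) distance; a smaller value means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; closer to 1 means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", invented for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
truck = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten), cosine_similarity(cat, truck))
print(euclidean_distance(cat, kitten), euclidean_distance(cat, truck))
```

With these toy values, "cat" scores closer to "kitten" than to "truck" under both measures, which is the behavior the paragraph above describes.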
Weaviate — (https://weaviate.io/)
Weaviate is an open-source vector database. It allows you to store data objects and vector embeddings from your favorite ML-models, and scale seamlessly into billions of data objects.
Chroma — (https://www.trychroma.com/)
Chroma is a database for building AI applications with embeddings. It comes with everything you need to get started built in, and runs on your machine. A hosted version is coming soon!
qdrant — (https://qdrant.tech/)
Qdrant is a vector database & vector similarity search engine. It deploys as an API service providing search for the nearest high-dimensional vectors. With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!
Vespa — (https://vespa.ai/)
Vespa is a fully featured search engine and vector database. It supports vector search (ANN), lexical search, and search in structured data, all in the same query. Integrated machine-learned model inference allows you to apply AI to make sense of your data in real time. Together with Vespa’s proven scaling and high availability, this empowers you to create production ready search applications at any scale, and with any combination of features.
Pgvector — (https://github.com/pgvector/pgvector)
An open-source extension for PostgreSQL that allows you to store and query vector embeddings within your database. pgvector is easy to use and can be installed with a single command.
opensearch — (https://opensearch.org/)
A community-driven, open-source fork of Elasticsearch and Kibana, created after the license change in early 2021. It includes vector database functionality that lets you store and index vectors and metadata, and perform vector similarity search using k-NN indexes.
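Storing metadata next to each vector is what makes filtered similarity search possible. The sketch below shows the general idea in plain Python: keep only the records whose metadata matches, then rank the survivors by distance. It is an exact, in-memory illustration and not the OpenSearch API; the `documents` records, the `lang` field, and the `filtered_knn` helper are all hypothetical.

```python
import math

# Hypothetical records: each has an embedding plus metadata fields.
documents = [
    {"id": "a", "vector": [0.1, 0.9], "metadata": {"lang": "en"}},
    {"id": "b", "vector": [0.2, 0.8], "metadata": {"lang": "de"}},
    {"id": "c", "vector": [0.9, 0.1], "metadata": {"lang": "en"}},
    {"id": "d", "vector": [0.15, 0.85], "metadata": {"lang": "en"}},
]


def filtered_knn(query, docs, k, required):
    """Return ids of the k nearest docs whose metadata matches `required`."""
    # Step 1: keep only documents whose metadata matches every required field.
    candidates = [
        doc for doc in docs
        if all(doc["metadata"].get(key) == value for key, value in required.items())
    ]
    # Step 2: rank the remaining documents by Euclidean distance to the query.
    candidates.sort(key=lambda doc: math.dist(query, doc["vector"]))
    return [doc["id"] for doc in candidates[:k]]


result = filtered_knn([0.1, 0.9], documents, k=2, required={"lang": "en"})
print(result)
```

Document "b" sits close to the query but is excluded by the filter, so only English-language records are ranked. A real engine would typically use an index rather than scanning every record.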
B) Proprietary products
Elasticsearch — (https://www.elastic.co/elasticsearch/)
A distributed search and analytics engine that supports various types of data. One of the data types that Elasticsearch supports is vector fields, which store dense vectors of numeric values. In version 8.0, Elasticsearch added support for indexing vectors into a specialized data structure to support fast approximate kNN retrieval through the kNN search API. Version 8.0 also added native natural language processing (NLP) support that works with vector fields.
Pinecone — (https://www.pinecone.io/)
Pinecone makes it easy to provide long-term memory for high-performance AI applications. It’s a managed, cloud-native vector database with a simple API and no infrastructure hassles. Pinecone serves fresh, filtered query results with low latency at the scale of billions of vectors.
Redis — (https://redis.io/)
Redis Enterprise manages vectors in an index data structure to enable intelligent similarity search that balances search speed and search quality. Choose between two popular techniques, FLAT (a brute-force approach) and HNSW (a faster, approximate approach), based on your data and use cases.
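The speed versus quality trade-off can be illustrated without any Redis code. A flat search compares the query against every stored vector, so it is exact but slows down as the data grows. An approximate index such as HNSW is faster but may miss some true neighbors, and one common way to measure that loss is recall against the flat result. The sketch below assumes made-up vectors and a made-up approximate result; `flat_search` and `recall_at_k` are hypothetical helper names, not Redis commands.

```python
import heapq
import math


def flat_search(query, vectors, k):
    """Exact k-NN: compare the query against every stored vector."""
    scored = ((math.dist(query, vec), idx) for idx, vec in enumerate(vectors))
    # nsmallest keeps the k closest (distance, index) pairs, nearest first.
    return [idx for _, idx in heapq.nsmallest(k, scored)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate result found."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Made-up 2-dimensional vectors for illustration.
vectors = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.2]]
exact = flat_search([0.0, 0.0], vectors, k=3)

# Hypothetical output of an approximate index that missed one true neighbor.
approx = [0, 2, 1]

print(exact, recall_at_k(approx, exact))
```

Here the approximate result finds two of the three true neighbors, so its recall is about 0.67. Whether that is acceptable depends on how much speed the approximate index buys for your data.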
Singlestore — (https://www.singlestore.com/)
SingleStoreDB unifies transactions and analytics in a single engine to drive low-latency access to large datasets, simplifying the development of fast, modern enterprise applications. Built for developers and architects, SingleStoreDB is based on a distributed SQL architecture, delivering 10–100 millisecond performance on complex queries, all while ensuring your business can effortlessly scale.
SingleStoreDB also offers built-in vector database and full-text search capabilities.