Stop Using Vector Indexes (When You Don't Need Them)
Here's an article that might save you thousands of dollars a day: your vector search use case probably doesn't need that fancy ANN index.
The Default Solution Everyone Jumps To
Picture this: You're building an AI assistant that needs to search through personal user data - emails, documents, photos, you name it. Like everyone else, you immediately reach for a vector database with an ANN (Approximate Nearest Neighbor) index. It's what all the cool kids are doing, right?
This is one of those rare cases where failing to think the problem through leads you to a solution that costs orders of magnitude more than the right one, and doesn't even solve the problem.
The Secret of ANN Indexes
Let's explain why this "standard" approach fails on multiple levels.
First, let's talk about that "A" in ANN. It stands for "Approximate" - and that should terrify you. When searching through someone's emails, do you want to miss that crucial message? "Sorry, I couldn't find your tax documents because my approximate search didn't consider them." It's hard enough to build accurate embedding models; layering approximate search on top only degrades accuracy further.
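To make "approximate" concrete, here is a toy recall@k computation. The simulated index that only visits a fraction of the candidates is purely illustrative - real ANN algorithms are far smarter - but the failure mode is the same: a candidate the index never visits can never be returned.

```python
import numpy as np

def recall_at_k(exact_ids, approx_ids):
    """Fraction of the true top-k that the approximate search found."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 64))      # toy document embeddings
query = rng.normal(size=64)

scores = docs @ query
exact_top10 = np.argsort(-scores)[:10]    # exact (brute-force) top 10

# Simulate an index that only ever looks at 20% of the candidates.
visited = rng.choice(10_000, size=2_000, replace=False)
approx_top10 = visited[np.argsort(-scores[visited])[:10]]

print(recall_at_k(exact_top10, approx_top10))
```

Exact search, by definition, has recall 1.0; anything less means documents silently went missing.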
But it gets worse.
The Hidden Costs Are Brutal
The traditional ANN approach requires keeping most vectors in memory and building indexes over the vector data. For large-scale applications, this means:
- A large RAM footprint, since the vectors and the index must stay resident in memory
- CPU spent building and maintaining the index as data changes
- Infrastructure bills that scale with your total corpus, not with your actual query load
Why We're All Doing It Wrong
The fundamental problem? We're building global ANN indexes when we don't need them. If your data naturally partitions - whether by user, organization, project, or any other dimension - and you don't need thousands of queries per second, you're over-engineering.
Think about it: When's the last time you needed to search through ALL your users' data at once? Never, right? Because that would be weird (and probably illegal).
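A quick back-of-envelope shows why partitioning changes the economics entirely (the corpus and user counts below are made-up round numbers):

```python
total_vectors = 1_000_000_000           # whole corpus across all users (assumption)
num_users = 100_000
per_user = total_vectors // num_users   # ~10,000 vectors per user

# A global index must cover 1B vectors on every query;
# a per-user scan touches only that user's slice.
work_reduction = total_vectors / per_user
print(work_reduction)  # 100000.0
```

When each query only ever needs one user's slice, indexing the other 99.999% of the corpus for that query is pure waste.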
The Solution You're Not Using (But Should Be)
Enter vector streaming search. Instead of building massive global ANN indexes, it:
- Keeps vectors on disk rather than pinned in memory
- Streams through only the relevant partition (a single user's data) at query time
- Computes exact nearest neighbors, so there is no recall loss from approximation
This isn't just for personal assistants. Any system where your data has natural boundaries and moderate query rates can benefit. Think:
- Email, document, and photo search scoped to a single user
- Multi-tenant SaaS products where each organization's data is isolated
- Project- or case-scoped search, where a query never crosses the boundary
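The core idea fits in a few lines of NumPy (the names and sizes here are hypothetical): stream one user's vectors and score them exactly, with no index at all.

```python
import numpy as np

def streaming_search(user_vectors, query, k=10):
    """Exact nearest-neighbor search over a single user's vectors.

    No index: stream the partition, score every vector, keep the
    best k. For a few thousand vectors this is fast and exact.
    """
    scores = user_vectors @ query              # dot-product similarity
    top = np.argpartition(-scores, k)[:k]      # unordered top-k candidates
    return top[np.argsort(-scores[top])]       # exact top-k, best first

rng = np.random.default_rng(42)
user_vecs = rng.normal(size=(5_000, 128))      # one user's embeddings
q = rng.normal(size=128)
hits = streaming_search(user_vecs, q)
```

As one commenter below puts it, this is KNN brute force - which is exactly the point: within a small partition, brute force is both cheaper and more accurate than an approximate index.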
The Wake-Up Call
Sometimes, the simple solution isn't just cheaper—it's the right solution. Vector streaming search is the rare case where you get to have your cake and eat it, too: better results and lower cost.
So the next time someone tells you to "just throw an ANN index at it," ask yourself:
- Does my data naturally partition by user, organization, or project?
- Do I really need thousands of queries per second against the full corpus?
- Can I tolerate approximate results that silently miss documents?
Sometimes, the most brilliant and cheapest solution is the simple one. And in this case, simple means streaming, not indexing.
Get started with Vespa streaming mode.
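For orientation, here is a sketch of what enabling streaming mode looks like in a Vespa application. This is written from memory and should be treated as an assumption - verify the exact syntax against the current Vespa documentation before using it:

```xml
<!-- services.xml (sketch): enable streaming mode for a document type.
     Verify against the current Vespa documentation. -->
<documents>
  <document type="mail" mode="streaming"/>
</documents>
```

Queries then restrict the search to one partition via the `streaming.groupname` request parameter, and documents are fed with group document IDs (e.g. `id:mail:mail:g=alice@example.com:1`) so that each user's data lives together.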
Comments

Reader (3 months ago): Maybe it is time for Vespa to implement DiskANN (needs less than 10 GB for 100M vectors). The compute cost of KNN quickly overshadows the memory cost of that index... What exactly is streaming? Batched brute-force search?

Senior Principal Solutions Architect @Vespa.ai (3 months ago): It's KNN brute force.

Founder at Searchplex, Building High-Performance AI-Powered Search Solutions Across Industries (3 months ago): Great article! Are there cases where streaming search might not be optimal, even on well-partitioned data?