Stop Using Vector Indexes (When You Don't Need Them)

Here's an article that might save you thousands of dollars per day: your vector search use case probably doesn't need that fancy ANN index.


The Default Solution Everyone Jumps To

Picture this: You're building an AI assistant that needs to search through personal user data - emails, documents, photos, you name it. Like everyone else, you immediately reach for a vector database with an ANN (Approximate Nearest Neighbor) index. It's what all the cool kids are doing, right?

This is one of the rare cases where not thinking the problem through can lead you to a solution that costs orders of magnitude more than the right one, and that doesn't even solve the problem.

The Secret of ANN Indexes

Let me explain why this "standard" approach fails on multiple levels.

First, let's talk about that "A" in ANN. It stands for "Approximate" - and that should terrify you. When searching through someone's emails, do you want to miss that crucial message? "Sorry, I couldn't find your tax documents because my approximate search didn't consider them." It's hard enough to build accurate embedding models; adding approximate search to the mix will undoubtedly cause accuracy degradation.
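
To make "approximate" concrete, here is a minimal sketch (my illustration, not from the article) that measures how much of the exact top-10 an HNSW index actually returns, using the hnswlib library on random vectors. The data and index parameters are hypothetical; real embeddings and heavier tuning will shift the numbers, but the recall gap it measures is exactly the accuracy you trade away.

```python
import numpy as np
import hnswlib  # assumed dependency: pip install hnswlib

dim, n_items, n_queries, k = 128, 50_000, 100, 10
rng = np.random.default_rng(0)
items = rng.standard_normal((n_items, dim)).astype(np.float32)
queries = rng.standard_normal((n_queries, dim)).astype(np.float32)

# Ground truth: exact brute-force inner-product search.
exact_topk = np.argsort(-(queries @ items.T), axis=1)[:, :k]

# Approximate search with an HNSW index (hypothetical parameters).
index = hnswlib.Index(space="ip", dim=dim)
index.init_index(max_elements=n_items, ef_construction=100, M=16)
index.add_items(items, np.arange(n_items))
index.set_ef(20)          # lower ef -> faster queries, lower recall
ann_topk, _ = index.knn_query(queries, k=k)

# Recall@k: the fraction of the true top-k that the ANN index returned.
recall = np.mean([len(set(exact_topk[i]) & set(ann_topk[i])) / k
                  for i in range(n_queries)])
print(f"recall@{k}: {recall:.1%}")   # anything below 100% is a missed result
```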

But it gets worse.

The Hidden Costs Are Brutal

The traditional ANN approach requires keeping most vectors in memory and building indexes over the vector data. For large-scale applications, this means (see the back-of-the-envelope sketch after this list):

  • Massive memory requirements
  • Orders of magnitude higher infrastructure costs
  • Write throughput that crawls along at megabytes per second (due to index maintenance) instead of gigabytes per second
  • And you're still missing key results!
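
To put a number on "massive", here is a back-of-the-envelope sketch with hypothetical figures (1B vectors of 768 float32 dimensions, plus a rough guess of 100 bytes of graph overhead per vector); swap in your own numbers, the point is that a global in-memory index scales with everything you have ever indexed.

```python
# Rough RAM estimate for a global in-memory ANN index (hypothetical numbers).
num_vectors = 1_000_000_000        # 1B vectors across all users
dims = 768                         # embedding dimensionality
bytes_per_float = 4                # float32
graph_overhead_per_vector = 100    # rough guess: HNSW links, ids, padding

vector_bytes = num_vectors * dims * bytes_per_float
graph_bytes = num_vectors * graph_overhead_per_vector
total_gib = (vector_bytes + graph_bytes) / 2**30

print(f"RAM just to hold the index: ~{total_gib:,.0f} GiB")  # ~2,954 GiB (~3 TiB)
# With streaming search, only the partition being queried is read from disk.
```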

Why We're All Doing It Wrong

The fundamental problem? We're building global ANN indexes when we don't need them. If your data naturally partitions - whether by user, organization, project, or any other dimension - and you don't need thousands of queries per second, you're over-engineering.

Think about it: When's the last time you needed to search through ALL your users' data at once? Never, right? Because that would be weird (and probably illegal).

The Solution You're Not Using (But Should Be)

Enter vector streaming search. Instead of building massive global ANN indexes, it:

  • Co-locates naturally grouped data
  • Streams from disk instead of hoarding memory
  • Finds ALL matches, no errors introduced by approximate search
  • Writes data orders of magnitude faster
  • Scales naturally with your data partitions
  • Handles any combination of filters, which can also be mixed with full-text search and even substring matching (a minimal sketch of the idea follows this list)
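
To be clear about what "streaming" means here: conceptually, it is just an exact scan over one partition's vectors, read from disk at query time. The following is a minimal, hypothetical Python sketch of that idea (it is not Vespa's implementation; the on-disk layout and file names are assumptions):

```python
import numpy as np

def search_partition(partition_path: str, query: np.ndarray, k: int = 10):
    """Exact top-k search over a single partition (e.g. one user's vectors).

    Assumed layout: each partition is a float32 matrix of shape (n_docs, dim)
    written with np.save(). mmap_mode="r" streams it from disk on demand
    instead of keeping every user's vectors resident in RAM.
    """
    vectors = np.load(partition_path, mmap_mode="r")    # streamed, not resident
    scores = np.asarray(vectors @ query)                # exact scores, no index
    topk = np.argsort(-scores)[:k]                      # true top-k, no misses
    return [(int(i), float(scores[i])) for i in topk]   # (doc_id, score) pairs

# Only the partition that matches the request is ever touched:
# results = search_partition("partitions/user_1234.npy", query_vector, k=10)
```

Vespa's streaming mode is built around the same principle: documents are grouped by a key in the document id, and a query is evaluated exactly over just the selected group, so there is no per-vector index to build, maintain, or keep in memory.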

This isn't just for personal assistants. Any system where your data has natural boundaries and moderate query rates can benefit. Think:

  • Per-organization document search
  • Project-specific code search
  • Department-level knowledge bases

The Wake-Up Call

Sometimes, the simple solution isn't just cheaper—it's the right solution. Vector streaming search is the rare case where you get to have your cake and eat it, too: better results and lower cost.

So the next time someone tells you to "just throw an ANN index at it," ask yourself:

  1. Does my data have natural partitions? Most likely.
  2. Do I need thousands of queries per second per partition? Most likely not.
  3. Can I afford to miss crucial matches? Most likely not.

Sometimes, the smartest and cheapest solution is also the simplest one. And in this case, simple means streaming, not indexing.

Get started with Vespa streaming mode.
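
For a flavor of what a query against a streaming-mode Vespa application can look like, here is a hypothetical sketch: the field name (embedding), rank profile (semantic), group name, and vector values are all made up, and the parameter names reflect my reading of Vespa's query and streaming-search documentation, so verify them against the current docs before copying.

```python
import requests  # assumed dependency: pip install requests

# Hypothetical query against a Vespa application running in streaming mode.
# The group name restricts evaluation to one user's documents, and the
# nearestNeighbor operator is evaluated exactly over that group.
body = {
    "yql": "select * from sources * where {targetHits: 10}nearestNeighbor(embedding, q)",
    "input.query(q)": [0.12, 0.45, 0.33],  # query embedding (dims must match the schema)
    "ranking": "semantic",                 # hypothetical rank profile name
    "streaming.groupname": "user-1234",    # only this user's partition is searched
    "hits": 10,
}
response = requests.post("http://localhost:8080/search/", json=body)
print(response.json())
```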

Comments

Maybe it is time for Vespa to implement DiskANN (it needs less than 10 GB for 100M vectors). The compute cost of KNN quickly overshadows the memory cost of that index...

Piotr Kobziakowski
Senior Principal Solutions Architect @Vespa.ai
3 months ago

It's KNN brute force.

What exactly is streaming? Batched brute-force search?

Ravindra Harige
Founder at Searchplex. Building High-Performance AI-Powered Search Solutions Across Industries
3 months ago

Great article! Are there cases where streaming search might not be optimal, even on well-partitioned data?
