Stop Using Vector Indexes (When You Don't Need Them)

Here's an article that might save you thousands of dollars per day: your vector search use case probably doesn't need that fancy ANN index.


The Default Solution Everyone Jumps To

Picture this: You're building an AI assistant that needs to search through personal user data - emails, documents, photos, you name it. Like everyone else, you immediately reach for a vector database with an ANN (Approximate Nearest Neighbor) index. It's what all the cool kids are doing, right?

This is one of the rare cases where not thinking the problem through can lead you to a solution that costs orders of magnitude more than the right one, and that doesn't even solve the problem.

The Secret of ANN Indexes

Let me explain why this "standard" approach fails on multiple levels.

First, let's talk about that "A" in ANN. It stands for "Approximate" - and that should terrify you. When searching through someone's emails, do you want to miss that crucial message? "Sorry, I couldn't find your tax documents because my approximate search didn't consider them." It's hard enough to build accurate embedding models; adding approximate search to the mix will undoubtedly cause accuracy degradation.
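
To make "approximate" concrete, here is a minimal sketch (my illustration, not from the article) that measures how much of the exact top-10 an HNSW index actually returns, using the hnswlib library on random vectors. The data and index parameters are hypothetical; real embeddings and heavier tuning will shift the numbers, but the recall gap it measures is exactly the accuracy you trade away.

```python
import numpy as np
import hnswlib  # assumed dependency: pip install hnswlib

dim, n_items, n_queries, k = 128, 50_000, 100, 10
rng = np.random.default_rng(0)
items = rng.standard_normal((n_items, dim)).astype(np.float32)
queries = rng.standard_normal((n_queries, dim)).astype(np.float32)

# Ground truth: exact brute-force inner-product search.
exact_topk = np.argsort(-(queries @ items.T), axis=1)[:, :k]

# Approximate search with an HNSW index (hypothetical parameters).
index = hnswlib.Index(space="ip", dim=dim)
index.init_index(max_elements=n_items, ef_construction=100, M=16)
index.add_items(items, np.arange(n_items))
index.set_ef(20)          # lower ef -> faster queries, lower recall
ann_topk, _ = index.knn_query(queries, k=k)

# Recall@k: the fraction of the true top-k that the ANN index returned.
recall = np.mean([len(set(exact_topk[i]) & set(ann_topk[i])) / k
                  for i in range(n_queries)])
print(f"recall@{k}: {recall:.1%}")   # anything below 100% is a missed result
```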

But it gets worse.

The Hidden Costs Are Brutal

The traditional ANN approach requires keeping most vectors in memory and building indexes over the vector data. For large-scale applications, this means (see the back-of-the-envelope sketch after this list):

  • Massive memory requirements
  • Orders of magnitude higher infrastructure costs
  • Write throughput that crawls along at megabytes per second (due to index maintenance) instead of gigabytes per second
  • And you're still missing key results!
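
To put a number on "massive", here is a back-of-the-envelope sketch with hypothetical figures (1B vectors of 768 float32 dimensions, plus a rough guess of 100 bytes of graph overhead per vector); swap in your own numbers, the point is that a global in-memory index scales with everything you have ever indexed.

```python
# Rough RAM estimate for a global in-memory ANN index (hypothetical numbers).
num_vectors = 1_000_000_000        # 1B vectors across all users
dims = 768                         # embedding dimensionality
bytes_per_float = 4                # float32
graph_overhead_per_vector = 100    # rough guess: HNSW links, ids, padding

vector_bytes = num_vectors * dims * bytes_per_float
graph_bytes = num_vectors * graph_overhead_per_vector
total_gib = (vector_bytes + graph_bytes) / 2**30

print(f"RAM just to hold the index: ~{total_gib:,.0f} GiB")  # ~2,954 GiB (~3 TiB)
# With streaming search, only the partition being queried is read from disk.
```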

Why We're All Doing It Wrong

The fundamental problem? We're building global ANN indexes when we don't need them. If your data naturally partitions - whether by user, organization, project, or any other dimension - and you don't need thousands of queries per second, you're over-engineering.

Think about it: When's the last time you needed to search through ALL your users' data at once? Never, right? Because that would be weird (and probably illegal).

The Solution You're Not Using (But Should Be)

Enter vector streaming search. Instead of building massive global ANN indexes, it:

  • Co-locates naturally grouped data
  • Streams from disk instead of hoarding memory
  • Finds ALL matches, no errors introduced by approximate search
  • Writes data orders of magnitude faster
  • Scales naturally with your data partitions
  • Handles any combination of filters, which can also be mixed with full-text search and even substring matching (a minimal sketch of the idea follows this list)
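
To be clear about what "streaming" means here: conceptually, it is just an exact scan over one partition's vectors, read from disk at query time. The following is a minimal, hypothetical Python sketch of that idea (it is not Vespa's implementation; the on-disk layout and file names are assumptions):

```python
import numpy as np

def search_partition(partition_path: str, query: np.ndarray, k: int = 10):
    """Exact top-k search over a single partition (e.g. one user's vectors).

    Assumed layout: each partition is a float32 matrix of shape (n_docs, dim)
    written with np.save(). mmap_mode="r" streams it from disk on demand
    instead of keeping every user's vectors resident in RAM.
    """
    vectors = np.load(partition_path, mmap_mode="r")    # streamed, not resident
    scores = np.asarray(vectors @ query)                # exact scores, no index
    topk = np.argsort(-scores)[:k]                      # true top-k, no misses
    return [(int(i), float(scores[i])) for i in topk]   # (doc_id, score) pairs

# Only the partition that matches the request is ever touched:
# results = search_partition("partitions/user_1234.npy", query_vector, k=10)
```

Vespa's streaming mode is built around the same principle: documents are grouped by a key in the document id, and a query is evaluated exactly over just the selected group, so there is no per-vector index to build, maintain, or keep in memory.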

This isn't just for personal assistants. Any system where your data has natural boundaries and moderate query rates can benefit. Think:

  • Per-organization document search
  • Project-specific code search
  • Department-level knowledge bases

The Wake-Up Call

Sometimes, the simple solution isn't just cheaper—it's the right solution. Vector streaming search is the rare case where you get to have your cake and eat it, too: better results and lower cost.

So the next time someone tells you to "just throw an ANN index at it," ask yourself:

  1. Does my data have natural partitions? Most likely.
  2. Do I need thousands of queries per second per partition? Most likely not.
  3. Can I afford to miss crucial matches? Most likely not.

Sometimes, the smartest and cheapest solution is also the simplest one. And in this case, simple means streaming, not indexing.

Get started with Vespa streaming mode.
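
For a flavor of what a query against a streaming-mode Vespa application can look like, here is a hypothetical sketch: the field name (embedding), rank profile (semantic), group name, and vector values are all made up, and the parameter names reflect my reading of Vespa's query and streaming-search documentation, so verify them against the current docs before copying.

```python
import requests  # assumed dependency: pip install requests

# Hypothetical query against a Vespa application running in streaming mode.
# The group name restricts evaluation to one user's documents, and the
# nearestNeighbor operator is evaluated exactly over that group.
body = {
    "yql": "select * from sources * where {targetHits: 10}nearestNeighbor(embedding, q)",
    "input.query(q)": [0.12, 0.45, 0.33],  # query embedding (dims must match the schema)
    "ranking": "semantic",                 # hypothetical rank profile name
    "streaming.groupname": "user-1234",    # only this user's partition is searched
    "hits": 10,
}
response = requests.post("http://localhost:8080/search/", json=body)
print(response.json())
```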

Comments

Maybe it is time for Vespa to implement DiskANN (it needs less than 10 GB for 100M vectors). The compute cost of KNN quickly overshadows the memory cost of that index...

Piotr Kobziakowski
Senior Principal Solutions Architect @Vespa.ai
3 months ago

It's KNN brute force.

What exactly is streaming? Batched brute-force search?

Ravindra Harige
Founder at Searchplex. Building High-Performance AI-Powered Search Solutions Across Industries
3 months ago

Great article! Are there cases where streaming search might not be optimal, even on well-partitioned data?
