Using Retrievability to Measure Recall

In court, witnesses swear to tell "the whole truth and nothing but the truth." Search engines are not under oath, but they should be truthful.

Two metrics for search relevance are precision and recall. Precision means telling nothing but the truth, while recall means telling the whole truth.

Precision is the fraction of retrieved results that are relevant. Recall is the fraction of relevant documents that were retrieved. There is a tradeoff: efforts to improve one metric often come at the expense of the other.
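
To make the definitions concrete with made-up numbers: if a query returns 10 results, 6 of them are relevant, and the index contains 20 relevant documents in total, then precision is 0.6 and recall is 0.3.

```python
retrieved = 10           # results returned for the query
relevant_retrieved = 6   # of those, how many are relevant
relevant_in_index = 20   # relevant documents in the whole index (rarely known in practice)

precision = relevant_retrieved / retrieved        # 6 / 10 = 0.6
recall = relevant_retrieved / relevant_in_index   # 6 / 20 = 0.3
```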

Measuring recall is harder than measuring precision.

Unfortunately, while precision is relatively straightforward to measure, recall is another story — since we rarely know how many relevant results are in the index. As a result, people often estimate recall using crude proxies, such as the fraction of queries that return no or few results.

We can and should do better. Recall might not seem as important as precision for many search applications, but it is still a key metric. After all, if a result is not retrievable, it might as well not even be in the search index.

To measure recall, we can measure retrievability.

The reason we care about recall is to ensure the retrievability of results, so perhaps we can measure the retrievability of results more directly.

Consider an entry in the search index. We can measure its retrievability by executing a set of search queries that should retrieve the entry and then counting how many of those queries actually retrieve it. For example, a black t-shirt should be retrievable by queries like "black tshirt", "black tshirts", "black t shirt", "tshirts black", etc.
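
Here is a minimal sketch of that measurement, assuming a hypothetical `search(query)` function that returns an ordered list of result ids:

```python
def retrievability(entry_id, queries, search, k=10):
    """Fraction of candidate queries that retrieve the entry in the top k results."""
    if not queries:
        return 0.0
    hits = sum(1 for query in queries if entry_id in search(query)[:k])
    return hits / len(queries)

# Illustrative candidate queries for a black t-shirt (the entry id is hypothetical).
queries = ["black tshirt", "black tshirts", "black t shirt", "tshirts black"]
# score = retrievability("sku-12345", queries, search)
```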

This strategy isn’t as simple as it sounds. For a large search index, measuring the retrievability of every entry is prohibitively expensive. We can address this concern by taking a representative sample. The bigger challenge is obtaining a set of search queries that we expect to retrieve a given entry.
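
Building on the retrievability sketch above, one way to keep the measurement tractable is to estimate index-wide retrievability from a random sample of entries; `candidate_queries_for` is a hypothetical query generator, discussed next:

```python
import random

def mean_retrievability(index_ids, candidate_queries_for, search, sample_size=1000):
    """Estimate average retrievability over a random sample of index entries."""
    sample = random.sample(index_ids, min(sample_size, len(index_ids)))
    scores = [retrievability(entry_id, candidate_queries_for(entry_id), search)
              for entry_id in sample]
    return sum(scores) / len(scores)
```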

Reverse search: going from a potential result to candidate queries.

We could ask people to manually come up with a set of search queries for a given entry in the index. But this process would be expensive and difficult. Coming up with such queries is not something humans are good at, though the idea has been explored as an application of human computation.

A more practical approach is to automate query generation. There are a variety of ways to generate queries from index entries, such as doc2query. But it’s a good idea to generate queries that searchers are likely to make. To do so, we treat query generation as a search problem, indexing our query log and then retrieving the most relevant queries for a result from that log.
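
As a rough sketch of that reverse search, using simple token overlap as a stand-in for a real retrieval model (such as BM25) over the query log; the entry text and query log here are purely illustrative:

```python
def generate_queries(entry_text, query_log, top_n=10):
    """Rank logged queries by how well their tokens match an index entry."""
    entry_tokens = set(entry_text.lower().split())

    def match(query):
        query_tokens = set(query.lower().split())
        # Fraction of the query's tokens that appear in the entry text.
        return len(entry_tokens & query_tokens) / len(query_tokens) if query_tokens else 0.0

    return sorted(query_log, key=match, reverse=True)[:top_n]

# Illustrative usage:
query_log = ["black tshirt", "black tshirts", "red dress", "tshirts black", "clothing"]
candidates = generate_queries("black tshirt cotton crew neck", query_log)
```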

Not all candidate queries are equal.

When we measure retrievability this way, we should also take into account the frequency of the queries we generate. Weighing queries by frequency allows us to measure retrievability in a searcher-centric way. For example, there are probably more people who search for "black tshirts" than "tshirts that are black in color".

But we have to be careful. If our queries drift too far from the source entry, then we would not even want those queries to include the entry in their results. Also, if the queries are not sufficiently specific, their inclusion of the entry in a large result set is not all that useful, regardless of query frequency. Continuing our example, it is more useful for our black t-shirt to appear in results for "black tshirts" than in results for "shirts" or "clothing".

Hence, we want to focus on specific queries for which the result is relevant, and then weigh those queries according to their frequency. This is admittedly an underspecified solution to a difficult problem, but hopefully a useful framework.
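
Putting the pieces together, here is a hedged sketch of a frequency-weighted retrievability score: `query_counts` maps each candidate query to its frequency in the query log, and `is_relevant` stands in for a relevance judgment (human or model) that the entry really belongs in that query's results:

```python
def weighted_retrievability(entry_id, query_counts, search, is_relevant, k=10):
    """Frequency-weighted retrievability over relevant candidate queries."""
    relevant = {query: count for query, count in query_counts.items()
                if is_relevant(entry_id, query)}
    total = sum(relevant.values())
    if total == 0:
        return 0.0
    retrieved = sum(count for query, count in relevant.items()
                    if entry_id in search(query)[:k])
    return retrieved / total
```

Filtering out overly broad queries (like "clothing") could be folded into the relevance judgment or handled with a separate specificity threshold.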

We can’t give up on measuring recall just because it’s hard.

Measuring recall has always been difficult, so it is understandable that search application developers — especially folks in industry who have to ruthlessly prioritize resources — have tended to focus on precision.

But recall matters. Ranking cannot make up for lost recall. If retrieval fails to include a relevant result, ranking cannot make it magically appear. So we need to invest in recall, and that means we have to have a way to measure it. Hopefully this proposed approach of measuring retrievability helps give recall the respect it deserves.
