Recall: Size Isn’t Everything

Recall is a key measure of search quality that doesn’t always get the attention it deserves. But how does a search engine recognize when recall is a problem?

Size Matters, But Size Isn’t Everything

Recall measures the fraction of relevant results that are retrieved. Naturally, recall is correlated to the size of the result set. But we have to be careful not to overstate that correlation.
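When relevance judgments are available, this definition translates directly into code. A minimal sketch (the sets here are illustrative, not from any real evaluation):

```python
def recall(retrieved: set, relevant: set) -> float:
    """Fraction of relevant results that were retrieved."""
    if not relevant:
        return 1.0  # vacuously perfect: there is nothing relevant to find
    return len(retrieved & relevant) / len(relevant)

# A result set can be sizable and still miss most relevant results:
# here, 3 results are retrieved but only 1 of 4 relevant results is found.
print(recall({"a", "b", "c"}, {"a", "d", "e", "f"}))  # 0.25
```

Note that the denominator is the number of relevant results, not the number retrieved, which is exactly why result-set size alone cannot establish recall.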

Consider the simplest case: a search that returns no results. A lack of results does not necessarily indicate a recall problem: there may simply be no results that relate to the searcher’s information need. Still, the upside from trying harder to find at least one relevant result generally outweighs the downside, at least if the search engine provides clear messaging to searchers about the techniques, such as query relaxation, that it uses to obtain those results.

Things become more complicated when there are a few results — or even just a single result. On one hand, a low result count makes it more likely that retrieval failed to include additional relevant results. On the other hand, a small number of results may still represent robust recall. For example, when a searcher searches for a specific document or product by name, the search engine can achieve perfect recall by returning a single result.

At the other end of the spectrum, consider searches that return a large number of results. A large result set makes it less likely that recall is a problem, but there's no guarantee. The retrieval strategy may have missed an even larger number of relevant results. Or the retrieved results may be numerous but mostly irrelevant, while a large number of relevant results remain unretrieved. Also, some results matter more than others, and the retrieved results may not include the most desirable ones.

In short, the size of the result set is a useful signal, but not a definitive one.

So, what other signals can we use to identify recall problems?

Query Understanding

Recall measures the search engine’s success at finding results that relate to the searcher’s intent. Good result quality in general — and good recall in particular — starts with understanding the searcher’s intent.

Query understanding transforms queries into representations of search intent. It consists of modules that extract intent signals from queries, rewrite queries, and introduce structure. Compared to ranking, query understanding mostly focuses on binary, objective aspects of search intent.

Query understanding provides two kinds of signals to detect recall problems:

  • Confidence: if query understanding cannot confidently establish the searcher’s intent, then it’s unlikely that the search results will be of good quality — which in turn means that a large number of results is not a strong signal of recall. Examples: a failure to map the query to a category, or inability to recognize most of the query terms as entities.
  • Specificity: if query understanding does confidently establish the searcher’s intent, then it should also be able to estimate the specificity of that intent. For example, a known-item search targets a single result, while a category search generally targets a larger set. If the estimated specificity implies a large target set, then a small result set indicates a recall problem.
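These two signals can be combined into a simple heuristic. The sketch below is illustrative only: the category lookup and expected-size table are made-up stand-ins for real query-understanding components.

```python
# Hypothetical stand-ins for query understanding outputs.
CATEGORY_OF = {"running shoes": "footwear", "laptop": "electronics"}
EXPECTED_SIZE = {"footwear": 500, "electronics": 2000}

def possible_recall_problem(query: str, num_results: int) -> bool:
    category = CATEGORY_OF.get(query)
    if category is None:
        # Low confidence: without an established intent, even a large
        # result count is not a strong signal that recall is healthy.
        return True
    # Specificity: a broad category intent that returns only a tiny
    # fraction of its expected result set is suspicious.
    return num_results < 0.01 * EXPECTED_SIZE[category]

print(possible_recall_problem("running shoes", 3))    # True: too few for a broad intent
print(possible_recall_problem("running shoes", 400))  # False
```

The thresholds here are arbitrary; in practice they would be tuned against labeled queries.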

Sensitivity Analysis

Since retrieval is always a tradeoff between precision and recall, another strategy for detecting recall problems is to adjust that tradeoff and observe how it changes the results:

  • Expansion: if a more aggressive query expansion strategy (e.g., using stemming or synonyms) significantly increases the number of results, there may be an opportunity to meaningfully improve recall. This signal is more robust if the new results are similar to those initially retrieved.
  • Relaxation: query relaxation, which makes one or more query terms optional, is another way to test the opportunity to improve recall. Query relaxation tends to be more aggressive than query expansion, so it’s even more important that the new results be similar to those initially retrieved.
  • Removing Constraints: query understanding can introduce constraints to improve precision, such as phrase-enforcing segments or scoping query terms to specific result fields. Removing these constraints adjusts the tradeoff, and is another way to test the opportunity to improve recall.
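The relaxation test above can be sketched with a toy matcher. The search function and corpus here are stand-ins, not a real search engine:

```python
# Toy corpus standing in for an index.
CORPUS = [
    "red running shoes",
    "blue running shoes",
    "running socks",
    "trail running shoes",
]

def search(terms, require_all=True):
    """Match documents containing all (strict) or any (relaxed) query terms."""
    op = all if require_all else any
    return [doc for doc in CORPUS if op(t in doc for t in terms)]

strict = search(["red", "running", "shoes"])
relaxed = search(["red", "running", "shoes"], require_all=False)

# A large jump in result count under relaxation suggests an opportunity to
# improve recall -- provided the new results are similar to the originals.
print(len(strict), len(relaxed))  # 1 4
```

In a real system, the comparison would also measure the similarity between the strict and relaxed result sets, since a flood of dissimilar results signals a precision loss rather than a recall opportunity.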

Summary

Recall is important. And, while recall is correlated to the size of the result set, more results do not guarantee improved recall. Size matters, but size isn’t everything. Using query understanding and sensitivity analysis, we can adjust the precision-recall tradeoff to find opportunities to improve recall.
